Hi guys,

At the moment I'm waiting to find out whether I will be using Xilinx or
Actel for my project, and so I'm putting it together for both just in case.

In the Actel IP cores, there is an array adder which allows a good number
of inputs, and there's some optional pipelining. I figure it's sufficient
to just drop this in and wire up as many inputs as I need.

Xilinx IP cores seem to have only 2-input adders, and I guess these are
probably inferred by XST with the + operator anyway, so I don't want to
bother with the IP core generator unless there's some reason why I should.
Supposing I want:

Result <= A + B + C + D + E;

Note, I used only five inputs in my example for brevity; I will have more
like 25 in my actual system.

(Looking in the XST manual, I can either pad the inputs with leading zeros
or convert to integer and back to std_logic_vector to get carry bits to
fill my wider result.)

At the end of the day, when I synthesize this, would there be any
difference between coding it in stages (adding pairs of two together, then
adding their sums together, and so on until all are added up) and just
putting A+B+C+D+E in one statement? All I can think of is that (depending
how well conversions to/from integer are optimized in XST) I might save a
few bits of space in the first stages. Using the bit-padding method, I
suppose that all of the adders in the first stages would wind up
unnecessarily being the same width as the result.

Anyway, I'm just curious how this will end up working... any insight
appreciated!

Steve
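For illustration, here is a minimal sketch (not Steve's actual code) of the
single-statement form, assuming unsigned 8-bit operands. numeric_std's
resize() does the zero-extension, so no manual padding or integer
round-trip is needed; the 13-bit result width is wider than five operands
strictly require (11 bits would do) but matches the 25-input case discussed
later in the thread.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add5 is
   port (
      a, b, c, d, e : in  std_logic_vector(7 downto 0);
      result        : out std_logic_vector(12 downto 0)
   );
end entity;

architecture rtl of add5 is
begin
   -- resize() zero-extends each unsigned operand to the result width,
   -- so every partial sum is already carried out at full width.
   result <= std_logic_vector( resize(unsigned(a), 13)
                             + resize(unsigned(b), 13)
                             + resize(unsigned(c), 13)
                             + resize(unsigned(d), 13)
                             + resize(unsigned(e), 13) );
end architecture;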
<sbattazz@yahoo.co.jp> wrote in message
news:2508079e-f147-4e15-b6bd-ac96f220afbd@s1g2000prd.googlegroups.com...

> [original question quoted in full - snipped]

How fast do you need to clock it? How many bits wide is your result?
On May 25, 6:43 pm, "Andrew Holme" <a...@nospam.co.uk> wrote:

> [quoted text snipped]
>
> How fast do you need to clock it? How many bits wide is your result?

Assuming 25 8-bit inputs, the maximum result is 25*255 = 6375, meaning a
13-bit output.

Serial data comes in at 57.6 kilobits/second = 7200 bytes/second, and the
sum of my array is checked once per byte, so there will be a little over
1ms between clock pulses (I can't imagine that being anywhere near playing
with timing issues). For the project I won't need anything any faster than
that.

I'm just wondering how XST would handle such an addition statement with
multiple operands (my synthesis report doesn't say anything about adders).
Is it smart enough to automatically do some kind of tree algorithm, or
would it do a "dumb" array of one adder feeding into the next for each
extra operand?

Thanks for the quick response!

Steve
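For the 25-input case, one compact way to describe the sum is a loop over
the operands, which is equivalent to one long "+" chain in the source and
leaves the adder structure (tree or chain) up to the synthesizer - which is
exactly the open question here. A sketch, assuming the 25 bytes are
presented as a single flattened vector (the entity, generic and signal
names are invented for illustration):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sum25 is
   generic (N : positive := 25);
   port (
      clk    : in  std_logic;
      inputs : in  std_logic_vector(N*8-1 downto 0);  -- N bytes, flattened
      sum    : out std_logic_vector(12 downto 0)      -- 13 bits covers N <= 32
   );
end entity;

architecture rtl of sum25 is
begin
   process (clk)
      variable acc : unsigned(12 downto 0);
   begin
      if rising_edge(clk) then
         acc := (others => '0');
         -- sum all N bytes; the synthesizer decides how to arrange the adders
         for i in 0 to N-1 loop
            acc := acc + unsigned(inputs(i*8+7 downto i*8));
         end loop;
         sum <= std_logic_vector(acc);
      end if;
   end process;
end architecture;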
On May 25, 12:50 am, Kim Enkovaara <kim.enkova...@iki.fi> wrote:
> luudee wrote:
>
> > In my case the MCP is applied to data that travels from one
> > clock domain to another and is properly "latched" by synchronized
> > control logic. There are many good and bad uses for MCP ...
>
> Isn't clock domain crossing usually specified as false path not
> multicycle path.
>
> --Kim

Most static timing tools will not attempt to verify timing on paths between
two unrelated (i.e. asynchronous) clocks. No false path constraint is
required.

Andy
On May 25, 4:56 am, sbatt...@yahoo.co.jp wrote:

> [quoted text snipped]

Do you really need to recompute the entire array sum every time, or can you
compute a running sum (accumulator) as the data comes in? You can also
subtract the last discarded term from your running sum if you are looking
for a continuous N-term running sum (as is used in a boxcar filter, etc.).

As long as integers will handle your data size, you are much better off
using them than padding vectors. Simulations will run much faster, and
there is no hardware associated with conversion from/to
SLV/signed/unsigned to/from integer.

Andy
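A sketch of the running-sum idea, assuming the 25 values are held in a
small shift-register window so each update needs only one adder and one
subtractor instead of a 25-input sum (the port names and the per-byte
strobe are invented for illustration):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity running_sum is
   generic (N : positive := 25);
   port (
      clk      : in  std_logic;
      strobe   : in  std_logic;                      -- one pulse per new byte
      new_term : in  std_logic_vector(7 downto 0);
      sum      : out std_logic_vector(12 downto 0)
   );
end entity;

architecture rtl of running_sum is
   type window_t is array (0 to N-1) of unsigned(7 downto 0);
   signal window : window_t := (others => (others => '0'));
   signal acc    : unsigned(12 downto 0) := (others => '0');
begin
   process (clk)
   begin
      if rising_edge(clk) then
         if strobe = '1' then
            -- add the incoming term and subtract the one leaving the window
            acc    <= acc + unsigned(new_term) - window(N-1);
            window <= unsigned(new_term) & window(0 to N-2);
         end if;
      end if;
   end process;
   sum <= std_logic_vector(acc);
end architecture;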
> 3. Why DSP and Memory are rectangular in shape ?

Do you mean why they are not round?

/Mikhail
I have a problem with a MicroBlaze-based multiprocessor SoC. I have two
MicroBlaze cores joined by FSL links. This design works and the cores can
communicate with each other. But now I am trying to make these two
MicroBlaze cores run from external memory. The linker script associated
with my software applications presents three possible memories: BRAM (that
is, internal RAM), DDR_MEM_0 and DDR_MEM_1. So, is it possible to load each
software application into its own part of the external memory (that is,
microblaze_0_app.elf in DDR_MEM_0, and microblaze_1_app.elf in DDR_MEM_1)?

My best regards

Pablo
Thanks for posting the very useful list of links.

One thing in your code doesn't look right to me. You update TDO on the
rising edge of TCK. This will work in many situations, but the correct
behaviour is to delay the update until the falling edge of TCK.

Best regards,
Marc
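For reference, a minimal sketch (not the posted core) of the convention
Marc describes: capture and shift on the rising edge of TCK, but register
TDO on the falling edge so it is stable when the other end samples it on
the next rising edge. The entity, register width and enable are invented
for illustration.

library ieee;
use ieee.std_logic_1164.all;

entity tap_shift is
   port (
      tck, tdi, shift_en : in  std_logic;
      tdo                : out std_logic
   );
end entity;

architecture rtl of tap_shift is
   signal sr : std_logic_vector(7 downto 0) := (others => '0');
begin
   -- shift register clocked on the rising edge of TCK
   process (tck)
   begin
      if rising_edge(tck) then
         if shift_en = '1' then
            sr <= tdi & sr(7 downto 1);
         end if;
      end if;
   end process;

   -- TDO is updated only on the falling edge of TCK
   process (tck)
   begin
      if falling_edge(tck) then
         tdo <= sr(0);
      end if;
   end process;
end architecture;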
On May 25, 2:11 am, sbatt...@yahoo.co.jp wrote:

> [quoted text snipped]

If I understand you right, you have 25 parallel inputs, each sending you
bit-serial data. You need to convert the 25 inputs into one 6-bit binary
word, and then accumulate these words with increasing (or decreasing)
binary weight.

Conversion of 25 lines to 6 bits can be done in many ways, including
sequential scanning or shifting, which requires a faster clock of >1.5 MHz.
But here is an unconventional and simpler way:

Use 13 inputs as address to one port of a BlockRAM with 4 parallel outputs
(8K x 4). Use the remaining 12 inputs as address to the other port of the
same BlockRAM. Store the conversion of (# of active inputs to a binary
value) in the BlockRAM.

Add the two 4-bit binary words together to form a 5-bit word that always
represents the number of active inputs. Then feed this 5-bit value into a
13-bit accumulator, where you shift the content after each clock tick.

This costs you one BlockRAM plus three or four CLBs in Xilinx nomenclature,
a tiny portion of the smallest Spartan or Virtex device, and it could be
run a few thousand times faster than you need. If you have more than 26
inputs, just add another BlockRAM for a total of up to 52 inputs, and
extend the adder and accumulator by one bit. (Yes, I know in Spartan you
are limited to 12 address inputs (4K x 4), but you can add the remaining
bit outside...)

Peter Alfke, from home.
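A behavioural sketch of the same count-and-shift idea. It deliberately
replaces Peter's dual-port BlockRAM lookup with a plain popcount function
(which synthesis would build from LUTs), so it is not his exact circuit,
but the weighted accumulator stage is the same. It assumes the serial lines
send MSB first and uses invented port names.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bitwise_accum is
   port (
      clk    : in  std_logic;
      start  : in  std_logic;                      -- clears the accumulator
      bit_en : in  std_logic;                      -- one clk tick per serial bit
      inputs : in  std_logic_vector(24 downto 0);  -- the 25 serial data lines
      total  : out std_logic_vector(12 downto 0)
   );
end entity;

architecture rtl of bitwise_accum is
   signal acc : unsigned(12 downto 0) := (others => '0');

   -- number of '1' bits on the 25 input lines (0 .. 25 fits in 5 bits)
   function popcount(v : std_logic_vector) return unsigned is
      variable c : unsigned(4 downto 0) := (others => '0');
   begin
      for i in v'range loop
         if v(i) = '1' then
            c := c + 1;
         end if;
      end loop;
      return c;
   end function;
begin
   process (clk)
   begin
      if rising_edge(clk) then
         if start = '1' then
            acc <= (others => '0');
         elsif bit_en = '1' then
            -- shift the running total and add the count of active inputs,
            -- so earlier (more significant) bit slots carry higher weight
            acc <= shift_left(acc, 1) + popcount(inputs);
         end if;
      end if;
   end process;
   total <= std_logic_vector(acc);
end architecture;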
On May 25, 8:28 am, pant...@gmail.com wrote:

> [quoted text snipped]

I use the MPMC to map memory ports to DDR2 external memory. This mechanism
has been successfully tested with up to 7 MicroBlazes.

/Per
Hi,

How is it possible to use N/S routing if the GTX clock is connected to a
tile other than the one where the GTP is used?

The wizard does have radio buttons to route the clock, but those selections
do not actually do anything.

So is using DRP to change the input clock mux REALLY the only way?

Antti
On May 25, 8:06 pm, Antti <Antti.Luk...@googlemail.com> wrote:

> [quoted text snipped]

I answer myself: it seems that DRP is really the only way :(
I have two processes, one for sampling data with a high-speed clock
(simplified code):

if rising_edge(fastClock) then
    shiftRegister <= shiftRegister(0) & dataIn;
    if shiftRegister = "10" then
        counter <= 0;
    end if;
    counter <= counter + 1;
    if counter = 2 then
        outBit <= shiftRegister(0);
        dataValid <= '1';
    end if;
    if dataRead = '1' then
        dataValid <= '0';
    end if;
end if;

and a slow process for evaluating and feeding to other entities:

if rising_edge(slowClock) then
    dataValidLatch <= dataValid;
    dataInLatch <= dataIn;
    if dataValidLatch = '1' then
        outShift <= outShift(x downto 0) & dataInLatch;
        dataRead <= '1';
    else
        dataRead <= '0';
    end if;
end if;

But the classic timing analyzer in Quartus thinks there are some paths,
with outShift in them, which need to be much faster than slowClock, and
because of more complex processes the design can't be synthesized for the
fast clock. My hope was that latching removes all dependencies on
fastClock. How do I do this right? Is there maybe some other trick to
sample biphase signals, which can have a wide range of input frequencies,
with a high-speed clock?

fastClock is regenerated from the biphase signal with external chips and
looks nice, but a 4x PLL doesn't work, so I want to try 8x, which results
in about 200 MHz maximum frequency. I hope that should work with the simple
sampling process, but it doesn't work for all the other processes.

--
Frank Buss, fb@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
On May 25, 11:40 am, Frank Buss <f...@frank-buss.de> wrote:
> But the classic timing analyzer in Quartus thinks there are some paths,
> with outShift in them, which need to be much faster than slowClock, and
> because of more complex processes the design can't be synthesized for
> the fast clock.

I suspect your issue is going to be with dataInLatch and dataValidLatch.
I'm assuming there is no deterministic phase relationship between the
fastClock and the slowClock. This is almost definitely the case unless
slowClock is derived from fastClock. If that is the case, there's no way to
reliably do what you're doing, because eventually dataIn/dataValid
(generated in the fastClock domain) will violate the setup and hold
requirements in the slowClock domain.

> My hope was that latching removes all dependencies on fastClock. How do
> I do this right?

An asynchronous FIFO is probably the easiest way to do this. Otherwise,
there needs to be some set phase relationship between fastClock and
slowClock.

> Is there maybe some other trick to sample biphase signals, which can
> have a wide range of input frequencies, with a high-speed clock?

What is a biphase signal?

> fastClock is regenerated from the biphase signal with external chips and
> looks nice, but a 4x PLL doesn't work

Why not?

> so I want to try 8x, which results in about 200 MHz maximum frequency. I
> hope that should work with the simple sampling process, but it doesn't
> work for all the other processes.

Regardless of how fast your slowClock or fastClock clocks are, transferring
data between the two of them needs to be done in such a way that ensures
setup/hold requirements are always met. The issue (in terms of data
integrity and preventing metastability) is the phase relationship between
the two clocks. After that, you just need to ensure the ratio of the two
clocks is sufficient to ensure the aggregate data rate generated by the
fastClock can be consumed by the slowClock without overflow or underflow.
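Not the asynchronous FIFO suggested above, but a sketch of a simpler
alternative that works when, as here, new data arrives far less often than
either clock period: the fast domain holds the byte and flips a toggle, and
the slow domain double-registers the toggle and accepts the byte when it
sees the toggle change. All names and widths are invented for illustration.

library ieee;
use ieee.std_logic_1164.all;

entity cdc_handshake is
   port (
      fast_clk   : in  std_logic;
      slow_clk   : in  std_logic;
      fast_valid : in  std_logic;                     -- one fast_clk pulse per byte
      fast_data  : in  std_logic_vector(7 downto 0);
      slow_valid : out std_logic;                     -- one slow_clk pulse per byte
      slow_data  : out std_logic_vector(7 downto 0)
   );
end entity;

architecture rtl of cdc_handshake is
   signal toggle              : std_logic := '0';
   signal held_data           : std_logic_vector(7 downto 0) := (others => '0');
   signal sync0, sync1, sync2 : std_logic := '0';
begin
   -- fast domain: hold the byte steady and flip the toggle
   process (fast_clk)
   begin
      if rising_edge(fast_clk) then
         if fast_valid = '1' then
            held_data <= fast_data;
            toggle    <= not toggle;
         end if;
      end if;
   end process;

   -- slow domain: two synchronizing flops plus one flop for edge detection
   process (slow_clk)
   begin
      if rising_edge(slow_clk) then
         sync0      <= toggle;
         sync1      <= sync0;
         sync2      <= sync1;
         slow_valid <= sync1 xor sync2;
         if (sync1 xor sync2) = '1' then
            -- held_data has been stable for at least two slow_clk periods here
            slow_data <= held_data;
         end if;
      end if;
   end process;
end architecture;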
On May 25, 9:16 am, Peter Alfke <al...@sbcglobal.net> wrote:

> [quoted text snipped]

Hi Steve,

1. Set up a 16*8 FIFO;
2. Each of the 25 data sources is first registered in its own 8-bit
   register, with a valid bit set when all data bits have arrived from its
   serial data source;
3. When valid = '1', push the data into the FIFO and clear the valid bit;
4. Set up a 13-bit register, initialized to 0 when a new calculation
   starts;
5. When the FIFO is not empty, add the FIFO output to the 13-bit register,
   with the high 5 bits being '0' and the low 8 bits coming from the FIFO.

There is no need for a 25-input adder.

Weng
Nathan Bialke wrote:

> I suspect your issue is going to be with dataInLatch and dataValidLatch.
> I'm assuming there is no deterministic phase relationship between the
> fastClock and the slowClock. This is almost definitely the case unless
> slowClock is derived from fastClock.

There is a simple phase relationship: I use one PLL with two outputs,
fastClock = 8 x the input clock and slowClock = 4 x the input clock. Maybe
then I can simplify the FIFO?

> An asynchronous FIFO is probably the easiest way to do this. Otherwise,
> there needs to be some set phase relationship between fastClock and
> slowClock.

Do you have some VHDL code for it? I think I could use a BRAM for it, but
that would be overkill for such a simple case.

> What is a biphase signal?

http://en.wikipedia.org/wiki/Biphase_mark_code

For my application it is AES3.

>> fastClock is regenerated from the biphase signal with external chips
>> and looks nice, but a 4x PLL doesn't work
>
> Why not?

It looks like there is too much jitter, because sometimes there are bits
missing, which depends on the frequency of the signal. Internally generated
signals are sampled fine. So my idea was to try a higher sampling rate.

> Regardless of how fast your slowClock or fastClock clocks are,
> transferring data between the two of them needs to be done in such a
> way that ensures setup/hold requirements are always met. [...]

Overflow should be no problem, because of the fixed relationship and fixed
bit rate, 4 times slower than slowClock.

--
Frank Buss, fb@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
Hi,

Through the discussion in my previous thread, "Are all these claims in VHDL
correct?", I understand how to recognize a transparent latch versus a
register.

Here is an example that shows what puzzles me.

State1_A : process(CLK)
begin
   if CLK'event and CLK = '1' then
      if SINI = '1' then
         State1 <= Idle_S;
      else
         State1 <= State1_NS;
      end if;
   end if;
end process;

State1_B : process(State1, A1, A2)
begin
   case State1 is
      when Idle_S =>
         if A1 = '1' then
            State1_NS <= X_S;
         else
            State1_NS <= Idle_S;
         end if;

      when X_S =>
         if A2 = '1' then
            State1_NS <= Idle_S;
         else
            State1_NS <= X_S;
         end if;
   end case;
end process;

State2_A : process(SINI, CLK)
begin
   if CLK'event and CLK = '1' then
      if SINI = '1' then
         State2 <= Idle_S;
      else
         State2 <= State2_NS;
      end if;
   end if;
end process;

State2_B : process(State2, A1, A2)
begin
   case State2 is
      when Idle_S =>
         if A1 = '1' then
            State2_NS <= X_S;
         -- else                      <-- key difference
         --    State2_NS <= Idle_S;
         end if;

      when X_S =>
         if A2 = '1' then
            State2_NS <= Idle_S;
         else
            State2_NS <= X_S;
         end if;
   end case;
end process;

From my experience with state machines, the VHDL compiler generates a
warning for State2: "state machine state2 will be implemented as latches".

It once took me a week to find a situation similar to State2 above in a
long state machine of mine.

I don't know why the VHDL compiler generates latches for State2.

Thank you.

Weng
> I use the MPMC to map memory ports to DDR2 external memory. This
> mechanism has been successfully tested with up to 7 MicroBlazes.
>
> /Per

Firstly, thanks a lot. Secondly, I would be grateful if you could tell me
whether you used the XUP board, and which version of Xilinx Platform
Studio. Do you know anything else about this type of design?

Again, my best regards
On May 25, 3:52 pm, Weng Tianxiang <wtx...@gmail.com> wrote:

> [quoted text snipped]

Are you sure the latch isn't being created for State2_NS? You may want to
put a "when others =>" clause at the end of the case statement to make sure
State2_NS gets assigned something in every case. A default assignment at
the top of the process would give similar effects.

Also, SINI doesn't need to be in the sensitivity list for the State2
process, but it shouldn't hurt anything other than simulation time.

Dave
Weng,

You've told the synthesizer that state2_ns (the combinatorial signal, not
the register) has to remember its previous value under certain
circumstances, so it generates a latch to remember the value.

Your choices to avoid the latch include a) avoiding combinatorial
processes, b) including a default assignment (perhaps from the output of
the associated register) in combinatorial processes, or c) making sure
every possible execution path through the process results in all driven
signals being assigned a value (and not just to themselves).

I always choose (a). If you just have to use a combinatorial process, then
(b) is much easier to read/write/verify/review than (c).

Andy
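A sketch of option (b) applied to Weng's State2_B process, reusing his
signal and state names: the default assignment at the top means every path
through the process drives State2_NS, so no latch is inferred even though
the "else" branch is missing.

State2_B : process(State2, A1, A2)
begin
   State2_NS <= State2;            -- default: hold the registered state
   case State2 is
      when Idle_S =>
         if A1 = '1' then
            State2_NS <= X_S;
         end if;

      when X_S =>
         if A2 = '1' then
            State2_NS <= Idle_S;
         end if;
   end case;
end process;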
MM wrote:
>> 3. Why DSP and Memory are rectangular in shape ?
>
> Do you mean why they are not round?

No, why are they rectangular? Do you know? The person who set the
assignment does.
On May 21, 11:33 pm, Brian Drummond <brian_drumm...@btconnect.com> wrote:
> Try running Translate from the command line with the exact command line
> given in the ".cmdlog" file - including the "-intstyle ISE" flag - and
> see if that gives the same result as the GUI flow.

Thanks for your reply, Brian. Unfortunately, the cmdlog indicates that the
GUI flow uses a different executable - unwrapped/ngdbuild.exe - which has
the problem I described with loading constraints when run from the command
line. So one issue right at the start is that the GUI uses different
binaries.

> Indeed an apparently successful run through the GUI tools produces a
> non-functional bitfile! I have not had time to explore deeply enough to
> find out why.

I don't think it's safe to assume the GUI and command-line flows have the
same results. From what I've seen, the two have diverged.

Fortunately, by manually including the UCF file (and therefore deliberately
disregarding the -i flag that cmdlog said we should use) we are able to
produce an identical final bit file, excluding the timestamp in the header,
even though the intermediate files are all different. That's with our
current inputs, at least. Who knows whether this will be the case tomorrow,
or next month?

--
David.
On May 23, 5:07 am, LittleAlex <alex.lo...@email.com> wrote:
> I have been successful converting a project from GUI to command line
> with identical bit files.

Hi LittleAlex, thank you for your comments. We too get identical bitfiles
(excluding the header), but my concern is that the intermediate files are
all different. All this does is give me confidence that *this* build is
identical, but who can say if future builds will be? We have no way to test
this for every build either. For serious use of the Xilinx tools in an
engineering environment, this sort of behaviour is ridiculous.

> Take a very close look at the log files left behind by the GUI build.
> There are options that you will probably not recognize - the GUI knows
> better than you what you want it to do :)

In fact the GUI got it wrong - it said to use the -i flag, but what was
actually required was removal of the -i flag and use of the -uc flag
instead.

> You can get rid of the -intstyle ISE flag. I use a .prj file format
> and that works just fine for me.

What flow do you use with that? -intstyle xflow?

> Most options can be set in a number of places. I'm not 100% sure
> which location for the option has priority; I had some weird results
> which went away when I made them all match.

Is the .prj format similar to Synplify's TCL .prj format? I'll have to look
up the .prj format - could be useful.

> Another thing to look out for: The GUI scatters work directories all
> over the place. Weirdness in these directories can cause run-to-run
> inconsistencies; audit the location and cleanup of these directories
> carefully.

Yes, our Makefile cleans up pretty well. A 'git clean -fdx' also does the
trick during a build.

Regards,
--
David.
On May 22, 2:30 am, phil hays <philh...@dont.spam> wrote:
> As the .ise working file changes every run, and is binary to boot, it
> cannot be an input into a stable and maintainable build process. So the
> solution I've used when using gnu make under Cygwin is to delete the
> whole result directory (bld) at the start of the build. There are other
> files in the result directory (and sub-directories under it) that can
> influence the build, and the only way that I'm aware of to get a
> consistent result is to start with a fresh directory.

Hi phil, thank you for your reply.

> One option for doing this would be to have the make file call a Project
> Navigator Tcl script (using xtclsh). This script would create a
> fresh .ise file every run, and could also be used to run from the GUI. I
> posted a script for this some time ago, and will update it if desired.

This sounds useful - can you direct me towards a recent version of this
please?

> This is because the ISE flow seems to read the UCF file into a database
> first, and then applies the constraints later.

However the ISE flow is using the '-i' flag, which is supposed to ignore
constraints... I only get the same behaviour from the command line if I
ditch the -i flag and use -uc instead, to include constraints.

> The .ise file has lots of date and time information. The solution to
> this is to think of the .ise file as a working file, rather than a
> project file, and to delete it at the start of any build script.

In conjunction with your earlier comment, this makes sense. However, I
understand that if -intstyle ise is *not* used, there's no dependency on
the .ise file whatsoever. I'd prefer this at build time, although we would
like to automatically create an .ise file for local use inside the GUI.

> To difference the .bit files, the header needs to be ignored. To make
> this automatic, I've written a little difference utility using Tcl.
> Would this be of interest?

Yes, this would be useful please.

Regards,
--
David.
Symon <symon_brewer@hotmail.com> wrote:
> MM wrote:
>>> 3. Why DSP and Memory are rectangular in shape ?
>>
>> Do you mean why they are not round?
>
> No, why are they rectangular? Do you know? The person who set the
> assignment does.

There you go - they actually don't. That's why they're asking the question.
Then they have a whole classload of students to try and find the answer for
them. Thankfully some of them are smart enough to come and ask here rather
than doing their own homework assignments.

Nobby