
Messages from 158450

Article: 158450
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Mon, 30 Nov 2015 15:25:53 -0800 (PST)
On Monday, November 30, 2015 at 1:58:18 PM UTC-8, rickman wrote:
>
> I don't think you are grasping the situation.  If the output of the
> register isn't connected to anything, you have no need for it, so it is
> removed.  It does not matter if any other instructions use this
> register, if *any* one part of the design uses this register output, it
> won't be removed... unless that part of the design is also removed first.

Maybe I'm not. What I was trying to say is that:

 - The execute module does write to 'newSPData' during execution of BRK, viz.:

					   newSPData[`NW:0]			= pcPlusOne[`ND:`W];
					   newSPData[`W*2-1:`W]		= pcPlusOne[`NW:0];
					   newSPData[`W*3-1:`W*2]		= ps | 8'h10;

 - The 'newSPData' register is declared in the execute module's port list

module execute
	(
	...
	output	reg [`W*3-1:0]  	newSPData,	        // Bytes to stuff onto stack
	output	reg [1:0]			numSPBytes		// Number of bytes to stuff onto stack
	);

 - The 6502 module does link through to these ports:

	execute execute_inst
		(
		...
		.newSPData(newSPData),
		.numSPBytes(numSPBytes)
		);

 - The 6502 module does use the 'wire' vars that link through to the execute registers...

	...
	if ((action & `UPDATE_SP) == `UPDATE_SP)
		begin
			if (numSPBytes == 1)
				begin
					stack[SP]	<= newSPData[`NW:0];
					SP			<= SP - 1;
				end
			else if (numSPBytes == 2)
				begin
					stack[SP]	<= newSPData[`NW:0];
					stack[SP-1]	<= newSPData[`W*2-1:`W];
					SP			<= SP - 2;
				end
			else if (numSPBytes == 3)
				begin
					stack[SP]	<= newSPData[`NW:0];
					stack[SP-1]	<= newSPData[`W*2-1:`W];
					stack[SP-2]	<= newSPData[`W*3-1:`W*2];
					SP			<= SP - 3;
				end
		end

Now the coding could be a bit more elegant, but it's been through several iterations of "perhaps I ought to be more explicit" ... so forgive me that :)

What I was trying (seemingly, failing :) to say is that nothing else uses 'stack' (apart from an initialize block that zeros out the values), therefore it could be optimized away, ergo the newSPData could be optimized away, ergo my warning messages. Nothing else uses 'stack' because there are no other stack-related CPU operations yet, just the BRK instruction, which happens to push stuff onto the stack.

I'm also getting advice to "just ignore the message and move on", and it's hard to reconcile that with "I think you still need to look at the code and figure out why the registers don't drive any inputs".

If I'm still not grasping it (or not explaining what I'm thinking well enough), the code is at http://0x0000ff.com/6502/ although the formatting leaves a little to be desired.

Cheers, and many thanks to everyone for all the help :)

Article: 158451
Subject: Re: Simulation vs Synthesis
From: BobH <wanderingmetalhead.nospam.please@yahoo.com>
Date: Mon, 30 Nov 2015 16:44:06 -0700
On 11/30/2015 4:25 PM, Simon wrote:
> What I was trying (seemingly, failing :) to say is that nothing else
> uses 'stack' (apart from an initialize block that zeros out the values),
> therefore it could be optimized away, ergo the newSPData could be optimized
> away, ergo my warning messages. Nothing else uses 'stack' because there's no
> other stack-related CPU operations yet, just the BRK instruction which
> happens to push stuff onto the stack.

If nothing is using the stack output, there is a decent chance that it
is getting optimized out; if it is, there is no user for newSP's output
either, so that will get optimized out too. Check and see if the stack
registers are getting optimized out.

If you have brought the stack out to a top level output (pin) it should 
not get optimized out.

A mistake I have made is to misspell the wire connection, so that
there is no user for the outputs. The easiest way to check for that is to
inspect the simulation at the inputs to the next stage that uses the
data and make sure that they are wiggling as you expect, and not showing
undefined values as they would for an undriven wire. The second easiest
way is to eyeball the signal names for this problem.

Good Luck,
BobH


Article: 158452
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Mon, 30 Nov 2015 19:33:11 -0500
On 11/30/2015 6:25 PM, Simon wrote:
> <snip>
> What I was trying (seemingly, failing :) to say is that nothing else uses 'stack' (apart from an initialize block that zeros out the values), therefore it could be optimized away, ergo the newSPData could be optimized away, ergo my warning messages. Nothing else uses 'stack' because there's no other stack-related CPU operations yet, just the BRK instruction which happens to push stuff onto the stack.
> <snip>

OK, I think you understand what I am saying.  If "stack" is being
optimized away, I would expect to see that in the warning
messages as well.  Is it?  If not, I can only assume that is not the problem.

I don't know that you need to actually have instructions in your design 
that utilize the stack data.  As long as there is a data path from 
"stack" to other logic and the control signals are driven from logic 
that is not optimized away it should remain.

I agree with the others that if you believe this problem is because your 
design is not complete, move on.  I don't think it will be any harder to 
find with a completed design than with a partial design, possibly the 
opposite.

I will also say however, that unit testing can be very useful if you 
aren't designing it on the fly.  If you have decomposed your modules 
with full specification of what they do and all the ins and outs, you 
should be able to write a test bench for each module.

-- 

Rick
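As an illustration of the per-module test bench Rick describes, here is a minimal self-checking sketch. The module brk_pack is hypothetical: it only repeats the BRK byte packing from Simon's execute code (PC+1 high byte, PC+1 low byte, then status with bit 4, the B flag, set), assuming `W is 8 and a 24-bit newSPData. It is not Simon's actual module.

```verilog
// Hypothetical unit under test: packs BRK's three stack bytes the same way
// as Simon's execute module, assuming `W = 8 and a 24-bit newSPData.
module brk_pack
  (
   input  wire [15:0] pc_plus_one,
   input  wire [7:0]  ps,
   output wire [23:0] new_sp_data,
   output wire [1:0]  num_sp_bytes
   );

   assign new_sp_data[7:0]   = pc_plus_one[15:8];  // written to stack[SP]
   assign new_sp_data[15:8]  = pc_plus_one[7:0];   // written to stack[SP-1]
   assign new_sp_data[23:16] = ps | 8'h10;         // stack[SP-2]: status with B flag set
   assign num_sp_bytes       = 2'd3;
endmodule

// Self-checking test bench: apply stimulus, compare against expected
// values, and report PASS or FAIL.
module tb_brk_pack;
   reg  [15:0] pc_plus_one;
   reg  [7:0]  ps;
   wire [23:0] new_sp_data;
   wire [1:0]  num_sp_bytes;
   integer     errors;

   brk_pack dut (.pc_plus_one(pc_plus_one), .ps(ps),
                 .new_sp_data(new_sp_data), .num_sp_bytes(num_sp_bytes));

   initial begin
      errors      = 0;
      pc_plus_one = 16'h1234;
      ps          = 8'h21;
      #1;                                   // let the continuous assigns settle
      if (new_sp_data !== 24'h313412) begin
         errors = errors + 1;
         $display("FAIL: new_sp_data = %h, expected 313412", new_sp_data);
      end
      if (num_sp_bytes !== 2'd3) begin
         errors = errors + 1;
         $display("FAIL: num_sp_bytes = %d, expected 3", num_sp_bytes);
      end
      if (errors == 0)
         $display("PASS");
   end
endmodule
```

With Icarus Verilog, for example, compiling this file with iverilog and running the result with vvp should print PASS. A bench like this can be written as soon as a module's ins and outs are specified, before the rest of the CPU exists.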

Article: 158453
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Mon, 30 Nov 2015 19:34:07 -0500
On 11/30/2015 6:44 PM, BobH wrote:
> <snip>
> A mistake that I have made, is to mis-spell the wire connection and then
> there is no user for the outputs. The easiest way to check that is to
> inspect the simulation at the inputs to the next stage that uses the
> data and make sure that they are wiggling as you expect and not showing
> undefined as they would for an undriven wire. The second easiest way to
> check that is to eyeball the naming for this problem.

If you make a spelling error, won't that be flagged because that signal 
hasn't been declared?

-- 

Rick
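Not necessarily. In Verilog, an undeclared identifier used in a module instance's port connection is implicitly declared as a 1-bit net of the default net type (wire), so a misspelled connection usually compiles and often draws only a width-mismatch warning. Putting `default_nettype none at the top of a source file turns such typos into compile errors. The module names below are hypothetical.

```verilog
`default_nettype none   // any undeclared identifier is now a compile error

// Hypothetical stand-in for a child module such as "execute".
module producer
  (
   input  wire       clk,
   input  wire [7:0] d,
   output reg  [7:0] q
   );
   always @(posedge clk)
      q <= d;
endmodule

module consumer_top
  (
   input  wire       clk,
   input  wire [7:0] d,
   output wire [7:0] seen
   );
   wire [7:0] newSPData;

   // Under the default (`default_nettype wire), a typo such as
   //    .q(newSPDta)
   // would silently create a 1-bit implicit wire named newSPDta and leave
   // newSPData undriven. With `default_nettype none it fails to compile.
   producer u_prod (.clk(clk), .d(d), .q(newSPData));

   assign seen = newSPData;
endmodule

`default_nettype wire   // restore the default for files compiled afterwards
```

Restoring the default at the end of the file keeps older code that relies on implicit nets compiling when it is read after this file.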

Article: 158454
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Mon, 30 Nov 2015 16:35:31 -0800 (PST)
On Monday, November 30, 2015 at 3:44:28 PM UTC-8, BobH wrote:
> 
> If nothing is using the stack output, there is a decent chance that it 
> is getting optimized out, then there is no user for newSP's output and 
> it will get optimized out. Check and see if the stack registers are 
> getting optimized out.

So, oddly enough, there's no mention of 'stack' in the synthesis report (CTRL-F doesn't find anything either), even though it's declared (as registers) alongside the zero-page file in an identical fashion:

    ////////////////////////////////////////////////////////////////////////////
    // Set up zero-page as register-based for speed reasons
    ////////////////////////////////////////////////////////////////////////////
    reg    [`NW:0]       		zp[0:255];		// Zero-page 
	
    ////////////////////////////////////////////////////////////////////////////
    // Same for stack
    ////////////////////////////////////////////////////////////////////////////
    reg    [`NW:0]			stack[0:255];		// Stack-page 

'zp' gets a lot of mentions (mainly that it's too sparse to go into block RAM) but nary a hint of 'stack' anywhere to be seen.

Looking at the summaries, there are 268 8-bit registers declared, which is only sufficient for either 'stack' or 'zp', but not both together (unless it's cherry-picking the used ones from both declarations of course).

Curiouser and curiouser, quoth the raven^W^W^W said Alice...

> 
> If you have brought the stack out to a top level output (pin) it should 
> not get optimized out.

No I haven't, it's self-contained within the '6502' module, but I could try doing that tonight.
 
> A mistake that I have made, is to mis-spell the wire connection and then 
> there is no user for the outputs. The easiest way to check that is to 
> inspect the simulation at the inputs to the next stage that uses the 
> data and make sure that they are wiggling as you expect and not showing 
> undefined as they would for an undriven wire. The second easiest way to 
> check that is to eyeball the naming for this problem.

Yep, I've done that too :)

Article: 158455
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Mon, 30 Nov 2015 17:32:31 -0800 (PST)
Just to follow up, it definitely is because it's being optimised away. If I add a port which links to a byte of the stack register space, and link it to the top-level test bench...

	module cpu_6502
		(
		...
		output	reg [`NW:0]		stackff
		);

		////////////////////////////////////////////////////////////////////////////
		// Set up the stack as a register array
		////////////////////////////////////////////////////////////////////////////
		reg    [`NW:0]				stack[0:255];         // Stack-page

		always @ (posedge(clk))
			stackff <= stack[255];


... the report tells me that bits [31:8] of 'newSPData' are optimised away, but bits [7:0] are not.

Aside: The report is also saying: "INFO: [Synth 8-5545] ROM "stack_reg[255]" won't be mapped to RAM because address size (32) is larger than maximum supported(25)"

Am I misunderstanding this, or is my declaration wrong? I'm trying to declare 256 8-bit (`NW is defined to be 7) registers to represent a single page (the 6502 uses page 1 as a stack, so its stack pointer is only 8 bits in size). 256 bytes ought to fit into a 4K-byte block RAM...

Cheers
   Simon.
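One plausible reading of the Synth 8-5545 message, offered as an assumption rather than a diagnosis: a Verilog index expression such as stack[SP-1] is self-determined, and the unsized literal 1 is at least 32 bits wide, so the address arithmetic becomes 32 bits. Writing up to three different stack entries in a single clock also rules out block RAM, which has at most two ports. The hypothetical module below keeps the pointer arithmetic at 8 bits and performs at most one write and one read per clock.

```verilog
// Hypothetical 256 x 8 stack page with one write and one read per clock.
// Names, the push/pop interface and the 8'hFF start value are assumptions.
module stack_page
  (
   input  wire       clk,
   input  wire       push,   // write din at sp, then decrement sp
   input  wire       pop,    // increment sp, then read the byte there
   input  wire [7:0] din,
   output reg  [7:0] dout,   // valid one clock after pop
   output reg  [7:0] sp
   );

   reg  [7:0] mem [0:255];

   // Computing the incremented pointer into an 8-bit wire keeps the index
   // 8 bits wide (wrapping 8'hFF to 8'h00); writing mem[sp + 1] directly
   // would make the index expression 32 bits wide.
   wire [7:0] sp_inc = sp + 8'd1;

   initial sp = 8'hFF;       // illustrative start; the stack grows down from $01FF

   always @(posedge clk) begin
      if (push) begin
         mem[sp] <= din;
         sp      <= sp - 8'd1;
      end else if (pop) begin
         dout <= mem[sp_inc];
         sp   <= sp_inc;
      end
   end
endmodule
```

Under this scheme, BRK's three pushes would take three consecutive push cycles, one byte per clock, instead of three writes in one clock.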

Article: 158456
Subject: Re: Simulation vs Synthesis
From: Tim Wescott <seemywebsite@myfooter.really>
Date: Mon, 30 Nov 2015 19:53:16 -0600
On Mon, 30 Nov 2015 18:23:43 +0000, Mark Curry wrote:

> In article <450e997a-afd7-4c3d-a181-b324af6ede3c@googlegroups.com>,
> Simon  <google@gornall.net> wrote:
>>So I have a partly-complete design for a 6502 CPU, it's simulating just
>>fine for the implemented opcodes, but when I run synthesis, I get a
>>whole load of "Sequential element (\newSPData_reg[23] ) is unused and
>>will be removed from module execute.", one for each bit in the register,
>>in fact.
>>
>>I know the logic is *trying* to use this register, I can see the values
>>in the register changing during simulation runs, but I can't for the
>>life of me see why it would be removed - the 'execute' module is
>>basically a case statement, with one of the cases explicitly setting the
>>value of the 'newSPData' register.
>>
>>Again, in the simulation, I see the case being executed, and the values
>>changing. I guess what I'm looking for is any tips on how to tackle the
>>problem ("The Knowledge", if you will), I've already tried the 'trace
>>through the logic for the case that should trigger the case in question,
>>and see if anything jumps out at me'.  I remain un-jumped-out-at [sigh].
> <snip>
> 
> Simon - just ignore the message and move on.  Really.
> Synthesis optimizations are quite advanced these days - both
> combinatorial and across registers.
> 
> Some sort of optimization that may not be obvious to you, may have
> combined your register bit with another,
> leaving this one "unused".  It's ok.  Trust the tool,
> and just move on.

It's been at least five years since I've actually done FPGA work, but I 
always took unexpected optimizations of this sort to mean that I didn't 
have my head screwed on straight, and I needed to figure out what I was 
doing wrong.

Most of the time, I was right.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Article: 158457
Subject: Re: Simulation vs Synthesis
From: BobH <wanderingmetalhead.nospam.please@yahoo.com>
Date: Mon, 30 Nov 2015 20:44:02 -0700
On 11/30/2015 6:32 PM, Simon wrote:
> <snip>
> Am I misunderstanding this, or is my declaration wrong ? I'm trying to
> declare 256 8-bit (`NW is defined to be 7) registers to represent a single
> page (the 6502 uses page-1 as a stack, so its stack pointer is only 8-bits in size).
> 256 bytes ought to fit into a 4K-byte block-ram...
>
> Cheers
>     Simon.
>

I am not clear what you are trying to do with the stack here. Do you 
have a relatively complete CPU implemented?

I would expect something like:
module CPU6502
(
   output wire [ 7:0] data_out,
   output wire [15:0] address_out,
   input  wire [ 7:0] data_in,

   output wire        write_enable,
   output wire        read_enable,

   input  wire        irq_in,
   input  wire        nmi_in,
   input  wire        clk,
   input  wire        rstn
);
   // ... CPU internals ...
endmodule

I would expect that you would have an 8 bit stack pointer that would get 
muxed onto the address bus, possibly with offsets from the instruction 
stream. The newSP value would go into the stack pointer when you are 
updating the stack.

RAM would get hung on the address and data buses, with block decode logic
to decode the upper address bits into chip selects for the RAM and
peripherals. Since FPGAs don't have internal tri-state buses, there will be
a read data mux to select the data source from the addressed bus target
for reads.

sort of like:
module mcu
(
   output        uart_txd,
   input         uart_rxd,
   input         clk,
   input         rstn
);

   wire [15:0] address;
   wire [ 7:0] data_out;
   wire [ 7:0] ram_data, rom_data, uart_data;
   reg  [ 7:0] data_in;
   wire        ram_block_sel, rom_block_sel, uart_block_sel;
   wire        write_enable, read_enable;
   wire        irq;            // interrupt request from the UART

CPU6502 cpu
(
   .data_out     (data_out),
   .data_in      (data_in),
   .address_out  (address),
   .write_enable (write_enable),
   .read_enable  (read_enable),
   .irq_in       (irq),
   .nmi_in       (1'b0),
   .clk          (clk),
   .rstn         (rstn)
);

RAM_1Kx8 ram
(
   .address_in   (address[9:0]),
   .data_in      (data_out),
   .data_out     (ram_data),
   .write_enable (write_enable),
   .chip_sel     (ram_block_sel)
);


ROM_1Kx8 rom
(
   .address_in   (address[9:0]),
   .data_out     (rom_data),
   .read_enable  (read_enable),
   .chip_sel     (rom_block_sel)
);

UART uart
(
   .txd          (uart_txd),
   .rxd          (uart_rxd),
   .reg_select   (address[1:0]),                  // 2 bits of address
   .data_in      (data_out),
   .data_out     (uart_data),
   .write_enable (write_enable & uart_block_sel), // only when the UART is addressed
   .read_enable  (read_enable  & uart_block_sel),
   .irq_out      (irq),
   .clk          (clk),
   .rstn         (rstn)
);

// address block decode
assign rom_block_sel  = address [15:13] == 3'b111;    // top address
assign ram_block_sel  = address [15:13] == 3'b000;    // bottom address
assign uart_block_sel = address [15:13] == 3'b001;

// read data path mux
always @( * )
begin
   case (address[15:13])
    3'b000:  data_in = ram_data;
    3'b001:  data_in = uart_data;
    3'b111:  data_in = rom_data;
    default: data_in = 8'h0;
   endcase
end

endmodule
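BobH's point about an 8-bit stack pointer being muxed onto the address bus can be sketched as below. This is a hypothetical address-source mux, not code from the thread: the select encoding, the ea (effective address) input and the module name are illustrative. The stack address is formed by prepending 8'h01 to the stack pointer, matching the 6502's page-1 stack, and zero-page addresses get an upper byte of 8'h00.

```verilog
// Hypothetical address-source mux for a 6502-style core.
module addr_mux
  (
   input  wire [1:0]  addr_sel,    // illustrative encoding, see localparams
   input  wire [15:0] pc,
   input  wire [15:0] ea,          // effective address from the operand bytes
   input  wire [7:0]  sp,
   output reg  [15:0] address_out
   );

   localparam SEL_PC    = 2'd0,
              SEL_EA    = 2'd1,
              SEL_STACK = 2'd2,
              SEL_ZP    = 2'd3;

   always @(*) begin
      case (addr_sel)
         SEL_PC:    address_out = pc;
         SEL_EA:    address_out = ea;
         SEL_STACK: address_out = {8'h01, sp};      // stack lives in page 1
         SEL_ZP:    address_out = {8'h00, ea[7:0]}; // zero page
         default:   address_out = pc;
      endcase
   end
endmodule
```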

Article: 158458
Subject: Re: Simulation vs Synthesis
From: rickman <gnuarm@gmail.com>
Date: Tue, 1 Dec 2015 00:23:01 -0500
On 11/30/2015 8:53 PM, Tim Wescott wrote:
> It's been at least five years since I've actually done FPGA work, but I
> always took unexpected optimizations of this sort to mean that I didn't
> have my head screwed on straight, and I needed to figure out what I was
> doing wrong.
>
> Most of the time, I was right.

Yeah, that makes sense.  There is a chain of

input --> A --> B --> C --> output

where A, B and C are registered values.  Each of them may have many
other internal signals combining to produce the value in the signal and
may be used in other logic internally.  If none of the destinations
reach an output, or if the inputs are optimized so they do not depend on
anything but are constants (like A AND '0', which always produces a '0'
result), then that logic will be optimized away.  This can remove an
entire chain, or perhaps just B and C, or any other combination.

In Simon's case this may well be due to inputs which are not driven 
because the instruction decode logic is not implemented.  This can 
either be causing logic to be optimized because it is constant, or to be 
optimized because the output is never gated into the next register. 
Unless he provides the full code we can't debug this.

I always work from block diagrams which help me to "see" my data flow 
which makes it easy to see which control points need to be driven.  I 
expect Simon's problem is in the instruction decode logic not driving a 
signal, but it is hard to tell.  The fact that a register is changing 
value in the simulator means that it is being given variable values, but 
does not mean the output is being used for anything.  I'm very unclear 
why he has a 32 bit register in an 8 bit processor.

-- 

Rick
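A minimal sketch of the trimming Rick describes, assuming a decode signal that is tied to 0 because its logic has not been written yet (all names are hypothetical). In simulation register a visibly follows d, yet b can never load, so a synthesizer may replace b with its initial value; a is then left without any loads and is typically removed as well, even though it "works" in the simulator.

```verilog
// Hypothetical illustration of a register chain that synthesis can trim.
module trim_demo
  (
   input  wire       clk,
   input  wire [7:0] d,
   output wire [7:0] q
   );

   // Stand-in for a decode signal whose driving logic is not written yet,
   // so it is a constant 0.
   wire      load_b = 1'b0;

   reg [7:0] a;
   reg [7:0] b;

   initial b = 8'h00;          // known power-up value, as FPGA registers have

   always @(posedge clk)
      a <= d;                  // "A" in the chain: toggles in simulation

   always @(posedge clk)
      if (load_b)
         b <= a;               // "B": can never load, so it is a constant

   assign q = b;               // the output depends only on a constant
endmodule
```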

Article: 158459
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Mon, 30 Nov 2015 22:15:13 -0800 (PST)
Clearly some explanation required...

> I am not clear what you are trying to do with the stack here. Do you
> have a relatively complete CPU implemented?

The original design did in fact have a clean, simple 64K block of RAM, where no page was special, and yes, it was hung off the address/data buses and life was simple. The timing, however, was not. The original 6502 had a 2-phase clock, in which it could present the address at the beginning of the clock cycle, and the data would be on the data bus ready for use at the end of the same clock cycle. In this way, instructions that took two memory accesses {read opcode, read argument} could be retired in just 2 clocks.

I don't have that luxury :) I wanted everything to be simple synchronous logic that presented the data on clock N with the result available at clock N+1. In fact I wanted it even simpler. I wanted to keep the internal logic as simple as possible as well, making everything synced to always @ (posedge(clk)), which means that after a decode operation, it takes 3 clocks to read the data ([abus <= address], [address appears on bus], [data appears on bus]).

The reason for this self-imposed simplicity is that I'm planning on using the design as a test case for implementing an actual real-life chip - I managed to persuade Cadence to let me use their tools, and I have a 5-week Xmas break coming up (no vacation during the year, so I'm looking forward to it :) I thought an 8-bit processor would be nice and easy, but perhaps I'm aiming too high. Perhaps I ought to look at XOR ...

So, the main challenges are with stack and zero-page instructions, because those are the "fast" instructions on a 6502 (2 clocks) that aren't just register->register. I'm already running the CPU internal state at 4x the nominal 'cpu clock' to get the clock-cycle accuracy I need for the instructions that the 6502 took 2 clocks to process, and I'm inserting wait states to sync up the longer ones (up to 7, so 28 clocks in my world). Once the basic system is in place, I'll allow that to optionally relax, and I can run it in cycle-accurate or "turbo" mode. Perhaps I'll have a "turbo" button (showing my age, here).

My solution to the 2-cycle instructions was to declare 2 pages' worth of registers: page 0 (which is special for the 6502, with special opcodes that take less time to run if they access it) and the stack (which is page 1). The 6502 has an 8-bit stack pointer, to which it always prepends 01h (to form 16'h01xx), providing a 256-byte stack. The use of a register array for both these pages significantly helps when I only have 2 clocks to play with. Obviously when the CPU wants to store or read values, I need to determine if it's page 0 or page 1 and redirect accordingly, but that's not a high price to pay.

"Relatively complete" is an interesting term. I have a CPU that will execute (at least in simulation [grin]) all the opcodes I have implemented - it does the decode, figures out the addressing mode of the instruction (up to 8 of them), processes the result, and updates the {memory, registers, processor-state-flags, stack} accordingly. The issue is that I've implemented about 1/3 of the opcodes right now... So, the answer is "it depends on what you mean" :)

> In Simon's case this may well be due to inputs which are not driven because the instruction decode logic is not implemented.  This can
> either be causing logic to be optimized because it is constant, or to be optimized because the output is never gated into the next register.
> Unless he provides the full code we can't debug this.

The full code is actually available (see link above, or go to http://0x0000ff.com/6502/)

Cheers
   Simon.
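The page-0/page-1 redirection Simon describes could look roughly like the hypothetical sketch below: the upper address byte selects the zero-page array, the stack array, or the external bus. The registered read port matters here, because without some path from the arrays to logic that reaches an output, synthesis would be free to remove both arrays, which is the symptom this thread is chasing. Module and port names are illustrative.

```verilog
// Hypothetical steering of accesses into zero-page and stack register arrays.
module page_steer
  (
   input  wire        clk,
   input  wire        we,
   input  wire [15:0] addr,
   input  wire [7:0]  wdata,
   output wire        ext_we,   // write that must go to the external bus
   output reg  [7:0]  rdata     // registered read; only meaningful for page 0 or 1
   );

   reg [7:0] zp    [0:255];     // page 0: $0000-$00FF
   reg [7:0] stack [0:255];     // page 1: $0100-$01FF

   wire is_zp    = (addr[15:8] == 8'h00);
   wire is_stack = (addr[15:8] == 8'h01);

   always @(posedge clk) begin
      if (we && is_zp)
         zp[addr[7:0]] <= wdata;
      if (we && is_stack)
         stack[addr[7:0]] <= wdata;
      // This read path is what keeps the arrays alive through synthesis.
      rdata <= is_stack ? stack[addr[7:0]] : zp[addr[7:0]];
   end

   assign ext_we = we && !is_zp && !is_stack;
endmodule
```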

Article: 158460
Subject: Re: Simulation vs Synthesis
From: Brian Drummond <brian@shapes.demon.co.uk>
Date: Tue, 1 Dec 2015 12:03:39 -0000 (UTC)
On Mon, 30 Nov 2015 11:02:14 -0800, Simon wrote:

> Thanks for all the replies, guys :)
> 
> On Sunday, November 29, 2015 at 11:51:03 PM UTC-8, rickman wrote:
>> 
>> Usually logic is removed because the result is not used anywhere.  You
>> can design and simulate a design only to see the synthesizer remove the
>> entire thing if it has no outputs that make it to an I/O pin.
>> 
>> So where are the outputs of your register used?  Do they actually
>> connect?
> 
> Actually, this may be it. I had tried to counter this by exporting the
> databus (both input and output) in the top-level test-bench module, but
> thinking about it, the registers it's removing are from code that
> exercises the BRK instruction, which only affects the stack-pointer and
> program-counter, both of which are internal to the CPU in the design as
> it stands, and the BRK instruction is currently the only thing to
> manipulate the stack pointer (I'm going alphabetically through the
> instruction list, and I've only got as far as EOR :)

At this stage, it's probably OK to simply trust synthesis until the
design is largely complete. If your simulation tests are thorough
enough, that's what matters.

You can mess with a temporary framework of attributes to preserve 
signals, but IMO it's wasted time and effort, especially since what's 
"preserve"d through synthesis can still be trimmed by the mapper, so you 
might have to push the rope a little harder. 

You can possibly stub out blocks (containing some dummy observable, like 
an OR gate) and fill them in later.

But I'd probably press ahead with proving the design in simulation until 
there was enough to be worth synthesis.

-- Brian
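A sketch of the "dummy observable" idea Brian mentions, assuming a spare pin is available during bring-up: OR-reducing a bus into one registered output gives every bit a load that reaches a pin, so none of them can be trimmed as unused (a bit that is itself constant can still be removed). The module and port names are hypothetical. Vivado also accepts a DONT_TOUCH attribute in RTL, written as (* dont_touch = "true" *) on a declaration, which corresponds to the attribute route Brian describes.

```verilog
// Hypothetical temporary observer for a partially written core.
module core_observe
  (
   input  wire        clk,
   input  wire [23:0] newSPData,   // stand-in for the real internal bus
   output reg         observe      // route to a spare pin during bring-up
   );

   // Every bit of newSPData now feeds logic that reaches an output pin.
   always @(posedge clk)
      observe <= |newSPData;
endmodule
```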


Article: 158461
Subject: Re: Simulation vs Synthesis
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Tue, 1 Dec 2015 13:22:47 +0000
On 01/12/15 01:32, Simon wrote:
> ... the report tells me that bits [31:8] of 'newSPData' are optimised away, but bits [7:0] are not.

Is this because the 6502's stack pointer is only 8 bits long?
It can only address 256 bytes of RAM, so bits 8-31 /cannot/ be
used.

From http://www.dwheeler.com/6502/oneelkruns/asm1step.html

    Stack Pointer
    -------------

    When the microprocessor executes a JSR (Jump to SubRoutine)
    instruction it needs to know where to return when finished.  The 6502
    keeps this information in low memory from $0100 to $01FF and uses the
    stack pointer as an offset.  The stack grows down from $01FF and makes
    it possible to nest subroutines up to 128 levels deep.  Not a problem
    in most cases.

Article: 158462
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Tue, 1 Dec 2015 06:52:19 -0800 (PST)
Sorry, I didn't really explain the 32-bit newSPData register, did I? In my defence, my 3-year-old was clamouring for his evening meal, and his mother was busy :)

What I'd been trying to do was split up the code into separate areas by module, so generally speaking:

 - there's a module ("decode.v") which takes in raw opcodes and outputs the instruction type, and the addressing mode of the opcode (one of {accumulator, immediate, relative, absolute, zero-page, absolute-indexed-x, absolute-indexed-y, zero-page-indexed-x, zero-page-indexed-y, indirect, indirect-x, indirect-y})

 - there's a module ("execute.v") that handles doing the actual work of each opcode, placing the results in intermediate registers (output ports of the module)

 - and there's an overall harness-it-all-together module ("cpu_6502" in 650=
2.v) which instantiates the above

The stack and page-zero are special (as mentioned before) for speed reasons, and I don't know of a way to share an array of registers between modules. For zero-page this isn't an issue: there's only one byte to write, and it can be passed back as 'storeValue' with an 'action' of {UPDATE_A, UPDATE_X, UPDATE_Y}, and that byte will be placed in the correct processor register based on the action.

For the stack, though, I need to pass back (so far) up to 3 bytes of data. The BRK instruction simulates an interrupt, pushing (in order) {PC high-byte, PC low-byte, Processor-status-flags} onto the stack, then reading the 2 bytes at the interrupt vector {16'hFFFE,16'hFFFF}, and setting the contents of those two bytes into PC.

The 32-bit 'newSPData' register (I went for 4 bytes, not 3; if it needs to be changed, I can do so later), combined with the 2-bit count 'numSPBytes', is how I implemented passing back the bytes from "execute" to the overall module to update the array of registers that constitute my stack. The "execute" module can pass back up to 4 bytes, and the overall module ("cpu_6502"), which actually contains the stack register array, will do the right thing based on 'numSPBytes' if the 'action' contains the bit 'UPDATE_SP'. This is all done in the `EXECUTE stage of the overall module in 6502.v.

The addition I made last night was to expose byte 255 of the stack as an external top-level port (the stack grows downwards, so this is the first byte of the stack) and synthesise. The lower 8 bits of the 'newSPData' register are those that would be inserted into the first position on the stack, and indeed those lower 8 bits were not optimised away. From this, I conclude that the stack being optimised away is the root cause of my warning messages.

'stack' was otherwise totally internal to the "cpu_6502" module, and although it had a writer (the `EXECUTE stage can write up to 3 bytes to it), there is currently no reader for those registers. I'm up to 'EOR' (the 6502 uses EOR for what the rest of the world calls XOR), and the first instruction to implement reading the stack is 'PLA'. I might try jumping ahead to implementing that instruction rather than going strictly alphabetically (I didn't want to miss one :)

Hope that clears things up a little.

Cheers
   Simon
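To illustrate the point in this post (a register array that is written but never read has no observable output, so synthesis is free to remove it), here is a minimal sketch of a register-array stack with both a push port and a pull port. The names stack_regs, push_en, num_bytes, push_data, pull_en and pull_data are hypothetical and not taken from cpu_6502; the push side loosely mirrors the newSPData/numSPBytes scheme but handles at most 3 bytes.

```verilog
// Hypothetical sketch (names are illustrative, not from cpu_6502):
// a 256-entry register-array stack with a multi-byte push port and a
// single-byte pull port. Without the pull port nothing reads 'stack',
// and synthesis may remove every register in it.
module stack_regs (
    input  wire        clk,
    input  wire        push_en,     // e.g. 'action' has UPDATE_SP set
    input  wire [1:0]  num_bytes,   // bytes to push this cycle (0-3 here)
    input  wire [31:0] push_data,   // byte 0 in [7:0], byte 1 in [15:8], ...
    input  wire        pull_en,     // e.g. executing PLA
    output reg  [7:0]  sp,          // stack pointer, grows downward
    output wire [7:0]  pull_data    // byte a pull would return
);
    reg [7:0] stack [0:255];

    initial sp = 8'hFF;             // power-up value; a real design might use a reset

    // The byte a pull returns sits one above the current pointer
    // (8-bit index arithmetic, so FF + 1 wraps to 00).
    assign pull_data = stack[sp + 8'd1];

    always @(posedge clk)
        if (push_en) begin
            if (num_bytes > 2'd0) stack[sp]        <= push_data[7:0];
            if (num_bytes > 2'd1) stack[sp - 8'd1] <= push_data[15:8];
            if (num_bytes > 2'd2) stack[sp - 8'd2] <= push_data[23:16];
            sp <= sp - num_bytes;
        end else if (pull_en) begin
            sp <= sp + 8'd1;
        end
endmodule
```

Once pull_data drives something observable (for example, the accumulator in a PLA implementation), the stack registers have a load and should no longer be optimised away.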

Article: 158463
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Tue, 1 Dec 2015 06:58:48 -0800 (PST)
Links: << >>  << T >>  << A >>
Thanks :)

As I mentioned just above, I might jump ahead and implement PLA (which will force a *read* of the stack values rather than just the current writes) and see if that has an effect.

At the moment, my simulation testing consists of going through, for every opcode and for every addressing mode, ...

 - Check the decoding
 - Check the timing (varies based on addressing mode)
 - Check the results

... in the simulator using the waveforms. It is, however, getting to the point where writing a formal test of each of the above would start to become beneficial. I want to make sure that any later additions don't affect any previous results. It's effort to do so, and my time is limited, but it will actually save time in the long run.

Cheers
   Simon

Article: 158464
Subject: Re: Simulation vs Synthesis
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Tue, 1 Dec 2015 15:10:24 +0000
Links: << >>  << T >>  << A >>
On 01/12/15 14:58, Simon wrote:
> I want to make sure that any later additions don't affect any previous results. It's effort to do so, and my time is limited, but it will actually save time in the long run.

That's what suites of test benches are for. The software
world has triumphantly reinvented the concept and called
them "unit tests".

It is normal to have a hierarchy of test suites. Some
can be run frequently because they are a fast "sanity
check" that just tests simple externally observable
behaviour of whatever unit is being tested. Some tests
are run at major points in the design because they test
the internal operation in detail, and hence are slow.
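As an illustration of the fast "sanity check" tier, here is a minimal self-checking testbench sketch. The unit eor8 is a hypothetical stand-in (a bitwise XOR, which is what the 6502's EOR computes), not part of the design discussed in this thread; the reference model is deliberately written differently from the DUT so that the comparison is meaningful.

```verilog
// Hypothetical unit under test: the 6502's EOR is a bitwise XOR.
module eor8 (
    input  wire [7:0] a,
    input  wire [7:0] b,
    output wire [7:0] y
);
    assign y = a ^ b;
endmodule

// Self-checking testbench: compares the DUT against an independently
// written reference model for every input combination and stops on the
// first mismatch, so it can be rerun unattended after every change.
module tb_eor8;
    reg  [7:0] a, b;
    wire [7:0] y;
    integer i, j;

    eor8 dut (.a(a), .b(b), .y(y));

    initial begin
        for (i = 0; i < 256; i = i + 1)
            for (j = 0; j < 256; j = j + 1) begin
                a = i; b = j;
                #1;                                   // let the output settle
                if (y !== ((a | b) & ~(a & b)))       // XOR via OR/AND/NOT
                    $fatal(1, "eor8 mismatch: %h ^ %h gave %h", a, b, y);
            end
        $display("tb_eor8: all 65536 cases passed");
    end
endmodule
```

Exhaustive checking is only practical for small units like this; slower, more detailed suites would typically use directed or randomised stimulus instead.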


Article: 158465
Subject: Re: Found: an FPGA with internal tri-states
From: "carstenherr" <94590@FPGARelated>
Date: Tue, 01 Dec 2015 09:37:32 -0600
Links: << >>  << T >>  << A >>
In a German microcontroller forum, there was a recent discussion of
this topic. An expert in algorithms and optimization explained in
detail that signals and buses in FPGAs cannot be optimized efficiently
when they are bidirectional. So this is not just an electrical issue.







---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158466
Subject: Re: Sum of 8 numbers in FPGA
From: "carstenherr" <94590@FPGARelated>
Date: Tue, 01 Dec 2015 09:37:41 -0600
Links: << >>  << T >>  << A >>
I expect it to be most efficient to use 8 adders in parallel when the
incoming data does not always fully occupy its vector widths, since the
compiler might discover unused bits and shorten carry-chain lengths
appropriately.

To meet timing, I always add FFs behind the adders and use register
balancing and retiming, giving the compiler all options for optimization.
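One common way to structure the sum of eight numbers is a registered adder tree (seven adders in three levels), with flip-flops after each level so that retiming has registers to move. This is a minimal sketch assuming unsigned inputs packed into one bus; the module name sum8 and its ports are hypothetical, and it is not necessarily the arrangement the poster had in mind.

```verilog
// Hypothetical sketch: sum eight unsigned W-bit inputs with a
// three-level adder tree, registering each level (latency: 3 clocks).
module sum8 #(parameter W = 8) (
    input  wire           clk,
    input  wire [8*W-1:0] din,   // eight packed W-bit inputs
    output reg  [W+2:0]   sum    // W+3 bits: 8 * (2^W - 1) always fits
);
    reg [W:0]   s1 [0:3];        // level 1: four pairwise sums
    reg [W+1:0] s2 [0:1];        // level 2: two sums
    integer i;

    always @(posedge clk) begin
        // Operands are extended to the W+1-bit target width, so the carry is kept.
        for (i = 0; i < 4; i = i + 1)
            s1[i] <= din[(2*i)*W +: W] + din[(2*i+1)*W +: W];
        s2[0] <= s1[0] + s1[1];
        s2[1] <= s1[2] + s1[3];
        sum   <= s2[0] + s2[1];
    end
endmodule
```

Each level grows by one bit, so the carry chains stay as short as the data allows.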



---------------------------------------
Posted through http://www.FPGARelated.com

Article: 158467
Subject: Re: Simulation vs Synthesis
From: Simon <google@gornall.net>
Date: Tue, 1 Dec 2015 08:07:32 -0800 (PST)
Links: << >>  << T >>  << A >>
On Tuesday, December 1, 2015 at 7:10:40 AM UTC-8, Tom Gardner wrote:
> On 01/12/15 14:58, Simon wrote:
> > I want to make sure that any later additions don't affect any previous results. It's effort to do so, and my time is limited, but it will actually save time in the long run.
>
> That's what suites of test benches are for. The software
> world has triumphantly reinvented the concept and called
> them "unit tests".
>
> It is normal to have a hierarchy of test suites. Some
> can be run frequently because they are a fast "sanity
> check" that just tests simple externally observable
> behaviour of whatever unit is being tested. Some tests
> are run at major points in the design because they test
> the internal operation in detail, and hence are slow.

[grin] I'm well aware what unit tests are for; I've written a *lot* of them in my day job over the last few decades, although admittedly not in Verilog :) The problem is not the lack of knowledge (for once), it's the will to sit down and do something that doesn't seemingly advance the project... It's a lot more fun to write code than to write code that tests code...

As I said though, it is getting to the point (in all honesty, it's way past the point) where manual checking of things like this is no longer viable. Unit tests feature in my future ...

Cheers
   Simon.

Article: 158468
Subject: Re: Found: an FPGA with internal tri-states
From: rickman <gnuarm@gmail.com>
Date: Tue, 1 Dec 2015 12:44:17 -0500
Links: << >>  << T >>  << A >>
On 12/1/2015 10:37 AM, carstenherr wrote:
> In a German microcontroller forum, there was a recent discussion of
> this topic. An expert in algorithms and optimization explained in
> detail that signals and buses in FPGAs cannot be optimized efficiently
> when they are bidirectional. So this is not just an electrical issue.

I'm not sure what that means really.  How are other buses optimized?

-- 

Rick

Article: 158469
Subject: Re: Simulation vs Synthesis
From: Tom Gardner <spamjunk@blueyonder.co.uk>
Date: Tue, 1 Dec 2015 17:51:04 +0000
Links: << >>  << T >>  << A >>
On 01/12/15 16:07, Simon wrote:
> On Tuesday, December 1, 2015 at 7:10:40 AM UTC-8, Tom Gardner wrote:
>> On 01/12/15 14:58, Simon wrote:
>>> I want to make sure that any later additions don't affect any previous results. It's effort to do so, and my time is limited, but it will actually save time in the long run.
>>
>> That's what suites of test benches are for. The software
>> world has triumphantly reinvented the concept and called
>> them "unit tests".
>>
>> It is normal to have a hierarchy of test suites. Some
>> can be run frequently because they are a fast "sanity
>> check" that just tests simple externally observable
>> behaviour of whatever unit is being tested. Some tests
>> are run at major points in the design because they test
>> the internal operation in detail, and hence are slow.
>
> [grin] I'm well aware what unit tests are for, I've written a *lot* of them in my day job over the last few decades, although admittedly not in verilog :) The problem is not the lack of knowledge (for once), it's the will to sit down and do something that doesn't seemingly advance the project... It's a lot more fun to write code than to write code that tests code...
>
> As I said though, it is getting to the point (in all honesty, it's way past the point) where manual checking of things like this is no longer viable. Unit tests feature in my future ...

:)

Ah, but are you ready for the softies' next dogma, "TDD"?
Take a good thing, unit tests, and confidently state
that they are necessary /and sufficient/ for a good product.

None of this BUFD (big[1] up-front design) nonsense. Write
a test, create something that passes the test, and move on
to the next test. Never mind the quality/completeness of
the tests, if you get a green light after running the
tests then /by definition/ it works.

Yup, ignorant youngsters are taught that and believe it :(


[1] in typo veritas: I first wrote "bug" :)

Article: 158470
Subject: Re: Simulation vs Synthesis
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Wed, 2 Dec 2015 00:21:04 +0000 (UTC)
Links: << >>  << T >>  << A >>
Simon <google@gornall.net> wrote:
> Sorry, I didn't really explain the 32-bit newSPData register, did I ? 
> In my defence, my 3-year old was clamouring for his evening meal, 
> and his mother was busy :)
 
> What I'd been trying to do was split up the code into separate areas 
> by module, so generally speaking:
 
(snip) 
> - there's a module ("execute.v") that handles doing the actual work of 
> each opcode, placing the results in intermediate registers 
> (output ports of the module) 

(snip)

Early in the processing the synthesis tools flatten the netlist.

That is, all the modules go away, leaving just a big collection of gates,
like one big module.  We find it easier to think about logic one module
at a time, but that doesn't make it easier for the computer.

Not long after that, duplicate logic, including duplicate registers,
is detected. If you have two registers in different modules
with the same inputs and clocks, one will be removed.  (Same module,
too, but that is more obvious to us.)

Later, any logic where the output doesn't go anywhere is removed,
recursively. Also, any logic that has a constant output is removed,
and replaced by the constant. 

You might find: https://www.coursera.org/course/vlsicad interesting.

(Even though it has ended, it looks like it will still let you sign up.)

-- glen

Article: 158471
Subject: Re: Simulation vs Synthesis
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Wed, 2 Dec 2015 00:23:58 +0000 (UTC)
Links: << >>  << T >>  << A >>
Simon <google@gornall.net> wrote:
> Thanks :)
 
> As I mentioned just above, I might jump ahead and implement PLA 
> (which will force a *read* of the stack values rather than just the 
> current writes) and see if that has an effect. 

As I said before, one reason to turn off the optimization is to see
how big it will be when it isn't optimized out. 

It is sometimes useful to know early how big an FPGA is needed.

But for actual use, you might just as well let it optimize away.

-- glen


 

Article: 158472
Subject: Re: Simulation vs Synthesis
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Wed, 2 Dec 2015 00:32:58 +0000 (UTC)
Links: << >>  << T >>  << A >>
Simon <google@gornall.net> wrote:

(snip)


> [grin] I'm well aware what unit tests are for, I've written a *lot* 
> of them in my day job over the last few decades, although admittedly 
> not in verilog :) The problem is not the lack of knowledge (for once), 
> it's the will to sit down and do something that doesn't seemingly 
> advance the project... It's a lot more fun to write code than to write 
> code that tests code...

I believe it is one of the classic laws of software engineering, often
quoted alongside Brooks' (it is usually credited to Tom Cargill of Bell
Labs), and it applies here even though this isn't software:

   "Writing the code takes the first 90% of the time,
    debugging takes the second 90%."

https://en.wikipedia.org/wiki/The_Mythical_Man-Month

-- glen

Article: 158473
Subject: Re: Simulation vs Synthesis
From: BobH <wanderingmetalhead.nospam.please@yahoo.com>
Date: Tue, 1 Dec 2015 18:55:00 -0700
Links: << >>  << T >>  << A >>
On 11/30/2015 5:34 PM, rickman wrote:
> On 11/30/2015 6:44 PM, BobH wrote:
>> A mistake that I have made, is to mis-spell the wire connection and then
>> there is no user for the outputs. The easiest way to check that is to
>> inspect the simulation at the inputs to the next stage that uses the
>> data and make sure that they are wiggling as you expect and not showing
>> undefined as they would for an undriven wire. The second easiest way to
>> check that is to eyeball the naming for this problem.
>
> If you make a spelling error, won't that be flagged because that signal
> hasn't been declared?
>
Often the auto-wire "feature" (implicit net declaration) will silently
create a replacement. If you go through the logs, it is noted, and usually
the auto-wire will be a single-bit signal instead of a bus, so it shows
up that way too.
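One standard guard against this is the `default_nettype none compiler directive, which turns implicit net creation off so that a misspelled connection becomes a compile error. This is a minimal sketch; the modules nettype_inv and nettype_demo are hypothetical examples.

```verilog
// With implicit nets disabled, an undeclared identifier in a port
// connection is a compile error instead of a silently created 1-bit wire.
`default_nettype none

module nettype_inv (
    input  wire [7:0] a,
    output wire [7:0] y
);
    assign y = ~a;
endmodule

module nettype_demo (
    input  wire [7:0] data_in,
    output wire [7:0] data_out
);
    wire [7:0] inverted;   // must now be declared explicitly
    // Misspelling this connection as .y(invertd) would fail to compile;
    // without the directive it would create a 1-bit wire 'invertd' and
    // leave 'inverted' undriven.
    nettype_inv u_inv (.a(data_in), .y(inverted));
    assign data_out = inverted;
endmodule

// Restore the default so files compiled afterwards are unaffected.
`default_nettype wire
```

Note that every port must then carry an explicit net type (the `wire` keywords above), which is why the directive is usually reset at the end of the file.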

Article: 158474
Subject: Re: Simulation vs Synthesis
From: BobH <wanderingmetalhead.nospam.please@yahoo.com>
Date: Tue, 1 Dec 2015 19:19:03 -0700
Links: << >>  << T >>  << A >>
On 11/30/2015 11:15 PM, Simon wrote:
> My solution to the 2-cycle instructions was to declare 2 pages-worth
> of registers: page-0, (which is special for the 6502, with special
> opcodes that take less time to run if they access there) and the
> stack (which is page-1). The 6502 has an 8-bit stack-pointer, that
> it always prepends 01h to (to form 16'h01xx), providing a 256-deep stack.
> The use of a register array for both these pages significantly helps
> when I only have 2 clocks to play with. Obviously when the CPU wants
> to store or read values, I need to determine if it's page-0 or page-1
> and redirect accordingly, but that's not a high price to pay.

If I understand correctly, the root of the problem you are describing is 
that you are trying to use an array of registers as RAM, and it is 
optimizing out big chunks or all of it. Trying to build a synthesizable 
array of addressable registers is a pain in the butt in Verilog. There 
is probably a way to do it with genvars or maybe a for loop, but in the 
past I have just brute forced it. Using genvars seems like a promising
path, but my only exposure to them has been debugging cases where
Xilinx ISE (v14) would not handle them as expected.

The brute force might look like:

module reg_ram
(
   input wire [1:0] address,
   input wire [7:0] write_data,
   input wire       write_en,
   input wire       clk,
   input wire       rstn,
   output reg [7:0] read_data
);

reg [7:0] cell0, cell1, cell2, cell3;

always @(posedge clk or negedge rstn)
if (~rstn)
   cell0 <= 8'h0;
else
   if (write_en & (address == 2'h0))
     cell0 <= write_data;

always @(posedge clk or negedge rstn)
if (~rstn)
   cell1 <= 8'h0;
else
   if (write_en & (address == 2'h1))
     cell1 <= write_data;

always @(posedge clk or negedge rstn)
if (~rstn)
   cell2 <= 8'h0;
else
   if (write_en & (address == 2'h2))
     cell2 <= write_data;

always @(posedge clk or negedge rstn)
if (~rstn)
   cell3 <= 8'h0;
else
   if (write_en & (address == 2'h3))
     cell3 <= write_data;

// Combinational read mux: one case item per cell address
always @( * )
   case (address)
     2'h0: read_data = cell0;
     2'h1: read_data = cell1;
     2'h2: read_data = cell2;
     2'h3: read_data = cell3;
   endcase
endmodule

As crude as this looks, most of the other structures that I can think of
result in something that looks like a huge barrel shifter and are larger
to implement.
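For comparison, the "maybe a for loop" route mentioned above could look like the following minimal sketch: the same 4 x 8 register file with asynchronous reset, written as an array whose size is set by a parameter. The module name reg_ram_array and the parameters AW and DEPTH are hypothetical; whether a given tool maps it any differently from the brute-force version is not established here.

```verilog
// Hypothetical sketch: the reg_ram register file written as an array,
// with the reset applied to every cell by a for loop.
module reg_ram_array #(
    parameter AW    = 2,            // address width
    parameter DEPTH = 1 << AW       // number of 8-bit cells
) (
    input  wire [AW-1:0] address,
    input  wire [7:0]    write_data,
    input  wire          write_en,
    input  wire          clk,
    input  wire          rstn,
    output wire [7:0]    read_data
);
    reg [7:0] cell [0:DEPTH-1];
    integer i;

    // Asynchronous reset of every cell. A memory that must be reset like
    // this is typically built from flip-flops rather than block RAM.
    always @(posedge clk or negedge rstn)
        if (~rstn) begin
            for (i = 0; i < DEPTH; i = i + 1)
                cell[i] <= 8'h0;
        end else if (write_en) begin
            cell[address] <= write_data;
        end

    // Combinational read, matching the case statement in reg_ram.
    assign read_data = cell[address];
endmodule
```

Growing the register file then only means changing AW, rather than adding another always block per cell.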




