Messages from 139100

Article: 139100
Subject: Re: How big is my vhdl and am I approaching some size limitation on the chip.
From: Jonathan Bromley <jonathan.bromley@MYCOMPANY.com>
Date: Fri, 20 Mar 2009 20:17:22 +0000

On Fri, 20 Mar 2009 12:15:23 -0700 (PDT), jleslie48 wrote:

>but I cant seem to compare these two chips, the VIRTEX II vs the
>XCR3512XL-12-PQ208
>its apples to oranges, how does it work?

It really IS apples and oranges.  The structures are so different
that you can't expect design size considerations to map 
meaningfully from one to the other.

CPLDs excel at wide decoding functions.  Each logic cell 
(typically, but not necessarily, including a flip-flop) 
computes the logical OR of a bunch of "product terms".  
Each product term is the logical AND of any selection 
of a very large number of signals - resource cost is 
driven by number of PTs, NOT by the number of inputs
to a given PT.  PT inputs are basically free.

FPGAs have a much higher ratio of flip-flops to logic.  The 
basic logic function is a 4-input lookup table, i.e. any 
logical function you care to name, but only of 4 inputs.  Wide
AND functions and decodes are quite expensive in area.
Inputs to a logic function are expensive as soon as you
have more than 4 of them.  (OK, it might be 5 or 6 in
some newer devices.)

Of course, synthesis tools will cram your specified
VHDL functionality into either, and will get reasonably
good optimization in either.  But to say "adding this 
function made my FPGA only 20% bigger, so why does it
make my CPLD 40% bigger?" is a question with no
sensible answer.  The devil is in the detail.
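To make the contrast concrete, here is a small hypothetical VHDL example (the entity name `wide_decode` and the `BASE` constant are invented for illustration, not taken from the thread). A 16-bit address compare like this is a single product term in a CPLD macrocell. With 4-input LUTs it would typically need about five LUTs in two levels (four LUTs each checking 4 bits, plus one to AND their outputs), although the exact mapping is up to the synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wide decoder: hit = '1' when the 16-bit address equals BASE.
-- CPLD: one product term (16 literals ANDed together).
-- 4-input-LUT FPGA: roughly four LUTs for the bit groups plus one to combine them.
entity wide_decode is
  generic (BASE : std_logic_vector(15 downto 0) := x"C0DE");
  port (
    addr : in  std_logic_vector(15 downto 0);
    hit  : out std_logic
  );
end entity wide_decode;

architecture rtl of wide_decode is
begin
  hit <= '1' when addr = BASE else '0';
end architecture rtl;
```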
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.

Article: 139101
Subject: Re: Zero operand CPUs
From: Jacko <jackokring@gmail.com>
Date: Fri, 20 Mar 2009 13:48:07 -0700 (PDT)

On 20 Mar, 03:08, Albert van der Horst <alb...@spenarnc.xs4all.nl>
wrote:
> In article <1478a4ab-62d3-4e9d-b1a3-769f84726...@j8g2000yql.googlegroups.com>,
>
> Jacko <jackokr...@gmail.com> wrote:
> >hi
>
> >Chuck and the poe are in the design lab,
>
> >Chuck sys to pope "Have you got a rubber, my designs are getting big?"
> >Pope says "That's a bit RISCy!"
>
> >So if you had possiblly 4 instructions to do stack init pointers and
> >save both aswell, what would you use?
>
> Do you ever read over your posts before submission?
> Do you have a spelling checker?
>
>
>
> >cheers jacko
>
> --
> --
> Albert van der Horst, UTRECHT,THE NETHERLANDS
> Economic growth -- like all pyramid schemes -- ultimately falters.
> albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Darwin, Mutation and the Death of a Lnguage via Stagnation
==========================================================

Oh yo evil looking mutated word, you is dead right, no sexy none. Me
bee's full oxford smili life, gets me the beer token and enduf this
wanton easo-speak.

cheers jacko

Article: 139102
Subject: Re: How big is my vhdl and am I approaching some size limitation
From: Mike Treseler <mtreseler@gmail.com>
Date: Fri, 20 Mar 2009 13:48:47 -0700

jleslie48 wrote:

>> any idea on how to make it fit?

If it has to be that device, I would need two of them.

> how can I find out who is the piggy, and what can I do to trim things
> down?

Synthesis has already done the trimming.
The device is too small.

> but I cant seem to compare these two chips, the VIRTEX II vs the
> XCR3512XL-12-PQ208
> its apples to oranges, how does it work?

Rerun synthesis and check the % utilization

   -- Mike Treseler

Article: 139103
Subject: Re: How big is my vhdl and am I approaching some size limitation on the chip.
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Fri, 20 Mar 2009 20:49:36 +0000 (UTC)

Jonathan Bromley <jonathan.bromley@mycompany.com> wrote:
(big snip)
 
> FPGAs have a much higher ratio of flip-flops to logic.  The 
> basic logic function is a 4-input lookup table, i.e. any 
> logical function you care to name, but only of 4 inputs.  Wide
> AND functions and decodes are quite expensive in area.
> Inputs to a logic function are expensive as soon as you
> have more than 4 of them.  (OK, it might be 5 or 6 in
> some newer devices.)

I think some can use the carry chain logic to
do wide AND (or NOR if inverted) logic.  Maybe
still not quite the same as CPLD (PAL) logic,
but fast and not too expensive.
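A minimal sketch of the carry-chain idea, assuming a generic width `W` (the entity and port names are hypothetical): adding 1 to an all-ones vector is the only case that carries out of the top bit, so that carry out equals the AND of every input bit. Whether a given tool actually places this on the dedicated carry chain, rather than re-optimizing it back into a LUT tree, depends on the tool and device family.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Sketch: a W-input AND built from an incrementer. a + 1 carries out of
-- the top bit only when every bit of a is '1', so the carry out is AND(a).
entity carry_and is
  generic (W : positive := 32);
  port (
    a     : in  std_logic_vector(W-1 downto 0);
    all_1 : out std_logic   -- '1' when every bit of a is '1'
  );
end entity carry_and;

architecture rtl of carry_and is
  signal sum : unsigned(W downto 0);  -- one extra bit to capture the carry out
begin
  sum   <= resize(unsigned(a), W + 1) + 1;
  all_1 <= sum(W);
end architecture rtl;
```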

-- glen

Article: 139104
Subject: Re Zero operand CPUs
From: AliBama@gmail.com
Date: Fri, 20 Mar 2009 16:54:01 -0500

In article <1478a4ab-62d3-4e9d-b1a3-769f847260a2@j8g2000yql.googlegroups.com>, 
Jacko <jackokring@gmail.com> wrote: 

> So if you had possiblly 4 instructions to do stack init pointers and
> save both aswell, what would you use?
> 
------------
Previously Jacko wrote:
> You will find the subroutine RI FI RO SO BA does the same as
>    the forth word lit.
> 
>    == subroutine to do move #$222 to address $665 ==
>    $lit ,
>    $666 ,  <-- ? typo  $665 <> $666 or is this for the auto-decr ?
>    $lit ,
>    $222 ,
>    SI FA FO BA
=========
Does $lit , $666 , $lit ,  $222 mean:
"push    $666, push    $222" ?

Then  SI FA FO BA ==
(S)->A ; (S)->Q ; A->(Q) ; (R)->P == ?

If it's a stack-machine, then which register
is the TOS-pointer ?  

Let's try to work backwards:-
 (R)->P should move #$222 to address $665?6
 ==> P  == address $665 [or $666]
    and
R pointed to mem-containing #$222

So, how did    $lit ,    $222 [ push    $222] get
$222 into mem pointed to by R ?
-------------
Perhaps this is all obvious for someone with a
VHDL design background, but this forth-group has
just degenerated into clowning with this thread.

Can anybody contribute any knowledge/interpretation
to how this nibz works ?
-------------
someone wrote:
> >  How  many  bits are in an opcode, 4 or 5?  
> >  I would say it has to be five
> > or there is no way for the machine to distinguish between 
> >  an opcode and  an  address.   In  other  words,  there   
> >  *has*  to be  a CALL  instruction,
> > even if it is just a one bit opcode with the rest being the address.

Jacko wrote:
>    if(instructionRegister<16) doOpcode(instructionRegister) else 
>       doSubroutine(instructionRegister);

IMO this corresponds to the wiki:
> For this example we will take the most complex to understand instruction BO.
>    Details
> 
>    BO ((S) AND A) mul 2 -> A
> 
>    This translates as BOth instruction: Load the indirect contents of
> memory indexed by S and 'and' it with A.
> Then shift this left one bit through the  CarryRollBit and save this 
> result into A.
> Also increase S by 1 to do the post increment indexing.

So, '16 basic instructions' [need 4 bits] and the BO instruction, 
would get the BasicInstr/Subroutine 1-bit-flag.

This would imply a 5bit wide word, which is obviously not
the case ?
------------- who can understand/explain this:--
>There are no conditional instructions, SU of the carry to (R) 
>is the major branching method.
...
>        * SU (S)+A->A - SUm (CarryRollBit added too)
--- 
Since, of the 16 instructions, the only ALU types (+, xor, and)
use 'A' as one would expect an accumulator to be used 
[as a source & destination for the binary op.], 'A' *is* the
accumulator !! Similarly 'S' is seen to be the TOS-pointer.
--- wiki says:
> All  opcodes above 15 are subroutine call addresses.
OK, so for an 8 bit wide word, you get 256-15 possible subroutines ?
And the subroutines also use the basic 16 instructions, including
 possibly nested-subroutine/s ?
No, because the first-level of subroutines has already been allocated
 the 256-16 subroutine-pointers ?
Ok, you can only have 256-16 *different* subroutines, but they can be
 nested - limited only by RAM to hold a stacked-returns ?

== Chris Glur.

Article: 139105
Subject: Re: FPGA users, Please take a few seconds to report SPAM
From: James Harris <james.harris.1@googlemail.com>
Date: Fri, 20 Mar 2009 16:14:00 -0700 (PDT)

On 20 Mar, 15:03, VAX9...@gmail.com wrote:
> Dear FPGA users,
>
> This news group is valuable to all of us. I found that Google Group
> now allows users to report spam. I suggest we all take a few seconds
> to report spams on this comp.misc.fpga group. The result is yet to
> see, but we can start now.
>
> Just click into the spam message, then click "options" in the subject
> line, then "more options" in author line. I simply describe the reason
> as "This message is spam". I hope this helps. Thank you!

I have been reporting spam on other groups and, after a long time, the
senders' Google accounts were removed. I think it took about four to
eight weeks. I mention this to reassure you that they do seem to take
notice eventually, but patience is required.

James

Article: 139106
Subject: Re: Re Zero operand CPUs
From: Jacko <jackokring@gmail.com>
Date: Fri, 20 Mar 2009 16:39:45 -0700 (PDT)

On 20 Mar, 21:54, AliB...@gmail.com wrote:
> In article <1478a4ab-62d3-4e9d-b1a3-769f84726...@j8g2000yql.googlegroups.com>,
>
> Jacko <jackokr...@gmail.com> wrote:
> > So if you had possiblly 4 instructions to do stack init pointers and
> > save both aswell, what would you use?
>
> ------------
> Previously Jacko wrote:
> > You will find the subroutine RI FI RO SO BA does the same as
> >    the forth word lit.
>
> >    == subroutine to do move #$222 to address $665 ==
> >    $lit ,
> >    $666 ,  <-- ? typo  $665 <> $666 or is this for the auto-decr ?

Yes, it sure is.

> >    $lit ,
> >    $222 ,
> >    SI FA FO BA
>
> =========
> Does $lit , $666 , $lit ,  $222 mean:
> "push    $666, push    $222" ?

Yes, when lit is defined as the subroutine equivalent of lit.

> Then  SI FA FO BA ==
> (S)->A ; (S)->Q ; A->(Q) ; (R)->P == ?

I would think so!!
TOS->A, TOS(2)->Q, STORE TOS -> TOS(2) ADDRESS, GET RETURN TO PROGRAM
COUNTER

> If it's a stack-machine, then which register
> is the TOS-pointer ?

S, but a top-of-stack optimization using A may be possible, for speed,
at the cost of lower space efficiency.

> Let's try to work backwards:-
>  (R)->P should move #$222 to address $665?6
>  ==> P  == address $665 [or $666]
>     and
> R pointed to mem-containing #$222
>
> So, how did    $lit ,    $222 [ push    $222] get
> $222 into mem pointed to by R ?
> -------------

RI FI SO RO BA, where SO RO commutes to RO SO as an equivalent expression
of the same function.

(R)->Q,(Q)->A,A->(S),Q->(R),(R)->P
That is: get the return address, fetch the literal stored at the address
following it, and push it onto the stack; then put the incremented
return address back on the return stack and load it (modified by +1)
into the program counter, so execution returns to the address following
the literal value.

> Perhaps this is all obvious for someone with a
> VHDL design background, but this forth-group has
> just degenerated into clowning with this thread.

;-)

> Can anybody contribute any knowledge/interpretation
> to how this nibz works ?
> -------------
>
> someone wrote:
> > >  How  many  bits are in an opcode, 4 or 5?
> > >  I would say it has to be five
> > > or there is no way for the machine to distinguish between
> > >  an opcode and  an  address.   In  other  words,  there
> > >  *has*  to be  a CALL  instruction,
> > > even if it is just a one bit opcode with the rest being the address.
> Jacko wrote:
> >    if(instructionRegister<16) doOpcode(instructionRegister) else
> >       doSubroutine(instructionRegister);
>
> IMO this corresponds to the wiki:
>
> > For this example we will take the most complex to understand instruction BO.
> >    Details
>
> >    BO ((S) AND A) mul 2 -> A
>
> >    This translates as BOth instruction: Load the indirect contents of
> > memory indexed by S and 'and' it with A.
> > Then shift this left one bit through the  CarryRollBit and save this
> > result into A.
> > Also increase S by 1 to do the post increment indexing.
>
> So, '16 basic instructions' [need 4 bits] and the BO instruction,
> would get the BasicInstr/Subroutine 1-bit-flag.

No, a basic instruction is a number under 16, any number over 15 is an
address. This does have a disadvantage of not being able to call a
subroutine below address 16 but this is not a major fault, as boot
code would be here, and it is possible to place a subroutine call
instruction within these addresses.
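A hypothetical VHDL sketch of the decode rule described above (the entity, port names and 16-bit default word width are assumptions; this is not the actual nibz source). It illustrates why no separate flag bit is needed: "is this a basic opcode?" is simply "are all bits above bit 3 zero?".

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical decode stage for a nibz-style machine (not the real nibz RTL).
-- Rule from the thread: a word below 16 is a basic opcode; 16 and above is
-- a subroutine call address.
entity nibz_decode is
  generic (WIDTH : positive := 16);  -- assumed word width, at least 5
  port (
    ir        : in  std_logic_vector(WIDTH-1 downto 0);  -- fetched word
    is_opcode : out std_logic;                           -- '1' when ir < 16
    opcode    : out std_logic_vector(3 downto 0);        -- meaningful when is_opcode = '1'
    call_addr : out std_logic_vector(WIDTH-1 downto 0)   -- meaningful otherwise
  );
end entity nibz_decode;

architecture rtl of nibz_decode is
begin
  -- "ir < 16" is the same test as "every bit above bit 3 is zero".
  is_opcode <= '1' when unsigned(ir) < 16 else '0';
  opcode    <= ir(3 downto 0);
  call_addr <= ir;
end architecture rtl;
```

The cost of this encoding is the one Jacko mentions: addresses 0 to 15 can never be called directly.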

> This would imply a 5bit wide word, which is obviously not
> the case ?

The inference that an extra bit is 'needed' does not reflect how the
machine actually works.

> ------------- who can understand/explain this:--
> >There are no conditional instructions, SU of the carry to (R)
> >is the major branching method.
> ...
> >        * SU (S)+A->A - SUm (CarryRollBit added too)

(R)->Q,Q->(S),(S)->A,A+(S)->A,A->(S),(S)->Q,Q->(R),(R)->P

> ---
> Since, of the 16 instructions, the only ALU types (+, xor, and)
> use 'A' as one would expect an accumulator to be used
> [as a source & destination for the binary op.], 'A' *is* the
> accumulator !! Similarly 'S' is seen to be the TOS-pointer.
> --- wiki says:
> > All opcodes above 15 are subroutine call addresses.
>
> OK, so for an 8 bit wide word, you get 256-15 possible subroutines ?
> And the subroutines also use the basic 16 instructions, including
>  possibly nested-subroutine/s ?
> No, because the first-level of subroutines has already been allocated
>  the 256-16 subroutine-pointers ?
> Ok, you can only have 256-16 *different* subroutines, but they can be
>  nested - limited only by RAM to hold a stacked-returns ?
>
> == Chris Glur.

Hope this helps.

Cheers jacko

Article: 139107
Subject: Re: How big is my vhdl and am I approaching some size limitation on
From: jleslie48 <jon@jonathanleslie.com>
Date: Fri, 20 Mar 2009 17:56:39 -0700 (PDT)

Well, this is a bummer.  Here I think I'm being careful, working things
out with a test bench, re-building all the while and checking for
growth, only to be blindsided when I hook up the pin to the generated
signal.

Meantime I've got some more info and questions.

1) >> any idea on how to make it fit?

> If it has to be that device, I would need two of them.

That chip we are getting for around $100; I don't even know where to
buy them. And where do I get a Virtex II-PRO chip? Digikey says they
are $1000??  Wouldn't I be better off getting the Virtex II-PRO?
Meanwhile the old chip is mounted on a custom board layout. I guess my
hardware guys are going to have to re-lay out the board with two of
these chips?  That won't work anyway, as I added 600 macrocells to a
chip that only had 512 to begin with...  I think I was able to get
that down to under 512 though, but not by much.

2) what exactly is a macrocell anyway?



3) "Rerun synthesis and check the % utilization"
That's what I've been doing.  Basically I added the equivalent of a soft
UART and the data generator state machine that Jonathan so kindly set
me up with.  So I started backing out that code bit by bit to see
where the percentages jump.  When I took out the data generator I only
freed up a few macrocells; I tried reducing the fixed buffers, and that
again only freed up some of the small weeds.  When I deleted the UART,
however, it cleared up the whole mess.  But now I don't know if that's
because I've effectively dead-ended other parts...  I'm still
driving the output now, but I can't believe my little UART is that big
a deal.  I'm wondering if there is an expense in the separate
modules and instantiations, or maybe the 'reverse' function.  My next
step is to start stubbing out sections and see what causes the growth.
It would be nice if, in all the reports that get generated, they
assigned macrocells/pterms etc. back to the source code that
generated them.


 Directory of C:\jon\oats

03/20/2009  10:36 AM            19,460 data_gen_40.vhd
03/19/2009  03:48 PM             8,795 OATS_TOP.ucf
03/20/2009  04:56 PM           263,536 OATS_Top.vhd

03/09/2009  11:21 AM               941 mod_m_counter.vhd
03/04/2009  05:23 PM             3,486 fifo.vhd
03/19/2009  03:23 PM             5,635 oats_top_tb.vhd
03/20/2009  01:10 PM             3,851 uart_core40.vhd
03/12/2009  09:22 AM             2,756 uart_rx40.vhd
03/12/2009  12:31 PM             3,734 uart_tx40.vhd





Now clearly source size doesn't make much difference: the OATS_TOP.VHD
program is ridiculously big, but as shown above it used reasonable
amounts of resources.  I added some clocking and counters to OATS_TOP:

------------------------------

FUNCTION     to_slv                (c: character) RETURN STD_LOGIC_VECTOR  IS
  BEGIN
    RETURN std_logic_vector(to_unsigned(character'pos(c), 8));
  END;

FUNCTION     reverse               (a : IN STD_LOGIC_VECTOR) RETURN STD_LOGIC_VECTOR IS
VARIABLE
        result : STD_LOGIC_VECTOR(a'RANGE);
ALIAS   aa     : STD_LOGIC_VECTOR(a'REVERSE_RANGE) IS a;
BEGIN
    FOR i IN aa'RANGE LOOP
       result(i) := aa(i);
       END LOOP;
    RETURN result;
END;

...

clock_4hz:  PROCESS ( system_clock_used )
BEGIN
     IF ( rising_edge(system_clock_used)) then
          IF ( clk_4hz_countdown = 0) THEN
               clk_4hz_countdown <= human_clock_count;
               clk_4hz <= NOT clk_4hz;
            else
               clk_4hz_countdown <= clk_4hz_countdown -1;
               End if; -- countdown ife

     END IF;
END PROCESS clock_4hz;

clock_2mhz:  PROCESS ( system_clock_used )
BEGIN
     IF ( rising_edge(system_clock_used)) then
          IF ( clk_2mhz_countdown = 0) THEN
               clk_2mhz_countdown <= clk_2mhz_clock_count;       -- base clock choice
               clk_2mhz <= NOT clk_2mhz;
            else
               clk_2mhz_countdown <= clk_2mhz_countdown -1;
               End if; -- countdown ife
     END IF;
END PROCESS clock_2mhz;

clock_2mhz_ctr:  PROCESS ( clk_2mhz )
BEGIN
     IF ( rising_edge(clk_2mhz)) then
               time_cntr_500ns     <= time_cntr_500ns +1;
     END IF;
END PROCESS clock_2mhz_ctr;

-- this clock is not uart related
clock_7812hz:  PROCESS ( system_clock_used )
BEGIN
     IF ( rising_edge(system_clock_used)) then
          IF ( clk_7812hz_countdown = 0) THEN
               clk_7812hz_countdown <= clk_7812hz_clock_count;       -- base clock choice
               clk_7812hz <= NOT clk_7812hz;
               if (clk_7812hz = '0') Then
                       clk_7812hz_tick <= '1';
                       end if;
            else
               clk_7812hz_tick <= '0';
               clk_7812hz_countdown <= clk_7812hz_countdown -1;
               if ( (initialize_done = '0'      ) AND
                    (clk_7812hz_countdown = 1276)     ) Then
                        initialize_done <= '1';
                        initialize_data_gen <= '1';
                  else
                        initialize_data_gen <= '0';

                    end if; -- initialize_done
               End if; -- countdown ife

     END IF;
END PROCESS clock_7812hz;


clock_7812_ctr:  PROCESS ( clk_7812hz )
BEGIN
     IF ( rising_edge(clk_7812hz)) then
               time_cntr_128us     <= time_cntr_128us +1;
               uptime_at_128us      <= time_cntr_500ns;

               a2mhz_parity_plus(7) <=
                   (   (time_cntr_500ns(39) xor time_cntr_500ns(38)) xor
                       (time_cntr_500ns(37) xor time_cntr_500ns(36))
                   ) xor
                   (   (time_cntr_500ns(35) xor time_cntr_500ns(34)) xor
                       (time_cntr_500ns(33) xor time_cntr_500ns(32))
                   );
               a2mhz_parity_plus(6) <=
                   (   (time_cntr_500ns(31) xor time_cntr_500ns(30)) xor
                       (time_cntr_500ns(29) xor time_cntr_500ns(28))
                   ) xor
                   (   (time_cntr_500ns(27) xor time_cntr_500ns(26)) xor
                       (time_cntr_500ns(25) xor time_cntr_500ns(24))
                   );
               a2mhz_parity_plus(5) <=
                   (   (time_cntr_500ns(23) xor time_cntr_500ns(22)) xor
                       (time_cntr_500ns(21) xor time_cntr_500ns(20))
                   ) xor
                   (   (time_cntr_500ns(19) xor time_cntr_500ns(18)) xor
                       (time_cntr_500ns(17) xor time_cntr_500ns(16))
                   );
               a2mhz_parity_plus(4) <=
                   (   (time_cntr_500ns(15) xor time_cntr_500ns(14)) xor
                       (time_cntr_500ns(13) xor time_cntr_500ns(12))
                   ) xor
                   (   (time_cntr_500ns(11) xor time_cntr_500ns(10)) xor
                       (time_cntr_500ns(09) xor time_cntr_500ns(08))
                   );
               a2mhz_parity_plus(3) <=
                   (   (time_cntr_500ns(07) xor time_cntr_500ns(06)) xor
                       (time_cntr_500ns(05) xor time_cntr_500ns(04))
                   ) xor
                   (   (time_cntr_500ns(03) xor time_cntr_500ns(02)) xor
                       (time_cntr_500ns(01) xor time_cntr_500ns(00))
                   );

     END IF;
END PROCESS clock_7812_ctr;



data_message_handler:  PROCESS ( system_clock_used)
BEGIN
     IF ( rising_edge(system_clock_used) ) THEN
        if ( (w40_wanted = '1'     ) and
             (clk_7812hz_tick = '1')
           )                             then
           w40_data_from_main <=
                       reverse(
                                   a2mhz_optional_message
                                 & std_logic_vector(uptime_at_128us)
                                 & a2mhz_parity_plus
                                 & a2mhz_optional_message
                               )  ; -- big endian for BAE.

           w40_ready <= '1';
         else
            w40_ready <= '0';
           end if;  -- w40_wanted ite
     else
        --w40_ready <= '0';
     END IF;  --clock edge

END PROCESS data_message_handler;

---------------------
    --2mhz communications uart  begin

   -- instantiate uart
   a2mhz_uart_unit: entity work.uart40(str_arch)
      generic map (
                dbit     => a2mhz_data_bit_count,
                sb_tick  => a2mhz_clock_tick_per_sampling_rate,
                dvsr     => a2mhz_baud_rate_divisor,
                dvsr_bit => 2,   -- number of bits necessary to hold dvsr
                FIFO_W   => 2    -- 2**(value) is the number of chars that can be queued.
              ) -- generic map


      port map(
               clk            => system_clock_used,
               reset          => initialize_data_gen,
               rd_uart        => a2mhz_RX_READ_BUFFER_STB,
               wr_uart        => a2mhz_TX_WRITE_BUFFER_STB,
               rx             => a2mhz_HUART_RX_LINE,
               w_data         => a2mhz_TX_1CHAR_BUF,
               tx_full            => a2mhz_TX_BUFFER_FULL,
               rx_empty           => open,
               rx_not_empty       => a2mhz_RX_BUFFER_DATA_PRESENT,
               r_data             => a2mhz_RX_1CHAR_BUF,
               tx                 => a2mhz_HUART_TX_LINE,
               baud_rate_tick     => a2mhz_UART_EN_16_x_BAUD
              );



a2mhz_DATA_GENERATOR: entity work.data_gen_40
         generic map
           ( PC_bits       => 5
           , dbit          => a2mhz_data_bit_count
           , the_program   =>

               -- Long startup delay
               op40_DELAY & 200 &                           --2 bytes long

               op40_LABEL & 04 &
                  op40_WAIT_FOR_W40 &
                  op40_GOTOL            & 04 &           -- spin on printing W40's from here on in.

               op40_HALT
           )
         port map
           ( clock         => system_clock_used
           , reset         => initialize_data_gen
           , timer         => a2mhz_UART_EN_16_x_BAUD
           , tx_data            => a2mhz_TX_1CHAR_BUF
           , tx_valid           => a2mhz_tx_valid
           , tx_ready      => a2mhz_tx_ready
           , rx_data       => a2mhz_RX_1CHAR_BUF
           , lbl_data      => a2mhz_lbl_data_from_main
           , rx_valid      => a2mhz_RX_BUFFER_DATA_PRESENT
           , rx_needed          => a2mhz_rx_wanted
           , reset_out          => open --UART_RESET_BUFFER
           , lbl_needed         => a2mhz_lbl_wanted
           , halted             => a2mhz_halted
           , error_cond         => a2mhz_error_cond_main
           , w40_data      => w40_data_from_main
           , w40_ready     => w40_ready
           , w40_needed         => w40_wanted
           );




  -- JSEB: Conditioning of interface signals between UART and data generator

  a2mhz_tx_ready <= not a2mhz_TX_BUFFER_FULL;
  a2mhz_TX_WRITE_BUFFER_STB <= a2mhz_tx_valid and a2mhz_tx_ready;  -- Write only when it's safe

  a2mhz_RX_READ_BUFFER_STB <= a2mhz_rx_wanted and a2mhz_RX_BUFFER_DATA_PRESENT;
  a2mhz_HUART_TX_CK_LINE      <= clk_2mhz;


    --2mhz communications uart  end
---------------------




Now, by removing the a2mhz_uart_unit entity, I return to the old,
acceptable levels.  That leads me to believe that removing it just put
all the logic in the unused/don't-bother pile, and it all comes back
once I put the uart back in.  Is the REVERSE function the problem?
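A hedged aside on the REVERSE question: reversing bit order only renames wires, so on its own it should cost no macrocells or product terms. The sketch below is a hypothetical `bit_util` package (not part of the original code) with an equivalent reversal, `reverse_bits`, plus an `xor_bits` reduction that could stand in for the hand-written `a2mhz_parity_plus` XOR trees; a synthesis tool should build the same XOR logic either way.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package; both functions describe pure
-- combinational structure (wiring for reverse_bits, an XOR tree for xor_bits).
package bit_util is
  function reverse_bits (a : std_logic_vector) return std_logic_vector;
  function xor_bits     (v : std_logic_vector) return std_logic;
end package bit_util;

package body bit_util is
  -- Output bit i takes input bit (high - i + low): leftmost <-> rightmost.
  function reverse_bits (a : std_logic_vector) return std_logic_vector is
    variable result : std_logic_vector(a'range);
  begin
    for i in a'range loop
      result(i) := a(a'high - i + a'low);
    end loop;
    return result;
  end function reverse_bits;

  -- Even parity of v: '1' when v holds an odd number of '1' bits.
  function xor_bits (v : std_logic_vector) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in v'range loop
      p := p xor v(i);
    end loop;
    return p;
  end function xor_bits;
end package body bit_util;
```

For example, assuming time_cntr_500ns is an unsigned, the first parity bit could be written as `a2mhz_parity_plus(7) <= xor_bits(std_logic_vector(time_cntr_500ns(39 downto 32)));`.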




The uart is pretty simple:






----------Uart_core40.vhd-------------
-- Listing 7.4
--
-- jl 090226  First working.  WATCH OUT!!! DVSR_BIT!!! for 19200 baud,
--            the DVSR was 325, and guess what?  that means that you need
--            9 bits instead of 8 for the DVSR_BIT, this still synthed ok
--            but generated 175 warnings.  changing it to 9 bits (or 115,200 baud)
--            means the synth gets through with 5 warnings, and works.
--    090312  copy and redo of uart_core.vhd.  this one is to customize to the 40 bit
--            (probably expand to 64) comm port for BAE.
--

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity uart40 is
   generic(
     -- Default setting:
     -- xxx baud, 8 data bits, 1 stop bit, 2^2 FIFO

      DBIT        : integer:=8;     -- # data bits
      SB_TICK     : integer:=16;    -- # ticks for stop bits, 16/24/32
                                    --   for 1/1.5/2 stop bits
      DVSR        : integer:= 325;  -- baud rate divisor
                                    -- DVSR = 50M/(16*baud rate(19200)) == 162.76
                                    -- 100M/(16*19200)  == 325.52
                                    -- 100M/(16*115200) == 54.25
      DVSR_BIT    : integer:=9;     -- # bits of DVSR; 325 needs 9 bits!!!!!
      FIFO_W      : integer:=2      -- # addr bits of FIFO
                                    -- # words in FIFO=2^FIFO_W
   );
   port(
      clk,
      reset:           in std_logic;
      rd_uart,
      wr_uart:         in std_logic;
      rx:              in std_logic;
      w_data:          in std_logic_vector((dbit-1) downto 0);
      tx_full,
      rx_empty:           out std_logic;
      rx_not_empty:       out std_logic;
      r_data:             out std_logic_vector((dbit-1) downto 0);
      tx:                 out std_logic;
      baud_rate_tick:     out std_logic
   );
end uart40;

architecture str_arch of uart40 is
   signal tick: std_logic;
   signal rx_done_tick: std_logic;
   signal tx_fifo_out: std_logic_vector((dbit-1) downto 0);
   signal rx_data_out: std_logic_vector((dbit-1) downto 0);
   signal tx_empty, tx_fifo_not_empty: std_logic;
   signal tx_done_tick: std_logic;
begin
   baud_gen_unit: entity work.mod_m_counter(arch)
      generic map(M=>DVSR,
                  N=>DVSR_BIT)
      port map(clk    =>clk,
               reset  =>reset,
               q         =>open,
               max_tick  =>tick
              );

   uart40_rx_unit: entity work.uart40_rx(arch)
      generic map(DBIT=>DBIT,
                  SB_TICK=>SB_TICK)
      port map(clk=>clk,
               reset=>reset,
               rx=>rx,
               s_tick=>tick,
               rx_done_tick=>rx_done_tick,
               dout=>rx_data_out);

   fifo_rx_unit: entity work.fifo(arch)
      generic map(B=>DBIT, W=>FIFO_W)
      port map(clk=>clk,
               reset=>reset,
               rd=>rd_uart,
               wr=>rx_done_tick,
               w_data=>rx_data_out,
               empty=>rx_empty,
               notempty=>rx_not_empty,
               full=>open,
               r_data=>r_data);

   fifo_tx_unit: entity work.fifo(arch)
      generic map(B=>DBIT,
                  W=>FIFO_W)
      port map(clk=>clk,
               reset=>reset,
               rd=>tx_done_tick,
               wr=>wr_uart,
               w_data=>w_data,
               empty=>tx_empty,
               notempty=>open,
               full=>tx_full,
               r_data=>tx_fifo_out);

   uart40_tx_unit: entity work.uart40_tx(arch)
      generic map(DBIT=>DBIT,
                  SB_TICK=>SB_TICK)
      port map(clk=>clk,
               reset=>reset,
               tx_start=>tx_fifo_not_empty,
               s_tick=>tick,
               din=>tx_fifo_out,
               tx_done_tick=> tx_done_tick,
               tx=>tx);

   tx_fifo_not_empty <= not tx_empty;
   baud_rate_tick <= tick;

end str_arch;


-----fifo.vhd-----
-- Listing 4.20
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity fifo is
   generic(
      B: natural:=8; -- number of bits
      W: natural:=4 -- number of address bits
   );
   port(
      clk, reset: in std_logic;
      rd, wr: in std_logic;
      w_data: in std_logic_vector (B-1 downto 0);
      empty,
      notempty,
      full: out std_logic;
      r_data: out std_logic_vector (B-1 downto 0)
   );
end fifo;

architecture arch of fifo is
   type reg_file_type is array (2**W-1 downto 0) of
        std_logic_vector(B-1 downto 0);
   signal array_reg: reg_file_type;
   signal w_ptr_reg, w_ptr_next, w_ptr_succ:
      std_logic_vector(W-1 downto 0);
   signal r_ptr_reg, r_ptr_next, r_ptr_succ:
      std_logic_vector(W-1 downto 0);
   signal full_reg, empty_reg, full_next, empty_next:
          std_logic;
   signal wr_op: std_logic_vector(1 downto 0);
   signal wr_en: std_logic;
begin
   --=================================================
   -- register file
   --=================================================
   process(clk,reset)
   begin
     if (reset='1') then
        array_reg <= (others=>(others=>'0'));
     elsif (clk'event and clk='1') then
        if wr_en='1' then
           array_reg(to_integer(unsigned(w_ptr_reg)))
                 <= w_data;
        end if;
     end if;
   end process;
   -- read port
   r_data <= array_reg(to_integer(unsigned(r_ptr_reg)));
   -- write enabled only when FIFO is not full
   wr_en <= wr and (not full_reg);

   --=================================================
   -- fifo control logic
   --=================================================
   -- register for read and write pointers
   process(clk,reset)
   begin
      if (reset='1') then
         w_ptr_reg <= (others=>'0');
         r_ptr_reg <= (others=>'0');
         full_reg <= '0';
         empty_reg <= '1';
      elsif (clk'event and clk='1') then
         w_ptr_reg <= w_ptr_next;
         r_ptr_reg <= r_ptr_next;
         full_reg <= full_next;
         empty_reg <= empty_next;
      end if;
   end process;

   -- successive pointer values
   w_ptr_succ <= std_logic_vector(unsigned(w_ptr_reg)+1);
   r_ptr_succ <= std_logic_vector(unsigned(r_ptr_reg)+1);

   -- next-state logic for read and write pointers
   wr_op <= wr & rd;
   process(w_ptr_reg,w_ptr_succ,r_ptr_reg,r_ptr_succ,wr_op,
           empty_reg,full_reg)
   begin
      w_ptr_next <= w_ptr_reg;
      r_ptr_next <= r_ptr_reg;
      full_next <= full_reg;
      empty_next <= empty_reg;
      case wr_op is
         when "00" => -- no op
         when "01" => -- read
            if (empty_reg /= '1') then -- not empty
               r_ptr_next <= r_ptr_succ;
               full_next <= '0';
               if (r_ptr_succ=w_ptr_reg) then
                  empty_next <='1';
               end if;
            end if;
         when "10" => -- write
            if (full_reg /= '1') then -- not full
               w_ptr_next <= w_ptr_succ;
               empty_next <= '0';
               if (w_ptr_succ=r_ptr_reg) then
                  full_next <='1';
               end if;
            end if;
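         when others => -- write/read
            if (empty_reg = '1') then     -- empty: nothing to read,
               w_ptr_next <= w_ptr_succ;  -- so treat as a write only
               empty_next <= '0';
            elsif (full_reg = '1') then   -- full: wr_en is '0' and the word
               r_ptr_next <= r_ptr_succ;  -- is not stored, so treat as
               full_next <= '0';          -- a read only
            else                          -- normal case: both pointers
               w_ptr_next <= w_ptr_succ;  -- advance, flags unchanged
               r_ptr_next <= r_ptr_succ;
            end if;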
      end case;
   end process;
   -- output
   full <= full_reg;
   empty <= empty_reg;
   notempty <= not empty_reg;
end arch;

------------------mod_m_counter.vhd

-- Listing 4.11
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity mod_m_counter is
   generic(
      N: integer := 4;     -- number of bits
      M: integer := 10     -- mod-M
  );
   port(
      clk,
      reset    : in std_logic;
      max_tick : out std_logic;
      q        : out std_logic_vector(N-1 downto 0)
   );
end mod_m_counter;

architecture arch of mod_m_counter is
   signal r_reg: unsigned(N-1 downto 0);
   signal r_next: unsigned(N-1 downto 0);
begin
   -- register
   process(clk,reset)
   begin
      if (reset='1') then
         r_reg <= (others=>'0');
      elsif (clk'event and clk='1') then
         r_reg <= r_next;
      end if;
   end process;
   -- next-state logic
   r_next <= (others=>'0') when r_reg=(M-1) else
             r_reg + 1;
   -- output logic
   q <= std_logic_vector(r_reg);
   max_tick   <= '1' when r_reg=(M-1) else '0';
end arch;


---------uart_tx40.vhd

-- Listing 7.3

-- JL 090309   changing hard coded '15' to (sb_tick-1) for length of
--             each bit. hard coded '7' for databits now (dbit-1) as
--             well.
-- JL 090312   custom version of uart_tx for the BAE comm link.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity uart40_tx is
   generic(
      DBIT: integer:=8;     -- # data bits
      SB_TICK: integer:=16  -- # ticks for stop bits
   );
   port(
      clk, reset: in std_logic;
      tx_start: in std_logic;
      s_tick: in std_logic;
      din: in std_logic_vector((dbit-1) downto 0);
      tx_done_tick: out std_logic;
      tx: out std_logic
   );
end uart40_tx ;

architecture arch of uart40_tx is
   type state_type is (idle, start, data, stop);

   constant go_high : std_logic := '1';
   constant go_low  : std_logic := '0';

   signal state_reg, state_next: state_type;
   signal s_reg, s_next: unsigned(7 downto 0);
   signal n_reg, n_next: unsigned(7 downto 0);
   signal b_reg, b_next: std_logic_vector((dbit-1) downto 0);
   signal tx_reg, tx_next: std_logic;
   signal bit_length: std_logic := '0';     -- testbench watching only;
                                            -- use with din watch.
begin
   -- FSMD state & data registers
   process(clk,reset)
   begin
      if reset='1' then
         state_reg <= idle;
         s_reg <= (others=>'0');
         n_reg <= (others=>'0');
         b_reg <= (others=>'0');
         tx_reg <= go_high;
      elsif (clk'event and clk='1') then
         state_reg <= state_next;
         s_reg <= s_next;
         n_reg <= n_next;
         b_reg <= b_next;
         tx_reg <= tx_next;
      end if;
   end process;
   -- next-state logic & data path functional units/routing
   process(state_reg,s_reg,n_reg,b_reg,s_tick,
           tx_reg,tx_start,din)
   begin
      state_next <= state_reg;
      s_next <= s_reg;
      n_next <= n_reg;
      b_next <= b_reg;
      tx_next <= tx_reg ;
      tx_done_tick <= '0';
      case state_reg is
         when idle =>
            tx_next <= go_low;
            if tx_start='1' then
               state_next <= start;
               s_next <= (others=>'0');
               b_next <= din;
            end if;
         when start =>
            tx_next <= go_high;
            if (s_tick = '1') then
               if s_reg=(sb_tick-1) then
                  state_next <= data;
                  s_next <= (others=>'0');
                  n_next <= (others=>'0');
               else
                  s_next <= s_reg + 1;
               end if;
            end if;
         when data =>
            tx_next <= b_reg(0);
            if (s_tick = '1') then
               if s_reg=(sb_tick-1) then
                  bit_length <= not bit_length;  -- measure a bit.
                  s_next <= (others=>'0');
                  b_next <= '0' & b_reg((dbit-1) downto 1) ;
                  if n_reg=(DBIT-1) then
                     state_next <= idle; -- stop ;    -- let's skip the stop bit.
                     tx_done_tick <= '1';             -- moved in from stop
                  else
                     n_next <= n_reg + 1;
                  end if;
               else
                  s_next <= s_reg + 1;
               end if;
            end if;
         when stop =>
            tx_next <= go_high;
            if (s_tick = '1') then
               if s_reg=(SB_TICK*4-1) then   -- let's make it stick out for now.
                  state_next <= idle;
                  tx_done_tick <= '1';
               else
                  s_next <= s_reg + 1;
               end if;
            end if;
      end case;
   end process;
   tx <= tx_reg;
end arch;
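As a reading aid for the custom transmitter above: it idles low (go_low), drives the start bit high, shifts DBIT data bits out LSB first, and, with the stop state bypassed, returns straight to idle. The Python sketch below models the resulting line level with one list entry per s_tick. It is a behavioural approximation only: it assumes tx_start arrives aligned to an s_tick and ignores the one-clock delay through tx_reg; the helper names are invented for illustration.

```python
def tx40_frame(data: int, dbit: int = 8, sb_tick: int = 16) -> list[int]:
    """Approximate level on `tx`, one entry per s_tick, for one uart40_tx frame."""
    levels = [1] * sb_tick                      # start state: tx driven go_high
    for i in range(dbit):                       # data state: b_reg(0), LSB first
        levels += [(data >> i) & 1] * sb_tick   # each bit held for sb_tick ticks
    return levels                               # stop state skipped: back to idle (go_low)


def per_bit(levels: list[int], sb_tick: int = 16) -> str:
    """Collapse the tick-level list to one character per bit period."""
    return "".join(str(levels[k]) for k in range(0, len(levels), sb_tick))


frame = tx40_frame(0xA5)
print(len(frame), per_bit(frame))  # prints: 144 110100101
```

With the defaults a frame lasts 9 × SB_TICK = 144 ticks: one high start bit plus eight data bits, and no stop bit.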





---------------uart_rx40.vhd ----
-- Listing 7.1
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity uart40_rx is
   generic(
      DBIT: integer:=8;     -- # data bits
      SB_TICK: integer:=16  -- # ticks for stop bits
   );
   port(
      clk, reset: in std_logic;
      rx: in std_logic;
      s_tick: in std_logic;
      rx_done_tick: out std_logic;
      dout: out std_logic_vector((dbit-1) downto 0)
   );
end uart40_rx ;

architecture arch of uart40_rx is
   type state_type is (idle, start, data, stop);
   signal state_reg, state_next: state_type;
   signal s_reg, s_next: unsigned(3 downto 0);
   signal n_reg, n_next: unsigned(2 downto 0);
   signal b_reg, b_next: std_logic_vector((dbit-1) downto 0);
begin
   -- FSMD state & data registers
   process(clk,reset)
   begin
      if reset='1' then
         state_reg <= idle;
         s_reg <= (others=>'0');
         n_reg <= (others=>'0');
         b_reg <= (others=>'0');
      elsif (clk'event and clk='1') then
         state_reg <= state_next;
         s_reg <= s_next;
         n_reg <= n_next;
         b_reg <= b_next;
      end if;
   end process;
   -- next-state logic & data path functional units/routing
   process(state_reg,s_reg,n_reg,b_reg,s_tick,rx)
   begin
      state_next <= state_reg;
      s_next <= s_reg;
      n_next <= n_reg;
      b_next <= b_reg;
      rx_done_tick <='0';
      case state_reg is
         when idle =>
            if rx='0' then
               state_next <= start;
               s_next <= (others=>'0');
            end if;
         when start =>
            if (s_tick = '1') then
               if s_reg=(sb_tick/2 -1) then
                  state_next <= data;
                  s_next <= (others=>'0');
                  n_next <= (others=>'0');
               else
                  s_next <= s_reg + 1;
               end if;
            end if;
         when data =>
            if (s_tick = '1') then
               if s_reg=(sb_tick-1) then
                  s_next <= (others=>'0');
                  b_next <= rx & b_reg((dbit-1) downto 1) ;
                  if n_reg=(DBIT-1) then
                     state_next <= stop ;
                  else
                     n_next <= n_reg + 1;
                  end if;
               else
                  s_next <= s_reg + 1;
               end if;
            end if;
         when stop =>
            if (s_tick = '1') then
               if s_reg=(SB_TICK-1) then
                  state_next <= idle;
                  rx_done_tick <='1';
               else
                  s_next <= s_reg + 1;
               end if;
            end if;
      end case;
   end process;
   dout <= b_reg;
end arch;
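Listing 7.1 oversamples the line. After the falling edge it waits SB_TICK/2 ticks to reach the middle of the start bit, then samples once every SB_TICK ticks, shifting each bit in at the MSB end of b_reg. The Python sketch below is a tick-by-tick model of that FSM. It assumes one list entry per s_tick and standard polarity (idle high, start bit low, one stop bit). It only approximates the VHDL, which checks for the start bit on every clock rather than every tick. `uart_frame` is a hypothetical test helper.

```python
def uart_frame(data: int, dbit: int = 8, sb_tick: int = 16, idle: int = 4) -> list[int]:
    """Hypothetical helper: standard frame (idle 1, start 0, data LSB first, stop 1)."""
    bits = [0] + [(data >> i) & 1 for i in range(dbit)] + [1]
    levels = [1] * idle
    for bit in bits:
        levels += [bit] * sb_tick
    return levels + [1] * idle


def rx_model(levels: list[int], dbit: int = 8, sb_tick: int = 16) -> list[int]:
    """Tick-by-tick model of the uart40_rx FSM; returns bytes flagged by rx_done_tick."""
    state, s, n, b = "idle", 0, 0, 0
    received = []
    for rx in levels:                       # one entry per s_tick
        if state == "idle":
            if rx == 0:                     # falling edge: possible start bit
                state, s = "start", 0
        elif state == "start":
            if s == sb_tick // 2 - 1:       # middle of the start bit
                state, s, n = "data", 0, 0
            else:
                s += 1
        elif state == "data":
            if s == sb_tick - 1:            # middle of a data bit: sample it
                s = 0
                b = (rx << (dbit - 1)) | (b >> 1)   # rx & b_reg(dbit-1 downto 1)
                if n == dbit - 1:
                    state = "stop"
                else:
                    n += 1
            else:
                s += 1
        else:                               # stop
            if s == sb_tick - 1:
                state = "idle"
                received.append(b)          # rx_done_tick <= '1'
            else:
                s += 1
    return received


print([hex(x) for x in rx_model(uart_frame(0xA5) + uart_frame(0x3C))])  # prints: ['0xa5', '0x3c']
```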

Article: 139108
Subject: Re: How big is my vhdl and am I approaching some size limitation on
From: rickman <gnuarm@gmail.com>
Date: Fri, 20 Mar 2009 21:12:04 -0700 (PDT)
Links: << >> << T >> << A >>

On Mar 20, 3:15 pm, jleslie48 <j...@jonathanleslie.com> wrote:
> On Mar 20, 2:58 pm, jleslie48 <j...@jonathanleslie.com> wrote:
>
>
>
> > On Mar 20, 1:41 pm, Mike Treseler <mtrese...@gmail.com> wrote:
>
> > > jleslie48 wrote:
> > > > and when I added some digital outputs, the %'s went up, but then I
> > > > added a whole bunch of logic, and nothing changed,
>
> > > If the number of cells or Pterms used didn't change at all,
> > > I would expect that a "whole bunch" of logic does not make it
> > > out to a pin. I would run a sim to check.
>
> > >       -- Mike Treseler
>
> > ahhh,  well that is a bummer.  I just tied the output to a pin and now
> > I"m getting:
>
> > Fitting...
> > .
> > ERROR:Cpld:1063 - Design requires at least 947 macrocells, exceeds device limit 512.
> > ERROR:Cpld:1062 - Design contains 2004 unique product terms, exceeds device limit 1536.
> > ERROR:Cpld:1064 - Design rules checking error. Fitting process stopped.
> > ...o
> > ERROR:Cpld:868 - Cannot fit the design into any of the specified devices with the selected implementation options.
>
> > any idea on how to make it fit?
>
> ok, before I added my functionality, I had:
> Macrocells Used  Pterms Used     Registers Used  Pins Used      Function Block Inputs Used
> 379/512 (75%)    831/1536 (55%)  354/512 (70%)   118/176 (68%)  779/1280 (61%)
>
> so from my errors, I can see I added some 600 macrocells, and 1200
> pterms,
>
> how can I find out who is the piggy, and what can I do to trim things
> down?

I don't think you really got an answer to this question.  To some
extent you can look at the code and estimate the number of macrocells
or other logic elements used.  But to measure it, you need to break
the code into modules and let the tool tell you about each module
separately.  In an FPGA the logic has a finer grain, so there are not
as many optimizations that affect these counts when you put the blocks
all together.  But a CPLD can put a lot of logic into each macrocell
and will be much more limited by the FF count.  Your design counts
above indicate that your design uses 1300 FFs and your CPLD only has
512 FFs.  Not a good fit!

You won't find much in the way of optimizations that will make this
fit.  The best thing to trim your logic is to change your algorithm.
If there are parts of your design that can run slowly compared to the
clock rate, you can let them run sequentially rather than in
parallel.  But if your design has to run at the full rate of the clock
with everything in parallel, you just need a larger part.  So take a
good, hard look at your design and see if there is anything you can do to
reduce it.

> also, what is a macrocell and pterm?

I think these got answered, but a little more detail...  A macrocell
is the unit block of a CPLD.  It typically includes one or two FFs and an
output (often to a pin), along with some amount of logic.  The logic in
a macrocell is made of p-terms and OR gates.  P-terms are very wide
AND gates with inputs from all of the inputs to that block, all of the
FFs in that block as well as, in some devices, some inputs from other
macrocell p-terms.  The p-terms of a given macrocell are OR'd together
to produce the input to the FF or it can be routed directly to the
output.  There is also a p-term or two devoted to controlling the tri-
state driver on the output.  The OR gate and FF outputs are connected
back to the logic matrix for use in other or the same macrocells.
Some devices have "buried" FFs which allow some of the logic in the
macrocell to be split off and used with this second FF, but the output
can only be routed back to the routing matrix, not an output pin.

That is a lot to absorb from a description.  I am sure the data sheet
has a picture that is very clear and can portray the detail better.
The main thing to understand is that the p-term ANDs are effectively
unlimited in width (or, more accurately, limited only by the inputs
routed into the block), while an FPGA typically has much smaller LUTs,
usually 1 LUT per FF or sometimes 4 LUTs to 3 FFs.  So a CPLD is often
FF-count limited while an FPGA is mostly LUT-count limited.  Certainly
there are things you can change in your design to use more logic and
fewer FFs to target CPLDs.  But I think it will be a major job to cut
the design size by more than half!
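To put rough numbers on the p-term versus LUT trade-off, the Python sketch below counts product terms (in pure sum-of-products form) and 4-input LUTs for a wide AND and a wide XOR (parity). It ignores macrocell XOR gates, p-term sharing, carry chains and wide-function muxes that real devices and fitters may use, so treat it as an illustration, not a fitter prediction.

```python
import math
from itertools import product


def pterms_parity(n_inputs: int) -> int:
    """Minimal sum-of-products for an n-input XOR: one p-term per odd-weight
    input pattern. Any two such patterns differ in at least two bits, so no
    p-terms can be merged."""
    return sum(1 for bits in product((0, 1), repeat=n_inputs) if sum(bits) % 2)


def luts_tree(n_inputs: int, k: int = 4) -> int:
    """k-input LUTs for an n-input AND or XOR built as a tree: each LUT
    reduces the number of signals by at most k - 1."""
    if n_inputs <= k:
        return 1
    return math.ceil((n_inputs - 1) / (k - 1))


for n in (8, 16):
    # An n-input AND is a single product term no matter how wide it is.
    print(f"{n}-input AND: 1 p-term, {luts_tree(n)} LUT4s")
    print(f"{n}-input XOR: {pterms_parity(n)} p-terms, {luts_tree(n)} LUT4s")
```

A 16-input AND is one p-term but five LUT4s; a 16-input XOR is five LUT4s but 32768 p-terms. That is the same "apples and oranges" point in numbers.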

> I originally ran this program on a Virtex II, and everything looked like
> it was pretty small and efficient:
>
> Device Utilization Summary
>
> Logic Utilization                                   Used  Available  Utilization
> Number of Slice Flip Flops                         1,282     27,392           4%
> Number of 4 input LUTs                             1,545     27,392           5%
> Logic Distribution
> Number of occupied Slices                          1,302     13,696           9%
>   Number of Slices containing only related logic   1,302      1,302         100%
>   Number of Slices containing unrelated logic          0      1,302           0%
> Total Number of 4 input LUTs                       1,589     27,392           5%
>   Number used as logic                             1,545
>   Number used as a route-thru                         44
> Number of bonded IOBs                                 15        556           2%
>   IOB Flip Flops                                       1
> Number of RAMB16s                                      2        136           1%
> Number of BUFGMUXs                                     3         16          18%
>
> but I cant seem to compare these two chips, the VIRTEX II vs the
> XCR3512XL-12-PQ208
> its apples to oranges, how does it work?

The 3512 has 512 FFs in the macrocells.  (I think they also have input
FFs)  The FPGA is using some 1300 out of 27,000!  The FPGA is using
1500 LUTs for logic.  It looks to me like that logic could fit in the
512 macrocells.  But the number of FFs has to be
reduced.  Are they all necessary?

Rick

Article: 139109
Subject: Re: Re Zero operand CPUs
From: rickman <gnuarm@gmail.com>
Date: Fri, 20 Mar 2009 21:30:40 -0700 (PDT)
Links: << >> << T >> << A >>

On Mar 20, 7:39 pm, Jacko <jackokr...@gmail.com> wrote:
>
> > This would imply a 5bit wide word, which is obviously not
> > the case ?
>
> The implication of the extra bit 'needed' is not a true account of
> functioning.

Ah, but it is.  In your specific implementation, you have not only a
fifth bit, but also a sixth, seventh all the way up to 16th, no?  You
have a 16 bit instruction word and only 17 opcodes; 0 through 15 are
the ones you list, and 16 through 65535 is the LIT or CALL instruction
(I'm not sure which).
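Rick's point can be made concrete with a hypothetical decoder for the encoding he describes: 16-bit words, where 0 through 15 are the listed primitives and everything from 16 up is a call. Whether one of those codes is LIT is left open in the thread, so the sketch does not model it.

```python
def decode(word: int) -> tuple[str, int]:
    """Classify a 16-bit instruction word: 0..15 are primitives, 16..65535
    are treated as calls (hypothetical encoding, per the discussion)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("not a 16-bit instruction word")
    # "Primitive" means bits 15..4 are all zero, so the decoder has to look
    # at all twelve upper bits, not just one extra fifth bit.
    if word >> 4 == 0:
        return ("primitive", word)
    return ("call", word)


for w in (0x0003, 0x000F, 0x0010, 0x1234):
    print(f"0x{w:04X} -> {decode(w)}")
```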

Article: 139110
Subject: Re: How big is my vhdl and am I approaching some size limitation on
From: jleslie48 <jon@jonathanleslie.com>
Date: Fri, 20 Mar 2009 22:14:38 -0700 (PDT)
Links: << >> << T >> << A >>

On Mar 21, 12:12 am, rickman <gnu...@gmail.com> wrote:

"But I think it will be a major job to cut the design size by more
than half!"

Well, this is what has me scratching my head.  I only added one UART to
the 3512; the listing from the Virtex II has two separate UARTs, which
make up the 1300 slice flip-flops.  I've only moved one of the UARTs to
the 3512 so far and it blew its top.  I can't see how one UART can take
up the entire chip.  Or is that just the difference between the $90 3512
and the $1200 Virtex II Pro?


I'm not sure what you are getting at with reducing the number of FFs.
I'm just getting the hang of VHDL, and I'm not aware of what code makes
up the FFs.  I included the code I put in up above; it seems very
straightforward, a state machine.

"The FPGA is using some 1300 out of 27,000!"  I'm assuming you mean
the 1282/27,392 number.  What I'm guessing is that for this design I
have to get these 1282 FFs to fit into the 512 macrocells of the 3512,
but I can only put one in each macrocell; in other words, I've got to
get down to under 512 slice FFs.  That's not counting the problem I'm
having with the pterms.
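As a rough answer to "what code makes up the FFs": in the listings posted earlier, every signal assigned inside the `elsif (clk'event and clk='1')` branch of a process becomes one flip-flop per bit, while the `*_next` signals are combinational. The Python sketch below tallies those registers. It assumes binary state encoding (the synthesis tool may choose one-hot instead) and hypothetical FIFO generics B=8 and W=4, because the real values are set outside this excerpt. A CPLD such as the XCR3512XL has no block RAM, so every bit of the FIFO's `array_reg` needs its own macrocell register.

```python
import math


def state_ffs(num_states: int, one_hot: bool = False) -> int:
    """Flip-flops for an enumerated state register."""
    return num_states if one_hot else max(1, math.ceil(math.log2(num_states)))


def uart40_tx_ffs(dbit: int = 8, one_hot: bool = False) -> int:
    # state_reg + s_reg (8 bits) + n_reg (8 bits) + b_reg (DBIT bits) + tx_reg
    return state_ffs(4, one_hot) + 8 + 8 + dbit + 1


def uart40_rx_ffs(dbit: int = 8, one_hot: bool = False) -> int:
    # state_reg + s_reg (4 bits) + n_reg (3 bits) + b_reg (DBIT bits)
    return state_ffs(4, one_hot) + 4 + 3 + dbit


def fifo_ffs(b: int, w: int) -> int:
    # array_reg (2**W words of B bits) + w_ptr_reg + r_ptr_reg + full_reg + empty_reg
    return (2 ** w) * b + 2 * w + 2


def mod_m_counter_ffs(n: int = 4) -> int:
    # r_reg is N bits wide
    return n


# B=8, W=4 are hypothetical FIFO generics, for illustration only.
blocks = {
    "uart40_tx": uart40_tx_ffs(),
    "uart40_rx": uart40_rx_ffs(),
    "mod_m_counter": mod_m_counter_ffs(),
    "fifo (B=8, W=4)": fifo_ffs(8, 4),
}
for name, ffs in blocks.items():
    print(f"{name:16s} {ffs:4d} FFs")
print(f"{'total':16s} {sum(blocks.values()):4d} FFs")
```

Under these assumptions the FIFO register file dominates, while each UART is only a few dozen FFs. The fitter report for each block on its own remains the real measurement.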

Article: 139111
Subject: Re: Re Zero operand CPUs
From: Jacko <jackokring@gmail.com>
Date: Fri, 20 Mar 2009 23:38:29 -0700 (PDT)
Links: << >> << T >> << A >>

On 21 Mar, 04:30, rickman <gnu...@gmail.com> wrote:
> On Mar 20, 7:39 pm, Jacko <jackokr...@gmail.com> wrote:
>
>
>
> > > This would imply a 5bit wide word, which is obviously not
> > > the case ?
>
> > The implication of the extra bit 'needed' is not a true account of
> > functioning.

That would be like saying you need the extra )s on the front of
numbers when you do arithmetic on paper.

> Ah, but it is.  In your specific implementation, you have not only a
> fifth bit, but also a sixth, seventh all the way up to 16th, no?  You
> have a 16 bit instruction word and only 17 opcodes; 0 through 15 are
> the ones you list, and 16 through 65535 is the LIT or CALL instruction
> (I'm not sure which).

(Depends on the subroutine start address) all subroutines are calls,
so they are all calls, just one is LIT.

Yes, you will find primitives use codes 0-15 and colon definitions use
0-65535. If you are crazy enough to have a massive primitive set, or
to implement such a set in full width memory, then you would be right.
On the 12 bit version you could use 16 bit memory, and have the high 4
as the primitive part of the address space.

As stated on the website (somewhere) this processor is not designed
for running monolith inlined code, and pay in space and cache slowdown
such things will, say yoda.

So in the example I gave for the store, it's likely the last line of
simple instructions would be a subroutine named store or +1!

You will find a large amount of primitive code can be optimized into a
small logic area, especially if the address space over which these
subroutines is spread is sparse to allow combinational alignment of
product terms and boolean logic reduction.

To just generalize this code as something to slot into the threading
is missing the point that this is an occasional feature, not a best
practice.

cheers jacko

"speak unto my mobile I will, sometime it may be a programming tool."

Article: 139112
Subject: Re: How big is my vhdl and am I approaching some size limitation on the chip.
From: Jonathan Bromley <jonathan.bromley@MYCOMPANY.com>
Date: Sat, 21 Mar 2009 09:24:20 +0000
Links: << >> << T >> << A >>

On Fri, 20 Mar 2009 17:56:39 -0700 (PDT), jleslie48 wrote:

>re-building all the while and checking for growth, only to
>be blind sided when I hook up the pin to the generated signal.

So you need to identify each functional block in your
design, and synthesise it - on its own - in an FPGA
with every input and output of the block hooked to 
a pin (the synth tool will automatically put pads
on the ports of your top-level VHDL entity, so 
that's no effort).  That way you can quickly get
a feel for the size of each block.  If you synthesise
with some pins not connected, the tool will surely
strip away loads of unused logic and you will get
an over-optimistic size estimate.
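The over-optimistic estimate comes from the fact that synthesis keeps only logic in the fan-in cone of something observable, such as an output pin. A minimal Python sketch of that sweep follows. The netlist and its node names are hypothetical, invented for illustration. Real tools also do constant propagation and other trimming.

```python
def kept_logic(fanin: dict[str, list[str]], pins: list[str]) -> set[str]:
    """Return every node in the transitive fan-in of the output pins.
    Anything outside this set drives nothing observable and can be removed."""
    keep: set[str] = set()
    stack = list(pins)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(fanin.get(node, []))
    return keep


# Hypothetical netlist: each node lists the nodes that feed it.
netlist = {
    "led": ["status_reg"],
    "status_reg": ["clk_div"],
    "tx": ["tx_reg"],
    "tx_reg": ["tx_fsm"],
    "tx_fsm": ["data_gen", "baud_tick"],
    "data_gen": ["baud_tick"],
    "baud_tick": [],
    "clk_div": [],
}

print(sorted(kept_logic(netlist, ["led"])))        # tx not on a pin
print(sorted(kept_logic(netlist, ["led", "tx"])))  # tx tied to a pin
```

With only `led` on a pin, the whole transmitter chain vanishes. Tying `tx` to a pin pulls all of it back in, which is the "blind sided" effect described above.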

Interconnect between blocks costs propagation delay, 
but only rather a little logic, so that's OK.

Your reverse() function is pretty much free - it's 
just interconnect.
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.

Article: 139113
Subject: plb_emc with flash and datawidth matching
From: "Kristian Klaus" <kristian.klaus@gmx.de>
Date: Sat, 21 Mar 2009 11:22:37 +0100
Links: << >> << T >> << A >>

Hello,

I am trying to connect a 16-bit Intel StrataFlash to the
xps_mch_emc_v2_00_a. My problem is that the flashwriter.tcl application
stops partway through (at 13% or later) and never reaches 100%.

In the microblaze-uclinux archive, I found two old posts:

http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/archive/2005/07/msg00046.html
and
http://osdir.com/ml/linux.uclinux.microblaze/2004-02/msg00046.html

Can anybody confirm that this also works with the new xps_mch_emc?

--
Kristian

Article: 139114
Subject: Re: plb_emc with flash and datawidth matching
From: "Kristian Klaus" <kristian.klaus@gmx.de>
Date: Sat, 21 Mar 2009 11:51:19 +0100
Links: << >> << T >> << A >>

I forgot to say that I want the DATA_WIDTH_MATCHING option set to 1 because
I want the EMC to be fully transparent to the operating system (32-bit
writes).

Kristian

Article: 139115
Subject: Re: How big is my vhdl and am I approaching some size limitation on the chip.
From: Brian Drummond <brian_drummond@btconnect.com>
Date: Sat, 21 Mar 2009 12:42:22 +0000
Links: << >> << T >> << A >>

On Fri, 20 Mar 2009 17:56:39 -0700 (PDT), jleslie48 <jon@jonathanleslie.com>
wrote:

>Well this is a bummer.  Here I think I'm being careful, working things
>out with
>test bench, re-building all the while and checking for growth, only to
>be blind sided
>when I hook up the pin to the generated signal.
>
>Meantime I've got some more info and questions.
>
>1) >> any idea on how to make it fit?
>
>If it has to be that device, I would need two of them.
>
>that chip we are getting for around $100, I don't even know where to
>buy them, and where do I get a Virtex II-PRO chip? digikey says they
>are $1000??  Wouldn't I be better off getting the Virtex II-PRO?

Spartan-3 gives a sizeable resource for well under $100. (V2Pro with the same
capacity would be $500 up - ballpark numbers) 
em.avnet.com lists the XC3S1500 for $74 (1 off) $55 (100 off) and you can
upgrade to larger versions if necessary.

http://www.enterpoint.co.uk/moelbryn/raggedstone1.html
Here's one example of a complete board with XC3S1500 for about $250.

>Meantime the old chip is
>mounted on a custom board layout, I guess my hardware guys are going
>to have to re-lay out the board with
>two of these chips?  

Consider a Spartan-3 layout for room to grow.

>3) "Rerun synthesis and check the % utilization "
>that's what I've been doing.  basically I added the equivalent of soft
>uart and the data generator state machine that Jonathan so kindly set
>me up with.  So I started backing out that code bit by bit to see
>where I pop the %s.  

Not the best way - as you discovered.

Find the "do not allocate I/O pin" synthesis option and synth each major
subsystem as a separate project. Crosscheck that a simple sum of the results is
approximately (say within 10%) equal to the overall size. Note any major surprises...

This option is used to build separate re-usable modules (black boxes) so it is
not allowed to optimize away anything not connected to a pin; therefore it is a
better way to determine resource usage.

(Just another way to achieve what Jonathan advised, but without the labour of
adding the pins)
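Brian's sanity check is simple arithmetic. Here is a short Python sketch of it, using made-up per-module macrocell counts (hypothetical numbers, not results from the poster's design). It also ranks the modules so the largest one stands out.

```python
def crosscheck(per_module: dict[str, int], whole_design: int,
               tolerance: float = 0.10) -> bool:
    """True if the sum of separately synthesized module counts is within
    `tolerance` (as a fraction) of the count reported for the whole design."""
    return abs(sum(per_module.values()) - whole_design) <= tolerance * whole_design


# Hypothetical macrocell counts, for illustration only.
modules = {"uart_tx": 60, "uart_rx": 45, "fifo": 150, "data_gen": 120}
total = sum(modules.values())
print(total, crosscheck(modules, 390))  # prints: 375 True

# Rank the modules to see which one is "the piggy".
for name, count in sorted(modules.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {count:4d}  ({count / total:.0%})")
```

If the sum is far from the whole-design figure, something is being shared or trimmed across module boundaries, and that discrepancy is worth investigating before drawing conclusions.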

- Brian

Article: 139116
Subject: DVI in FPGA
From: Mawafugo <ccon67@netscape.net>
Date: Sat, 21 Mar 2009 06:47:40 -0700 (PDT)
Links: << >> << T >> << A >>

In xapp460 the DVI/HDMI transmitter and receiver are implemented, but the
maximum throughput is limited to about 750 Mb/s, which can handle up to
1080i or 720p resolution.  1080p, however, needs twice that.

The question is: how can we crank the throughput up to about 1.5 Gb/s?
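The numbers in the question follow from TMDS framing. Each of the three data channels carries one 10-bit character per pixel clock, so the per-channel serial rate is ten times the pixel clock. Here is a short Python check, assuming the standard CEA-861 pixel clocks (74.25 MHz for 720p60 and 1080i60, 148.5 MHz for 1080p60).

```python
TMDS_BITS_PER_CHARACTER = 10  # each 8-bit value is sent as a 10-bit TMDS character


def tmds_bit_rate(pixel_clock_hz: float) -> float:
    """Serial bit rate on each TMDS data channel: one character per pixel clock."""
    return pixel_clock_hz * TMDS_BITS_PER_CHARACTER


# Standard pixel clocks for the formats mentioned in the question.
pixel_clocks = {"720p60": 74.25e6, "1080i60": 74.25e6, "1080p60": 148.5e6}

for mode, f_pix in pixel_clocks.items():
    print(f"{mode:8s} {f_pix / 1e6:7.2f} MHz pixel clock -> "
          f"{tmds_bit_rate(f_pix) / 1e6:7.1f} Mb/s per channel")
```

720p60 works out to 742.5 Mb/s per channel and 1080p60 to 1485 Mb/s, which matches the "about 750 Mb/s" and "about 1.5 Gb/s" figures above.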

Article: 139117
Subject: Re: Re Zero operand CPUs
From: rickman <gnuarm@gmail.com>
Date: Sat, 21 Mar 2009 07:13:36 -0700 (PDT)
Links: << >> << T >> << A >>

On Mar 21, 2:38 am, Jacko <jackokr...@gmail.com> wrote:
> On 21 Mar, 04:30, rickman <gnu...@gmail.com> wrote:
>
> > On Mar 20, 7:39 pm, Jacko <jackokr...@gmail.com> wrote:
>
> > > > This would imply a 5bit wide word, which is obviously not
> > > > the case ?
>
> > > The implication of the extra bit 'needed' is not a true account of
> > > functioning.
>
> That would be like saying you need the extra )s on the front of
> numbers when you do arithmetic on paper.

Extra what???  Actually, you are quoting your own comment.

> > Ah, but it is.  In your specific implementation, you have not only a
> > fifth bit, but also a sixth, seventh all the way up to 16th, no?  You
> > have a 16 bit instruction word and only 17 opcodes; 0 through 15 are
> > the ones you list, and 16 through 65535 is the LIT or CALL instruction
> > (I'm not sure which).
>
> (Depends on the subroutine start address) all subroutines are calls,
> so they are all calls, just one is LIT.

I have no idea what you are talking about.  How does this instruction
set specify literals?

Your obfuscation is getting to be annoying.  You never explain what
you mean, you speak in crypto language and you seem intent on never
really explaining the principles of your design.  Even your assembly
language is some new symbolism that just serves to isolate what you
are doing and thinking rather than to be at all useful for
communication.

None of the rest of this is at all useful.  You are presuming that I
am making some sort of statement or that I am looking at your
processor from a very different point of view.  I am doing neither.  I
am trying to understand your processor from the point of view of a
small, embeddable CPU for use in an FPGA and in particular, to be
programmable in Forth.  That is the target of my CPU.  I am hoping to
learn something about these processors that I don't know or that I
haven't thought to try.  What I am learning about this design is that
it seems to have been designed without regard to a lot of knowledge
available, not that I will ever know for sure because it will never
really be explained.

Have you read Koopman's book on stack CPUs?  He covers a lot of ground
with that.

> Yes, you will find primitives use codes 0-15 and colon definitions use
> 0-65535. If you are crazy enough to have a massive primitive set, or
> to implement such a set in full width memory, then you would be right.
> On the 12 bit version you could use 16 bit memory, and have the high 4
> as the primitive part of the address space.
>
> As stated on the website (somewhere), this processor is not designed
> for running monolithic inlined code, and pay in space and cache slowdown
> such things will, say yoda.
>
> So in the example I gave for the store, it's likely the last line of
> simple instructions would be a subroutine named store or +1!
>
> You will find a large amount of primitive code can be optimized into a
> small logic area, especially if the address space over which these
> subroutines are spread is sparse, to allow combinational alignment of
> product terms and boolean logic reduction.
>
> To just generalize this code as something to slot into the threading
> is missing the point that this is an occasional feature, not a best
> practice.

Rick

Article: 139118
Subject: Re: How big is my vhdl and am I approaching some size limitation on
From: John Adair <g1@enterpoint.co.uk>
Date: Sat, 21 Mar 2009 07:14:53 -0700 (PDT)
Links: << >> << T >> << A >>

CPLDs are generally very small devices compared to FPGAs. They are
generally slightly easier for the novice to use, but don't let that
put you off going for an FPGA. Virtex-II Pro is a very old and expensive
family now. Xilinx offers 2 sets of families. The Virtex range is
big, very fast and expensive. Virtex-5 is readily available, with
Virtex-6 just announced. The Spartan families go from small to medium
size in comparison. CoolRunner etc. I would describe as tiny, to give
a reference.

If you have the ability to choose a part now, then the Spartan-3A or
Spartan-3AN are probably a good choice. The S3-A needs an external
Flash memory that is used to configure the device at power up. The
S3-AN has an internal Flash that is used for that purpose.

The smallest S3-AN is the XC3S50AN, and it has about 1400 flip-flops,
compared with the CoolRunner's 512 macrocells, which have 512
flip-flops available. It is very difficult to make a simple
comparison between CPLD and FPGA technologies, but I would suggest
just trial-building the design in an XC3S50AN to get a better
comparison. ISE WebPACK I presume you already have, and it will only
take a few minutes to change the part type and rebuild.

If you do want a development board, we supply lots of choice, with some
more in this market sector coming soon. You may find some of the
links on our Techitips page useful - http://www.enterpoint.co.uk/techitips/techitips.html.

John Adair
Enterpoint Ltd.


Article: 139119
Subject: Re: How big is my vhdl and am I approaching some size limitation on
From: rickman <gnuarm@gmail.com>
Date: Sat, 21 Mar 2009 07:47:30 -0700 (PDT)
Links: << >> << T >> << A >>

On Mar 21, 1:14 am, jleslie48 <j...@jonathanleslie.com> wrote:
> On Mar 21, 12:12 am, rickman <gnu...@gmail.com> wrote:
>
>
>
> > On Mar 20, 3:15 pm, jleslie48 <j...@jonathanleslie.com> wrote:
>
> > > On Mar 20, 2:58 pm, jleslie48 <j...@jonathanleslie.com> wrote:
>
> > > > On Mar 20, 1:41 pm, Mike Treseler <mtrese...@gmail.com> wrote:
>
> > > > > jleslie48 wrote:
> > > > > > and when I added some digital outputs, the %'s went up, but then I
> > > > > > added a whole bunch of logic, and nothing changed,
>
> > > > > If the number of cells or Pterms used didn't change at all,
> > > > > I would expect that a "whole bunch" of logic does not make it
> > > > > out to a pin. I would run a sim to check.
>
> > > > >       -- Mike Treseler
>
> > > > ahhh,  well that is a bummer.  I just tied the output to a pin and now
> > > > I'm getting:
>
> > > > Fitting...
> > > > .
> > > > ERROR:Cpld:1063 - Design requires at least 947 macrocells, exceeds device limit 512.
> > > > ERROR:Cpld:1062 - Design contains 2004 unique product terms, exceeds device limit 1536.
> > > > ERROR:Cpld:1064 - Design rules checking error. Fitting process stopped.
> > > > ...o
> > > > ERROR:Cpld:868 - Cannot fit the design into any of the specified devices with the selected implementation options.
>
> > > > any idea on how to make it fit?
>
> > > ok, before I added my functionality, I had:
> > > Macrocells Used  Pterms Used     Registers Used  Pins Used      Function Block Inputs Used
> > > 379/512 (75%)    831/1536 (55%)  354/512 (70%)   118/176 (68%)  779/1280 (61%)
>
> > > so from my errors, I can see I added some 600 macrocells, and 1200
> > > pterms,
>
> > > how can I find out who is the piggy, and what can I do to trim things
> > > down?
>
> > I don't think you really got an answer to this question.  To some
> > extent you can look at the code and estimate the number of macrocells
> > or other logic elements used.  But to measure it, you need to break
> > the code into modules and let the tool tell you about each module
> > separately.  In an FPGA the logic has a finer grain, so there are not
> > as many optimizations to affect these counts when you use the block
> > all together.  But a CPLD can put a lot of logic into each macrocell
> > and will be much more limited by the FF count.  Your design counts
> > above indicate that your design uses 1300 FFs and your CPLD only has
> > 512 FFs.  Not a good fit!
>
> > You won't find much in the way of optimizations that will make this
> > fit.  The best thing to trim your logic is to change your algorithm.
> > If there are parts of your design that can run slowly compared to the
> > clock rate, you can let them run sequentially rather than in
> > parallel.  But if your design has to run at the full rate of the clock
> > with everything in parallel, you just need a larger part.  So take a
> > good, hard look at your design and see if there is anything you can do to
> > reduce it.
>
> > > also, what is a macrocell and pterm?
>
> > I think these got answered, but a little more detail...  A macrocell
> > is the unit block of a CPLD.  It typically includes one or two FFs, an
> > output, often to a pin along with some amount of logic.  The logic in
> > a macrocell is made of p-terms and OR gates.  P-terms are very wide
> > AND gates with inputs from all of the inputs to that block, all of the
> > FFs in that block as well as, in some devices, some inputs from other
> > macrocell p-terms.  The p-terms of a given macrocell are OR'd together
> > to produce the input to the FF or it can be routed directly to the
> > output.  There is also a p-term or two devoted to controlling the tri-
> > state driver on the output.  The OR gate and FF outputs are connected
> > back to the logic matrix for use in other or the same macrocells.
> > Some devices have "buried" FFs which allow some of the logic in the
> > macrocell to be split off and used with this second FF, but the output
> > can only be routed back to the routing matrix, not an output pin.
>
> > That is a lot to absorb from a description.  I am sure the data sheet
> > has a picture that is very clear and can portray the detail better.
> > The main thing to understand is that the p-terms (ANDs) are unlimited
> > (or more accurately only limited by the inputs to the block routing)
> > and an FPGA typically has much smaller LUTs, usually 1 LUT per FF or
> > sometimes 4 LUTs to 3 FFs.  So a CPLD is often FF count limited while
> > an FPGA is mostly LUT count limited.  Certainly there are things you
> > can change in your design to use more logic and fewer FF to target
> > CPLDs.  But I think it will be a major job to cut the design size by
> > more than half!
>
> > > I originally ran this program on a Virtex II, and everything looked
> > > like it was pretty small and efficient:
>
> > > Device Utilization Summary
> > >
> > > Logic Utilization                                   Used   Available  Utilization
> > > Number of Slice Flip Flops                          1,282  27,392     4%
> > > Number of 4 input LUTs                              1,545  27,392     5%
> > > Logic Distribution
> > > Number of occupied Slices                           1,302  13,696     9%
> > >     Number of Slices containing only related logic  1,302  1,302      100%
> > >     Number of Slices containing unrelated logic     0      1,302      0%
> > > Total Number of 4 input LUTs                        1,589  27,392     5%
> > >     Number used as logic                            1,545
> > >     Number used as a route-thru                     44
> > > Number of bonded IOBs                               15     556        2%
> > >     IOB Flip Flops                                  1
> > > Number of RAMB16s                                   2      136        1%
> > > Number of BUFGMUXs                                  3      16         18%
>
> > > but I can't seem to compare these two chips, the Virtex II vs the
> > > XCR3512XL-12-PQ208;
> > > it's apples to oranges, how does it work?
>
> > The 3512 has 512 FFs in the macrocells.  (I think they also have input
> > FFs)  The FPGA is using some 1300 out of 27,000!  The FPGA is using
> > 1500 LUTs for logic.  It looks to me like that could fit in
> > the logic of 512 macrocells.  But the number of FFs has to be
> > reduced.  Are they all necessary?
>
> > Rick
>
> "But I think it will be a major job to cut the design size by more
> than half!"
>
> Well, this is what has me scratching my head.  I only added one UART to
> the 3512; the listing from the Virtex II has two
> separate UARTs to make up the 1300 slice flip flops.  I've only
> moved one of the UARTs to the 3512 so far and it blew its top.  I
> can't see how one UART can take up the entire chip.  Or is that the
> difference between the $90 3512 and the
> $1200 Virtex II Pro?

Somewhere we did not communicate.  The VIIP part has some 27,000 FFs.
Yes, that was 27 *thousand* FFs.  The 3512 has 512 FFs for logic.  So
there is no way that you can expect the CPLD to hold anywhere near the
same number of UARTs as the FPGA you are using.  The two UARTs in the
VIIP are using 1300 FFs.  Divide that by two (assuming they don't
share any logic like the baud rate generator) and you get 650 FFs per
UART.  Will that fit into 512 FFs in the CPLD?  It is very likely that
the UART you are using is very much more complex than you really
need.  I expect you could fit some 10 or more UARTs into this CPLD if
they are streamlined a bit.  A UART is nothing but a pair of shift
registers with some control logic and should fit into a couple of
dozen FFs if coded minimally.  To do that requires that you understand
how to design hardware so that you know what you want from the HDL
code and then to code the HDL to produce that hardware.
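To put a number on the "couple of dozen FFs" estimate, here is a behavioural sketch in Python (not synthesizable VHDL) of the transmit half of a minimal UART. It assumes 8N1 framing, a 50 MHz system clock and 115200 baud; the function names and the FF breakdown (one 10-bit shift register, a 4-bit bit counter and a baud-rate divider) are illustrative assumptions, not taken from the UART code in this thread.

```python
def uart_tx_frame(byte: int) -> list[int]:
    """8N1 frame: start bit (0), eight data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def shift_out(byte: int) -> list[int]:
    """Model the transmitter as one 10-bit shift register: load stop|data|start
    in parallel, then emit the LSB once per baud tick."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    reg = (1 << 9) | (byte << 1)  # bit 0 = start (0), bits 1..8 = data, bit 9 = stop (1)
    bits = []
    for _ in range(10):  # in hardware, a 4-bit counter tracks these 10 shifts
        bits.append(reg & 1)
        reg >>= 1
    return bits


def tx_ff_estimate(clk_hz: int, baud: int) -> int:
    """Rough FF count (assumed breakdown): 10-bit shift register,
    4-bit bit counter, and a divider wide enough to count clk_hz / baud clocks."""
    divisor = round(clk_hz / baud)
    return 10 + 4 + (divisor - 1).bit_length()
```

Under those assumptions the transmitter comes to 23 FFs. A receiver needs a broadly similar set of registers, so a UART without FIFOs stays in the tens of FFs rather than the roughly 650 per UART estimated above.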


> I'm not sure what you are getting at with me reducing the number
> of FFs.  I'm just getting the hang of VHDL, but I'm not aware of what
> code makes up the FFs.  I included the code I put in up above; it
> seems very straightforward, a state machine.
>
> "The FPGA is using some 1300 out of 27,000!  "  I'm assuming you mean
> the 1282/27,392 number.  What I'm guessing is that for this
> design I have to get these 1282 to fit into the 512 macrocells of
> the 3512, but I can only put 1 in each macrocell, aka I've got to get
> down to under 512 slice FFs.  That's not counting the problem I'm having
> with the pterms,

Yes, that is what you need to do.  I'm not sure why the p-term count
is a problem.

> > > > ERROR:Cpld:1062 - Design contains 2004 unique product terms, exceeds device limit 1536.

If you get your FF count down, the p-term count will likely
decrease as well.  But I'm not clear on why there are so few p-terms
in this device.  A typical CPLD will have macrocells with a range of
4 to 12 or more p-terms per macrocell.  I would have to look at the
data sheet of the 3512 to see how they are organized.
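The fitter errors quoted above already show how thin the p-term budget is: 1536 p-terms over 512 macrocells is an average of 3 per macrocell, below the 4 to 12 range mentioned above. A small Python sketch (the helper name is hypothetical) that turns the fitter's figures into utilisation ratios shows which limit is exceeded, and by how much:

```python
def utilisation(used: dict[str, int], limit: dict[str, int]) -> dict[str, float]:
    """Fraction of each device resource the design needs (> 1.0 means it cannot fit)."""
    return {name: used[name] / limit[name] for name in limit}


# Figures taken from the XCR3512XL fitter errors quoted in this thread
device_limit = {"macrocells": 512, "pterms": 1536}
design_needs = {"macrocells": 947, "pterms": 2004}

avg_pterms_per_macrocell = device_limit["pterms"] / device_limit["macrocells"]
ratios = utilisation(design_needs, device_limit)
over_budget = sorted(name for name, r in ratios.items() if r > 1.0)

print(f"average p-terms per macrocell: {avg_pterms_per_macrocell:.1f}")
for name, r in ratios.items():
    print(f"{name}: {r:.2f}x of device capacity")
```

Both ratios exceed 1.0, but the macrocell overrun (about 1.85x) is larger than the p-term overrun (about 1.30x), which is consistent with treating the register count as the first thing to cut.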

Ahhh... I found your culprit.

architecture arch of fifo is
   type reg_file_type is array (2**W-1 downto 0) of
        std_logic_vector(B-1 downto 0);
   signal array_reg: reg_file_type;

This FIFO is implemented using memory resources in the FPGA.  In the
CPLD there are no memory resources... at least in Xilinx CPLDs.  Other
brands have memory.  With 8 bits and 16 words, each FIFO uses 128
FFs.  A UART has two FIFOs using 256 FFs.  That's half the CPLD right
there!
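The FIFO arithmetic follows directly from the declaration: reg_file_type holds 2**W words of B bits, and without block RAM every one of those bits becomes a macrocell register. A Python sketch, assuming W = 4 and B = 8 as described above and counting storage only (the read/write pointers and full/empty flags would add a few more FFs):

```python
def fifo_storage_ffs(w: int, b: int) -> int:
    """Registers needed for `array (2**W-1 downto 0) of std_logic_vector(B-1 downto 0)`
    when there is no block RAM to hold it (pointers and flags not counted)."""
    return (2 ** w) * b


CPLD_REGISTERS = 512                      # XCR3512XL macrocell registers
per_fifo = fifo_storage_ffs(w=4, b=8)     # 16 words x 8 bits
per_uart = 2 * per_fifo                   # one TX FIFO plus one RX FIFO
print(per_fifo, per_uart, f"{per_uart / CPLD_REGISTERS:.0%} of the CPLD")
```

Shrinking the FIFO only helps so much: even a 4-word FIFO (W = 2) still costs 32 registers in a CPLD, which is why dropping the FIFOs entirely is the bigger win.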

If you want a UART in the CPLD you need to take out the FIFOs.  If you
are still running the same code for the Hello World program, you don't
need the FIFOs anyway.  Instead of letting the data generator push
chars into the FIFO, let the data generator be throttled by the UART
handshake directly.
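The handshake described here can be modelled cycle by cycle. In this Python sketch (hypothetical names; the transmitter is reduced to a busy counter of `cycles_per_char` clocks), the data generator only advances to the next character when the transmitter is ready, so the only state is a character index and the busy counter, with no storage array:

```python
def send_throttled(msg: bytes, cycles_per_char: int) -> tuple[bytes, int]:
    """Generator advances one character only when the transmitter is idle.
    Returns what was sent and how many clock cycles it took."""
    sent = bytearray()
    index = 0        # generator state: next character to offer
    busy_left = 0    # transmitter state: cycles left on the current character
    cycles = 0
    while index < len(msg) or busy_left > 0:
        tx_ready = busy_left == 0
        if tx_ready and index < len(msg):
            sent.append(msg[index])  # handshake: character accepted, generator advances
            index += 1
            busy_left = cycles_per_char
        if busy_left > 0:
            busy_left -= 1
        cycles += 1
    return bytes(sent), cycles


print(send_throttled(b"Hello", cycles_per_char=10))
```

At 10 cycles per character, all five characters of "Hello" go out back to back in 50 cycles without any buffering; the generator simply waits while the transmitter is busy.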

The UART clearly has other complexities that are eating up FFs.  You
need to find or code a simpler UART to suit your requirements.  Think
of the CPLD as an MCU with only 2 kB of program space.  You wouldn't
pull the UART driver out of Linux and try to use it in that device
would you?  In essence, that is what you are doing.

Rick

Article: 139120
Subject: Re: R/A FX2 connectors for S3A board - anyone have a couple spare?
From: John Adair <g1@enterpoint.co.uk>
Date: Sat, 21 Mar 2009 08:09:34 -0700 (PDT)
Links: << >> << T >> << A >>

If you search as a UK customer some parts don't appear. This might
explain why you didn't find it.

John Adair
Enterpoint Ltd.

On 20 Mar, 13:39, Mike Harrison <m...@whitewing.co.uk> wrote:
> On Fri, 20 Mar 2009 14:25:10 +0100, "StoneThrower" <digi_64-public[removethis]@yahoo.com> wrote:
> >> Digikey only stock the straight version  (-DSA)
> >?!
> >Digikey p/n H10644-ND, FX2-100S-1.27DS, right-angled (~as nicely shown on
> >pics):
> >http://parts.digikey.com/1/parts/287861-conn-recept-r-a-100pos-1-27mm...
>
> Thanks -  I did several searches & failed to find it - seems like they forgot to put the pin count
> in the item data so when I filtered to 100 pin, it didn't find it!
> Their parametric data is usually very good so I don't tend to look further if the search doesn't
> find something - have emailed them.

Article: 139121
Subject: Re: How big is my vhdl and am I approaching some size limitation on
From: jleslie48 <jon@jonathanleslie.com>
Date: Sat, 21 Mar 2009 08:10:07 -0700 (PDT)
Links: << >> << T >> << A >>

On Mar 21, 8:42 am, Brian Drummond <brian_drumm...@btconnect.com>
wrote:
> On Fri, 20 Mar 2009 17:56:39 -0700 (PDT), jleslie48 <j...@jonathanleslie.com>
> wrote:
>
> >Well this is a bummer.  Here I think I'm being careful, working things
> >out with
> >test bench, re-building all the while and checking for growth, only to
> >be blind sided
> >when I hook up the pin to the generated signal.
>
> >Meantime I've got some more info and questions.
>
> >1) >> any idea on how to make it fit?
>
> >If it has to be that device, I would need two of them.
>
> >that chip we are getting for around $100, I don't even know where to
> >buy them, and where do I get a Virtex II-PRO chip? digikey says they
> >are $1000??  Wouldn't I be better off getting the Virtex II-PRO?
>
> Spartan-3 gives a sizeable resource for well under $100. (V2Pro with the same
> capacity would be $500 up - ballpark numbers)
> em.avnet.com lists the XC3S1500 for $74 (1 off) $55 (100 off) and you can
> upgrade to larger versions if necessary.
>
> http://www.enterpoint.co.uk/moelbryn/raggedstone1.html
> Here's one example of a complete board with XC3S1500 for about $250.
>
> >Meantime the old chip is
> >mounted on a custom board layout, I guess my hardware guys are going
> >to have to re-lay out the board with
> >two of these chips?
>
> Consider a Spartan-3 layout for room to grow.
>
> >3) "Rerun synthesis and check the % utilization "
> >that's what I've been doing.  basically I added the equivalent of soft
> >uart and the data generator state machine that Jonathan so kindly set
> >me up with.  So I started backing out that code bit by bit to see
> >where I pop the %s.
>
> Not the best way - as you discovered.
>
> Find the "do not allocate I/O pin" synthesis option and synth each major
> subsystem as a separate project. Crosscheck that a simple sum of the results is
> approx (say within 10%) of the overall size. Note any major surprises...
>
> This option is used to build separate re-usable modules (black boxes) so it is
> not allowed to optimize away anything not connected to a pin; therefore it is a
> better way to determine resource usage.
>
> (Just another way to achieve what Jonathan advised, but without the labour of
> adding the pins)
>
> - Brian
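The cross-check suggested above is easy to automate once each subsystem has been synthesised on its own. A Python sketch with a hypothetical helper name and made-up example counts (real numbers would come from each project's synthesis report):

```python
def sum_matches_total(module_counts: dict[str, int], total: int,
                      tolerance: float = 0.10) -> bool:
    """True if the per-module resource counts add up to within `tolerance`
    (as a fraction) of the count reported for the whole design."""
    if total <= 0:
        raise ValueError("total must be positive")
    return abs(sum(module_counts.values()) - total) <= tolerance * total


# Hypothetical per-module macrocell counts, for illustration only
modules = {"uart": 300, "data_gen": 80, "top_glue": 40}
print(sum_matches_total(modules, total=450))  # sum 420, within 10% of 450
print(sum_matches_total(modules, total=600))  # sum 420, far below 600
```

A large mismatch in either direction is one of the "major surprises" worth chasing, for example logic that was optimised away in one build because nothing drove a pin.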

Hey everybody, thanks for all the good suggestions.

1) So it is reasonable to conclude that the CoolRunner 3512 is way
too small to even run a UART, yes?


2) Is the board that Brian suggested, the raggedstone1 with the
Spartan XC3S1500, big enough?

2A) I really want to put 3 or 4 UARTS onto the chip, Is it big enough
for that?

2B) The specs for the  XC3S1500 are:

Xilinx XC3S1500-4FG676C
FPGA Spartan®-3 Family 1.5M Gates 29952 Cells 630MHz Commercial 90nm
Technology 1.2V 676-Pin FCBGA
Cross to Alternate Parts by selecting most important features and
values below and then search again
	Search within this category only
	Search within this manufacturer only
  	Feature Description 	Feature Value
	Package 	676FCBGA
	Family Name 	Spartan®-3
	Device Logic Cells 	29952
	Device Logic Units 	3328
	Device System Gates 	1500000
	Number of Registers 	N/A
	Maximum Internal Frequency 	630 MHz
	Typical Operating Supply Voltage 	1.2 V
	Maximum Number of User I/Os 	487
	RAM Bits 	589824
	Re-programmability Support 	Yes

What's the deal with the "PACKAGE" (676FCBGA)?

I see from AVNET that the XC3S1500 comes in lots of flavors:

http://avnetexpress.avnet.com/store/em/EMController?langId=-1&storeId=500201&catalogId=500201&term=XC3S1500&x=0&y=0&N=0&action=products


XC3S1500-4FG320C
XC3S1500-4FG456C
XC3S1500-4FG676C
XC3S1500-4FGG456C
XC3S1500-5FG456C
XC3S1500-5FGG456C
XC3S1500-4FGG320C
XC3S1500-4FGG320I
XC3S1500-5FGG676C
XC3S1500-4FGG676C
XC3S1500-4FG320I
XC3S1500-4FG456I
XC3S1500-4FG676I
XC3S1500-4FGG456I
XC3S1500-4FGG676I
XC3S1500-5FG320C
XC3S1500-5FG676C
XC3S1500-5FGG320C
XC3S1500-5FGG320

How interchangeable are these parts?  The ds0099.pdf data sheet is not
geared towards just the XC3S1500; I'm getting confused within its 219
pages...


3) The raggedstone1 has an added feature that I was told to consider,
mounting into a PC. Initially I want to put it in a standalone box, so I
will need to order the board, the PCI I/O header, and the oscillator,
and then I'm good to go, yes?

Article: 139122
Subject: Re: DVI in FPGA
From: "Antti.Lukats@googlemail.com" <Antti.Lukats@googlemail.com>
Date: Sat, 21 Mar 2009 08:52:49 -0700 (PDT)
Links: << >> << T >> << A >>

On Mar 21, 3:47 pm, Mawafugo <cco...@netscape.net> wrote:
> In xapp460 the DVI/HDMI transmitter & receiver is implemented, but the
> max throughput is limited to about 750 Mb/s, which can handle up to
> 1080i or 720p resolution.  1080p, however, needs twice that.
>
> The question is: how can we crank up the throughput to about 1.5 Gb/s?

answer is: it is not doable with S3A

Antti

Article: 139123
Subject: Re: How big is my vhdl and am I approaching some size limitation on
From: rickman <gnuarm@gmail.com>
Date: Sat, 21 Mar 2009 09:11:06 -0700 (PDT)
Links: << >> << T >> << A >>

On Mar 21, 11:10 am, jleslie48 <j...@jonathanleslie.com> wrote:
>
> 1) So it is reasonable to conclude that the CoolRunner 3512 is way
> too small to even run a UART, yes?

I would not say that.  A UART can be done in a small number of FFs, or
in your case, a small number of macrocells.  I expect it to be easy to
get ten UARTs into the 3512.  But the code you have for a UART is very
large and overly complex if you just want to do serial transmission
and reception of data.  If you just want to *send* data the size can
be reduced further.

> 2) is the board that Brian suggested, the raggedstone1 with the
> spartan XC3S1500 is big enough?

Certainly an XC3S1500 is plenty large enough for four UARTs.  In that
size part you could have not only the UARTs, but also the CPU!

> 2A) I really want to put 3 or 4 UARTS onto the chip, Is it big enough
> for that?

Yes.  If four UARTs is all you want, the XC3S1500 is very much
overkill.

> 2B) The specs for the  XC3S1500 are:
>
> Xilinx XC3S1500-4FG676C
> FPGA Spartan®-3 Family 1.5M Gates 29952 Cells 630MHz Commercial 90nm
> Technology 1.2V 676-Pin FCBGA
> Cross to Alternate Parts by selecting most important features and
> values below and then search again
>         Search within this category only
>         Search within this manufacturer only
>         Feature Description     Feature Value
>         Package         676FCBGA
>         Family Name     Spartan®-3
>         Device Logic Cells      29952
>         Device Logic Units      3328
>         Device System Gates     1500000
>         Number of Registers     N/A
>         Maximum Internal Frequency      630 MHz
>         Typical Operating Supply Voltage        1.2 V
>         Maximum Number of User I/Os     487
>         RAM Bits        589824
>         Re-programmability Support      Yes
>
> What's the deal with the "PACKAGE" (676FCBGA)?
>
> I see from AVNET that the XC3S1500 comes in lots of flavors:
>
> http://avnetexpress.avnet.com/store/em/EMController?langId=-1&storeId...
>
> XC3S1500-4FG320C
> XC3S1500-4FG456C
> XC3S1500-4FG676C
> XC3S1500-4FGG456C
> XC3S1500-5FG456C
> XC3S1500-5FGG456C
> XC3S1500-4FGG320C
> XC3S1500-4FGG320I
> XC3S1500-5FGG676C
> XC3S1500-4FGG676C
> XC3S1500-4FG320I
> XC3S1500-4FG456I
> XC3S1500-4FG676I
> XC3S1500-4FGG456I
> XC3S1500-4FGG676I
> XC3S1500-5FG320C
> XC3S1500-5FG676C
> XC3S1500-5FGG320C
> XC3S1500-5FGG320
>
> How interchangeable are these parts?  The ds0099.pdf data sheet is not
> geared towards just the XC3S1500; I'm getting confused within its 219
> pages...

They are all the same die and will likely all run the same bitstream.
You really only need to worry about the package you are using unless
you want to target different boards.

The first digit after the dash, -4 or -5, is the speed grade of the part.  The
parts are the same, but they are tested to different speeds.  The
letter at the end is the temperature rating: C for commercial
(normally 0 to 70 C ambient, but I think Xilinx specs a higher number
and says this has to be the die temperature) and I for industrial (-40
to +85 C, with the same caveat as commercial).  The rest of the suffix
is the package.  For the most part, the different-sized parts have similar
timing.  But some timing numbers are different.  Anything that is
widely distributed across the chip has further to go in the larger
chips, so it runs slower.
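The naming scheme described above can be decoded mechanically. This Python sketch handles only the FG/FGG part numbers in the list above; the field names are illustrative, and treating the extra G in FGG as the Pb-free version of the same package is an assumption to confirm in the Xilinx packaging documentation.

```python
import re

# Covers only XC3S<size>-<speed><FG|FGG><pins>[C|I] numbers like those listed in this thread
_PART_RE = re.compile(r"^(XC3S\d+)-(\d)(FGG?)(\d+)([CI]?)$")
_TEMP_GRADES = {"C": "commercial", "I": "industrial"}


def decode_part(part: str) -> dict:
    m = _PART_RE.match(part.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised part number: {part!r}")
    device, speed, pkg, pins, temp = m.groups()
    return {
        "device": device,                      # same die for every entry in the list
        "speed_grade": int(speed),             # -4 or -5
        "package": pkg + pins,                 # e.g. FG676
        "package_pins": int(pins),
        "pb_free": pkg == "FGG",               # assumption: extra G = Pb-free variant
        "temp_grade": _TEMP_GRADES.get(temp),  # None if the suffix is missing
    }


print(decode_part("XC3S1500-4FG676C"))
```

Since every entry is the same die, the practical differences are speed grade, temperature range and, above all, the package footprint the board is laid out for.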

If you want to design for a range of packages you mainly need to limit
your design to the I/O pins that are used on the smallest package.  So
make sure every package you want to use supports all of those pins.
Otherwise you should have no problems.
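That pin-planning rule amounts to a set intersection. In this Python sketch the pad names and package contents are hypothetical placeholders that identify I/O by pad rather than by package ball; the real lists would come from the pinout documentation for each package.

```python
def common_pads(packages: dict[str, set[str]]) -> set[str]:
    """I/O pads that are bonded out in every candidate package."""
    pad_sets = list(packages.values())
    return set.intersection(*pad_sets) if pad_sets else set()


def missing_pads(design: set[str], packages: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each package, the design pads it does not bond out
    (packages with nothing missing are omitted)."""
    return {name: design - pads for name, pads in packages.items() if design - pads}


# Hypothetical pad lists: the smaller package bonds out fewer I/O pads
packages = {
    "FG320": {"IO_A", "IO_B", "IO_C"},
    "FG676": {"IO_A", "IO_B", "IO_C", "IO_D"},
}
design = {"IO_A", "IO_D"}
print(common_pads(packages))            # pads safe to use in every package
print(missing_pads(design, packages))   # IO_D is not available on FG320
```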

> 3) The raggedstone1 has an added feature that I was told to consider,
> mounting into a PC. Initially I want to put it in a standalone box, so I
> will need to order the board, the PCI I/O header, and the oscillator,
> and then I'm good to go, yes?

The board will need power.  Other than that, you need to consult the
data sheet for the board.  They should provide specs on how to use the
board stand alone.

Rick

Article: 139124
Subject: Re: plb_emc with flash and datawidth matching
From: "Antti.Lukats@googlemail.com" <Antti.Lukats@googlemail.com>
Date: Sat, 21 Mar 2009 09:13:11 -0700 (PDT)
Links: << >> << T >> << A >>

On Mar 21, 12:51 pm, "Kristian Klaus" <kristian.kl...@gmx.de> wrote:
> I forgot to say that I want the DATA_WIDTH_MATCHING option set to 1 because
> I want the emc to be fully transparent to the operating system (32 bit
> writes).
>
> Kristian
>
> "Kristian Klaus" <kristian.kl...@gmx.de> wrote in message news:gq2f5g$eph$1@hahn.informatik.hu-berlin.de...
>
> > Hello,
>
> > I am trying to connect a 16 bit Intel Strataflash to the
> > xps_mch_emc_v2_00_a. My problem is that the flashwriter.tcl application
> > stops after a few percent (13% or later) and never reaches 100%.
>
> > In the microblaze-uclinux archive, I found two old posts:
>
> >http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/archive/2005/0...
> > and
> >http://osdir.com/ml/linux.uclinux.microblaze/2004-02/msg00046.html
>
> > Can anybody confirm, that this also works with the new xps_mch_emc?
>
> > --
> > Kristian

Back when EDK was very young, I designed a special IP core fix
that repaired the EMC and allowed flash writing when width
matching is on.

That was MANY years ago. I assume the problem has since been fixed by Xilinx?

But I haven't worked with the EMC for a long time.

Back then, the EMC fix was basically an AND gate that
removed an extra pulse that made the CFI interface go nuts.

Antti
