Messages from 62950

Article: 62950
Subject: Re: Home grown CPU core legal?
From: H. Peter Anvin <hpa@zytor.com>
Date: 11 Nov 2003 11:28:59 -0800

Followup to:  <762349e4.0311101431.24595dcb@posting.google.com>
By author:    bpride@monad.net (Bruce P.)
In newsgroup: comp.arch.fpga
> 
> Did I mention that my new board design has a Cyclone on it? Hmmm, "The
> Cyclo-Blaze"...has sort of a nice ring to it. ;>)
> 
> I guess if it's smaller than the Altera Nios and a lot simpler, it
> could be of some use.  Anyway, it should be a good learning
> experience.  Thanks again.
> 

I think a small core that's vendor-independent would be nice.  There
are enough things in common between most current FPGA
architectures (4- or 5-input LUTs, carry chains, dual-port block RAMs
in the 4kbit size range) that doing something sane that's
technology-independent shouldn't be that hard.  It may not be as
sophisticated as MicroBlaze or NIOS, but wouldn't come with automatic
vendor lock-in.

I have actually been hacking a bit on a very simple 16-bit
architecture that I'm hoping will fit the bill.  No promises if or
when I'll get around to finishing it, though... at this point I'd say
the RTL is about 30% done.

	-hpa



-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

Article: 62951
Subject: Re: BGA packages in high vibration environments
From: rickman <spamgoeshere4@yahoo.com>
Date: Tue, 11 Nov 2003 14:32:29 -0500

Sorry I did not get back to you sooner.  The original contact was ASG at
www.asg-jergens.com.  They make the IS-1000 which gets under the BGA
from what I can see.  So you can see each and every ball.  But you
should get a demo since the sales pictures don't clearly indicate if
they are looking at the edge row of balls or an inner row.  

With a Google search I found this - http://www.caltexsci.com/
They seem to make a similar product, but the web page is not too clear
if they are just looking at it from the outside.  


Ron Huizen wrote:
> 
> Rick,
> 
> I'd certainly be interested in more info on the fiber microscope you
> mentioned.  Debugging designs with lots of big BGAs is tough enough without
> wondering whether it's an assembly issue or not, and traditional xray
> techniques are good for showing shorts, but not so good for opens ...
> 
> -----
> Ron Huizen
> BittWare
> 
> "rickman" <spamgoeshere4@yahoo.com> wrote in message
> news:3F93E3DC.6753DCD4@yahoo.com...
> > Thomas Stanka wrote:
> > >
> > > Xpost 2 cae and caf, no Fup.
> > >
> > > Hallo,
> > >
> > > "Geoffrey Mortimer" <me@privacy.net> wrote:
> > > > Anyone have any experience of BGA's (especially fine pitch types) in
> high
> > > > vibration environments? Is there a more appropriate newsgroup for this
> > > > topic?
> > >
> > > > Actually that's a very hot topic as BGA seems to be becoming common in the
> > > > world of FPGAs and ASICs. I know that our mechanical engineers
> > > > are already researching this topic, as we are very likely to have some
> > > > fine pitch BGA in a high vibration environment in future.
> > > > I would guess that you should ask in some mechanical newsgroups as
> > > > well.
> > > > A big problem using FBGA is the test of whether you connected all balls
> > > > properly [1], as you have no chance of easy visual inspection.
> > >
> > > bye Thomas
> >
> > I recently saw a product that allows visual inspection of the solder
> > balls on a mounted BGA.  It is a fiber optic microscope and has tiny
> > fiber probes that can run between the balls.  I'll look for the info if
> > anyone is interested.
> >
> > --
> >
> > Rick "rickman" Collins
> >
> > rick.collins@XYarius.com
> > Ignore the reply address. To email me use the above address with the XY
> > removed.
> >
> > Arius - A Signal Processing Solutions Company
> > Specializing in DSP and FPGA design      URL http://www.arius.com
> > 4 King Ave                               301-682-7772 Voice
> > Frederick, MD 21701-3110                 301-682-7666 FAX

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 62952
Subject: Re: Home grown CPU core legal?
From: "Glen Herrmannsfeldt" <gah@ugcs.caltech.edu>
Date: Tue, 11 Nov 2003 19:37:38 GMT


"H. Peter Anvin" <hpa@zytor.com> wrote in message
news:bord9r$sn5$1@cesium.transmeta.com...

(snip)

> I think a small core that's vendor-independent would be nice.  There
> are enough things in common between most current FPGA
> architectures (4- or 5-input LUTs, carry chains, dual-port block RAMs
> in the 4kbit size range) that doing something sane that's
> technology-independent shouldn't be that hard.  It may not be as
> sophisticated as MicroBlaze or NIOS, but wouldn't come with automatic
> vendor lock-in.
>
> I have actually been hacking a bit on a very simple 16-bit
> architecture that I'm hoping will fit the bill.  No promises if or
> when I'll get around to finishing it, though... at this point I'd say
> the RTL is about 30% done.

The PDP-11 has a nice simple 16 bit architecture, not including the optional
instructions.  (FIS and EIS for example.)

-- glen

Article: 62953
Subject: Re: Are modules that are not floorplanned still functional?
From: "Erez Birenzwig" <erez_birenzwig@hotmail.com>
Date: Wed, 12 Nov 2003 08:40:40 +1300

I found out that the DDR FF is optimized out if you do not connect
it to an OBUF of some kind.
If you use a synthesis tool you might want to declare the pin as output
of some kind (Select IO), or just instantiate the OBUF directly in the HDL.

Erez.


"Jiang" <merlin_jiang@hotmail.com> wrote in message
news:9bd94bca.0311110028.294412a9@posting.google.com...
> Hello, FPGA friends,
>
> I'm trying to implement a simple clock bypassing on my Virtex-II 6000 with an
> FF1152 board. My ISE is version 5.2.03i. In the beginning I could do a trivial
> bypassing using Virtex 2000E with a BG560 board:
>
> input clk;
> output out_clk;
> wire out_clk;
>
> assign out_clk = clk;
>
> But on my Virtex-II 6000 it didn't work. That's fine, since I could try
> FDDRRSE to accomplish the same task. My code evolved as:
>
> input clk;
> output out_clk;
> wire out_clk;
>
> FDDRRSE fddrrse_0 (
> .Q (out_clk),
> .C0 (clk),
> .C1 (~clk),
> .CE (1'b1),
> .D0 (1'b1),
> .D1 (1'b0),
> .R (1'b0),
> .S (1'b0)
> );
>
> After browsing the old messages of this news group, I didn't know why the above
> code failed again. The output port was just stuck at logic 0, and it looked like
> fddrrse_0 was powered up and did nothing. Then I tried to use Xilinx
> floorplanner to see what FDDRRSE was. Here I found it not
> being floorplanned. Well, maybe it was too trivial to be floorplanned. Hence
> I used Xilinx FPGA editor to see what the connections were like. However,
> besides my fddrrse_0, I didn't find any nets other than the I/O port between
> the inferred input and output buffers and some VCCs. And it looked like
> clk --> clk_IBUFG --> out_clk was the whole route.
>
> I believe I might have missed something there such that neither could I
> bypass clock signals correctly nor could I understand what Xilinx floorplanner
> and FPGA editor told me.
>
> Please give me some suggestions to understand even some parts of this problem.
> Thank you :-)
>
> Regards,
> Merlin

Article: 62954
Subject: Re: Home grown CPU core legal?
From: H. Peter Anvin <hpa@zytor.com>
Date: 11 Nov 2003 11:59:27 -0800

Followup to:  <6Uasb.123195$mZ5.829826@attbi_s54>
By author:    "Glen Herrmannsfeldt" <gah@ugcs.caltech.edu>
In newsgroup: comp.arch.fpga
> 
> The PDP-11 has a nice simple 16 bit architecture, not including the optional
> instructions.  (FIS and EIS for example.)
> 

The PDP-11 is still very much a CISC architecture... I think it would
require a lot more logic than necessary.

Below are my design notes for my hacked-up architecture, currently
called "NanoRISC."

I have no way to know how this is turning out.  My current goal is to
make sure it implements in < 1000 LEs on Cyclone, without using
blockRAM for the register file.  Fundamentally it's a personal
research hack project.

	-hpa



NanoRISC goals
         - Minimal hardware consumption
         - Technology independent
         - Free licensing

-> 16-bit addressing, data width, instruction word
-> Single issue in-order RISC
-> Short pipeline (probably 3 stages)
-> Deterministic timing (1 cycle/insn, taken branch 2 cycles?)
-> Separate ports for I and D to take advantage of dual-port RAM

0000 NNNN NNNN NNNN     - IMM (supplies upper 12 bits of q or Is field)
0001 0000 SSSS DDDD     - JMP Rd,Rs (PC <- Rd, Rd <- Rs)
0001 CCCC TTTT TTTT     - BR cc,PC+t (cc != 0)
001I PPPP SSSS DDDD     - ALU Rd,Rs/Is (P = operation, I = immediate)
01WB QQQQ BBBB RRRR     - LD/ST Rr,[Rb+q] (W=ST/LD# B=16/8#)
1TTT TTTT TTTT TTTT     - CALL PC+t (PC <- PC+2, r15 <- PC, PC <- PC+t)
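
As a reading aid for the encoding table above, here is a minimal Python sketch of a decoder for the six formats. It is not part of the original design notes: the function name `decode`, the field names, and the decision to return branch/call offsets as raw unsigned values are assumptions made for illustration. It also assumes that `W=ST/LD#` means W=1 is a store and that `B=16/8#` means B=1 is a 16-bit access. UNARY sub-opcodes are not decoded because the notes leave that list incomplete.

```python
# Names taken from the "ALU opcodes" list in the notes, indexed by the PPPP field.
ALU_OPS = ["UNARY", "MOV", "CMP", "TST", "ANDN", "OR", "XOR", "AND",
           "ADD", "ADC", "SUB", "SBC", "SUBR", "SBCR", "WRSR", "RDSR"]


def decode(word):
    """Split a 16-bit NanoRISC instruction word into (mnemonic, fields)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    if word & 0x8000:                      # 1TTT TTTT TTTT TTTT
        return ("CALL", {"t": word & 0x7FFF})
    if word & 0x4000:                      # 01WB QQQQ BBBB RRRR
        return ("ST" if word & 0x2000 else "LD",
                {"bits": 16 if word & 0x1000 else 8,
                 "q": (word >> 8) & 0xF,
                 "rb": (word >> 4) & 0xF,
                 "rr": word & 0xF})
    if word & 0x2000:                      # 001I PPPP SSSS DDDD
        return ("ALU", {"op": ALU_OPS[(word >> 8) & 0xF],
                        "imm": bool(word & 0x1000),
                        "rs": (word >> 4) & 0xF,
                        "rd": word & 0xF})
    if word & 0x1000:                      # 0001 xxxx ....
        cc = (word >> 8) & 0xF
        if cc == 0:                        # 0001 0000 SSSS DDDD
            return ("JMP", {"rs": (word >> 4) & 0xF, "rd": word & 0xF})
        return ("BR", {"cc": cc, "t": word & 0xFF})
    return ("IMM", {"n": word & 0x0FFF})   # 0000 NNNN NNNN NNNN


print(decode(0x2812))
```

The checks run from the top bit downward, which mirrors how the formats are distinguished by their leading bits; JMP and BR share the 0001 prefix and are separated only by whether the condition field is zero.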


ALU opcodes

0000    UNARY
        1000    ROR
        1001    ROL
        1010    RCR
        1011    RCL
        1100    SHR
        1101    SHL
        1110    SAR
        1111    SXL     Shift left insert 1
	[...more...]

0001    MOV
        
0010    CMP
0011    TST

0100    ANDN
0101    OR
0110    XOR
0111    AND

1000    ADD
1001    ADC
1010    SUB
1011    SBC
1100    SUBR
1101    SBCR
      
1110    WRSR
1111    RDSR

Condition codes

3 N = negative
2 Z = zero
1 V = overflow
0 C = carry

        #e      - Z
        #b      - ~C
        #a      - C & ~Z
        #l      - V
        #g      - ~V & ~Z
        #s      - N

+ negations
 
	always	- negation of code 0000
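
The condition table above can be exercised with a small Python sketch. This is only an illustration of the predicates exactly as listed, using the flag bit positions given (N=3, Z=2, V=1, C=0). The notes do not assign numeric codes to the conditions, so the sketch looks them up by name; the "n" prefix used here for the negated forms and the function name `condition_true` are assumptions, not part of the design.

```python
# Flag bit masks, from the bit positions in the notes (3 N, 2 Z, 1 V, 0 C).
N, Z, V, C = 0x8, 0x4, 0x2, 0x1

# Base predicates, transcribed from the condition table as written.
BASE = {
    "e": lambda f: bool(f & Z),
    "b": lambda f: not (f & C),
    "a": lambda f: bool(f & C) and not (f & Z),
    "l": lambda f: bool(f & V),
    "g": lambda f: not (f & V) and not (f & Z),
    "s": lambda f: bool(f & N),
}


def condition_true(name, flags):
    """Evaluate a named condition against a 4-bit NZVC flag value."""
    if name == "always":
        return True
    # Hypothetical naming: "n" + base name selects the negated condition.
    if name.startswith("n") and name[1:] in BASE:
        return not BASE[name[1:]](flags)
    return BASE[name](flags)


print(condition_true("a", C), condition_true("a", C | Z))
```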
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

Article: 62955
Subject: Re: Implementing a very fast counterin VirtexII
From: Jim Granville <jim.granville@designtools.co.nz>
Date: Wed, 12 Nov 2003 09:31:08 +1300

> Erez Birenzwig wrote:
> >
> > Then when you read the counter every clock cycle once every 64K counts
> > you'll get a wrong result. I don't think it's good enough.. Remember the FMUL bug ?
> >
> > Anyway I got a good answer from another list :
> >
> > 1) Build a fast 2-bit counter
> > 2) Build a slow 62-bit counter, with enable
> > 3) Use enable = q[1]&q[0]
> > 4) latch the slow counter using the enable as well
> >
> > You get a full 4 cycles for the carry to ripple through the upper 62
> > bits.  Be careful in timing analysis.  Some systems let you specify that
> > the carry chain is a multi-cycle path.  Others force you to ignore
> > these paths with falsepath commands.
Peter Alfke wrote:
> 
> Erez
> This is a good idea for a counter, but it does not work for a general
> purpose incrementer where you would throw new vectors at it on every
> clock cycle. In that case, my suggestion of detecting FFFF and
> generating a wait state works well. (I hope you did not think I was just
> brushing the problem under the carpet. I solve it with the additional
> wait state ).
> Regarding your 2-stage prescaler, I would extend this to three stages.
> It gives you double the timing benefit, and it fits the 4-input LUT
> structure very nicely.
> I don't understand your item 4, but that may be just semantics.

 I think item 4) was to cover capture of the counter at any
instant, and to cover the carry ripple. 
 I'm with Peter in questioning 4).
Carry ripple is certainly long, but this is on the .D side, and
determines the NEXT Clock delay. However any Capture is on the .Q side,
and all Q's will be fully sync (no ripple adders).
 Capture of both the prescaler and the long counter can be clock
synchronous, and does not need any enables.

 Capture and Clear (which can be more useful in applications) can 
be done in a single clock with a little more .D side logic.

 Capture of fractional clocks, to push the time resolution better than
1/clock speed, is challenging, but looks possible in modern FPGAs.

-jg

Article: 62956
Subject: Re: Home grown CPU core legal?
From: rickman <spamgoeshere4@yahoo.com>
Date: Tue, 11 Nov 2003 15:33:17 -0500

"H. Peter Anvin" wrote:
> 
> Followup to:  <6Uasb.123195$mZ5.829826@attbi_s54>
> By author:    "Glen Herrmannsfeldt" <gah@ugcs.caltech.edu>
> In newsgroup: comp.arch.fpga
> >
> > The PDP-11 has a nice simple 16 bit architecture, not including the optional
> > instructions.  (FIS and EIS for example.)
> >
> 
> The PDP-11 is still very much a CISC architecture... I think it would
> require a lot more logic than necessary.
> 
> Below are my design notes for my hacked-up architecture, currently
> called "NanoRISC."
> 
> I have no way to know how this is turning out.  My current goal is to
> make sure it implements in < 1000 LEs on Cyclone, without using
> blockRAM for the register file.  Fundamentally it's a personal
> research hack project.

Aren't there already several open source FPGA CPUs available?  Anyone
have a few links handy?  

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 62957
Subject: Re: Implementing a very fast counterin VirtexII
From: "Erez Birenzwig" <erez_birenzwig@hotmail.com>
Date: Wed, 12 Nov 2003 09:50:48 +1300

"Jim Granville" <jim.granville@designtools.co.nz> wrote in message
news:3FB1470C.F22@designtools.co.nz...
>  I think item 4) was to cover capture of the counter at any
> instant, and to cover the carry ripple.
>  I'm with Peter in questioning 4).
> Carry ripple is certainly long, but this is on the .D side, and
> determines the NEXT Clock delay. However any Capture is on the .Q side,
> and all Q's will be fully sync (no ripple adders )
>  Capture of both the prescaler, and long counter, can be clock
> synchronous, and does not need any enables.

4) is to cover the fact that once you enable the +1 on the long carry chain
you can't sample it at the next clock cycle (it won't be ready by then), so you have
to latch the previous value, which is the new count value.

The problem is that I must be able to sample the counter on every arbitrary
clock cycle, therefore it must be glitch free.

The counter that I need doesn't require a clear but thanks for the thought,
it needs a load though.

>
>  Capture and Clear (which can be more useful in applications) can
> be done in a single clock with a little more .D side logic.
>
>  Capture of fractional clocks, to push the time resolution better than
> 1/clock speed, is challenging, but looks possible in modern FPGAs.
>
> -jg

Article: 62958
Subject: Re: Home grown CPU core legal?
From: "Erez Birenzwig" <erez_birenzwig@hotmail.com>
Date: Wed, 12 Nov 2003 09:58:14 +1300

You should try www.opencores.org

Erez.

"rickman" <spamgoeshere4@yahoo.com> wrote in message
news:3FB1478D.8C19CC98@yahoo.com...
> "H. Peter Anvin" wrote:
> >

> Aren't there already several open source FPGA CPUs available?  Anyone
> have a few links handy?
>
> --
>
> Rick "rickman" Collins
>
> rick.collins@XYarius.com
> Ignore the reply address. To email me use the above address with the XY
> removed.
>
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design      URL http://www.arius.com
> 4 King Ave                               301-682-7772 Voice
> Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 62959
Subject: Re: FPGAs and DRAM bandwidth
From: fortiz80@tutopia.com (Fernando)
Date: 11 Nov 2003 13:11:08 -0800

> Question: do you REALLY need all that memory bandwidth?  Do you really
> need all that speed?  Or could you just make things take 10x longer,
> only require 2 banks of DDR, and use a smaller piece of FPGA logic?

I *could* take 10x longer.  I could use a Pentium too.

-----------

For the ones interested in this thread, see

http://micron.com/news/product/2003-11-03_Altera-MicronDDR400.html

"Altera and Micron Announce Industry's First DDR400 SDRAM DIMM
Interface for FPGAs"

I don't see how that's "the first", but it's a good thing to have
multiple vendors to choose from.

Article: 62960
Subject: Re: Implementing a very fast counterin VirtexII
From: Jim Granville <jim.granville@designtools.co.nz>
Date: Wed, 12 Nov 2003 10:11:47 +1300

Erez Birenzwig wrote:
> 
> "Jim Granville" <jim.granville@designtools.co.nz> wrote in message
> news:3FB1470C.F22@designtools.co.nz...
> >  I think item 4) was to cover capture of the counter at any
> > instant, and to cover the carry ripple.
> >  I'm with Peter in questioning 4).
> > Carry ripple is certainly long, but this is on the .D side, and
> > determines the NEXT Clock delay. However any Capture is on the .Q side,
> > and all Q's will be fully sync (no ripple adders )
> >  Capture of both the prescaler, and long counter, can be clock
> > synchronous, and does not need any enables.
> 
> 4) is to cover the fact that once you enable the +1 on the long carry chain
> you can't sample it at the next clock cycle (It won't be ready by then), so you have
> to latch the previous value which is the new count value.

 We may be differing in topology.
A +1 is normally done on the register INPUT side (.D), not on the 
register OUTPUT (.Q) side.
 On an FPGA, you use carry logic/ +1 maths; on a CPLD, you use 
wide-AND and toggle flipflops.

 So, it does not matter if the long carry chain results are not ready
(and you are right, they will not be ready), because the 
latch sample is taken from the .Q, whilst the carry results drive the .D
 You DO need to enable the counter clock, as that requires a fully
settled summation result.

-jg

Article: 62961
Subject: Re: Implementing a very fast counterin VirtexII
From: Peter Alfke <peter@xilinx.com>
Date: Tue, 11 Nov 2003 13:17:31 -0800

Erez,
there are some misunderstandings here:
"Your" counter (with its 2- or 3- bit front end that decodes an enable
to the 62 more significant bits) looks and behaves like a perfectly
synchronous counter. The carry into the 62-bit section has been active
and rippling up the chain, but the non-enable input stopped any action,
until Enable goes active and the next clock edge causes the whole
counter to increment, synchronously.
You don't have to do anything to prevent the ripple carry chain from doing
something stupid, the Enable signal takes care of that.
You can read the whole proper count value within a nanosecond after each
incoming clock.
So far, so good.
But, as I said, do not try to use this as a general purpose incrementer,
where an arbitrary vector might come in on any clock tick. It will fail
miserably, but I showed you the solution (using a wait state).
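
The prescaler arrangement discussed in this thread (a fast 2-bit counter whose all-ones state, `q[1]&q[0]`, enables a slow 62-bit counter) can be checked with a small cycle-level Python model. This is only a behavioral sketch under stated assumptions: the function name `simulate` and its parameters are made up for illustration, both sections are assumed to update on the same clock edge, and carry-chain propagation delay is not modeled at all, so it says nothing about timing closure.

```python
def simulate(cycles, low_bits=2, high_bits=62, start=0):
    """Return the value read from the split counter on each clock cycle.

    The low section increments every clock. The high section increments
    only on a clock edge where the low section is all ones (the enable).
    """
    low_mask = (1 << low_bits) - 1
    high_mask = (1 << high_bits) - 1
    low = start & low_mask
    high = (start >> low_bits) & high_mask
    readings = []
    for _ in range(cycles):
        # Sample the register outputs (.Q side) before the next edge.
        readings.append((high << low_bits) | low)
        enable = (low == low_mask)      # q[1] & q[0] when low_bits == 2
        # Clock edge: both sections update together.
        if enable:
            high = (high + 1) & high_mask
        low = (low + 1) & low_mask
    return readings


# The split counter reads back the same sequence as a plain binary counter.
print(simulate(8))
```

Because the enable is decoded from the low bits one full low-section period before it is needed, the high section has that many clocks for its carry to settle, while every sampled value still comes from register outputs.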

Peter Alfke, Xilinx Applications
=============================
Erez Birenzwig wrote:
> 
> "Jim Granville" <jim.granville@designtools.co.nz> wrote in message
> news:3FB1470C.F22@designtools.co.nz...
> >  I think item 4) was to cover capture of the counter at any
> > instant, and to cover the carry ripple.
> >  I'm with Peter in questioning 4).
> > Carry ripple is certainly long, but this is on the .D side, and
> > determines the NEXT Clock delay. However any Capture is on the .Q side,
> > and all Q's will be fully sync (no ripple adders )
> >  Capture of both the prescaler, and long counter, can be clock
> > synchronous, and does not need any enables.
> 
> 4) is to cover the fact that once you enable the +1 on the long carry chain
> you can't sample it at the next clock cycle (it won't be ready by then), so you have
> to latch the previous value, which is the new count value.
> 
> The problem is that I must be able to sample the counter on every arbitrary
> clock cycle, therefore it must be glitch free.
> 
> The counter that I need doesn't require a clear but thanks for the thought,
> it needs a load though.
> 
> >
> >  Capture and Clear (which can be more useful in applications) can
> > be done in a single clock with a little more .D side logic.
> >
> >  Capture of fractional clocks, to push the time resolution better than
> > 1/clock speed, is challenging, but looks possible in modern FPGAs.
> >
> > -jg

Article: 62962
Subject: Re: Programmer's unpaid overtime. ==> I would suggest here some prudence
From: uselinux2000@yahoo.com (linux user)
Date: 11 Nov 2003 13:18:59 -0800

I have been very efficient in companies, because the environment was
positive and good: there are places where accomplishments are not only
recognized, but easy.

I have also been very inefficient, in other places, for the usual
reasons: too much politics, too much red tape, inability to comply
with constantly changing specifications, or more simply a boss who
delayed a project on purpose or through incompetence.

So I would suggest some prudence here.
Accountability only makes sense (and a lot!) if proper authority
is given.
Most of us like what we do, and try to do it right and reasonably
fast.
If there is a non-performer somewhere, blaming the non-performer,
besides pointing to a hiring mistake, is a very convenient way to push
many structural problems under the rug. (A fascist and/or network
administrator is a very common one, an undefined level of
authority/responsibility is another one.)

To me, if things work, appreciate "the boss"; if they do not, blame
"the boss", unless (s)he is not given proper authority.
One common problem is that promotion to a supervisory position is often
given to people who hate their job! This should be hierarchically neutral:
a good single performer is just as important as a good director in an
orchestra.
And by the way, in music the best musicians usually become orchestra
directors.
 Do the same in engineering, and things will be good.
- UL2K -
ps: false/fake achievements are so common.

The Real Bev <bashley@myrealbox.com> wrote in message news:<3FAC46F0.31F9B374@myrealbox.com>...
> Kevin Neilson wrote:
> > 
> > I've always been amazed that at a big company there can be two coders
> > sitting next to each other with outputs that vary by a factor of ten, and
> > their pay varies by a factor of 5%.  Companies seem to be very good at
> > laying off large swaths of workers, but not at firing really useless ones.
> > -Kevin
> 
> And some companies are very good at promoting and throwing great
> fistfuls of cash at coders with outputs of 100x the average who can also
> solve other technical problems.
> 
> It's really hard to fire a useless person without being able to prove in
> court that the guy really IS useless, was given the appropriate number
> of chances to remedy his uselessness, and that the company bent over
> backwards to keep him gainfully employed in spite of his limitations,
> especially if said useless person is a member of some EEO "protected"
> class.  You have problems even if you give such a person a charity
> layoff and a few months of severance pay.  
> 
> Carry on...

Article: 62963
Subject: Re: Implementing a very fast counterin VirtexII
From: "Erez Birenzwig" <erez_birenzwig@hotmail.com>
Date: Wed, 12 Nov 2003 10:20:45 +1300

"Jim Granville" <jim.granville@designtools.co.nz> wrote in message
news:3FB15093.48E7@designtools.co.nz...
> Erez Birenzwig wrote:
> >
> > "Jim Granville" <jim.granville@designtools.co.nz> wrote in message
> > news:3FB1470C.F22@designtools.co.nz...
> >
> > > 4) is to cover the fact that once you enable the +1 on the long carry chain
> > > you can't sample it at the next clock cycle (It won't be ready by then), so you have
> > > to latch the previous value which is the new count value.
>
>  We may be differing in topology.
> A +1 is normally done on the register INPUT side (.D), not on the
> register OUTPUT (.Q) side.
>  On an FPGA, you use carry logic/ +1 maths; on a CPLD, you use
> wide-AND and toggle flipflops.

Sorry, you're right on the spot there. My mistake.

>
>  So, it does not matter if the long carry chain results are not ready
> (and you are right, it will not be ready ), because the
> latch sample is taken from the .Q, whilst the carry results drive the .D
>  You DO need to enable the counter clock, as that requires a fully
> settled summation result.
>
> -jg

Article: 62964
Subject: Re: Home grown CPU core legal?
From: Peter Alfke <peter@xilinx.com>
Date: Tue, 11 Nov 2003 13:22:29 -0800

Your self-imposed limit of "1000 LEs without using BlockRAM for the
register file" will put you at a distinct disadvantage against
MicroBlaze which can use LUT-RAMs and SRL16s, something Altera does not have.
:-(   or    :-)  depending on your affiliation.
Peter Alfke, Xilinx

rickman wrote:
> 
> "H. Peter Anvin" wrote:
> >
> > Followup to:  <6Uasb.123195$mZ5.829826@attbi_s54>
> > By author:    "Glen Herrmannsfeldt" <gah@ugcs.caltech.edu>
> > In newsgroup: comp.arch.fpga
> > >
> > > The PDP-11 has a nice simple 16 bit architecture, not including the optional
> > > instructions.  (FIS and EIS for example.)
> > >
> >
> > The PDP-11 is still very much a CISC architecture... I think it would
> > require a lot more logic than necessary.
> >
> > Below are my design notes for my hacked-up architecture, currently
> > called "NanoRISC."
> >
> > I have no way to know how this is turning out.  My current goal is to
> > make sure it implements in < 1000 LEs on Cyclone, without using
> > blockRAM for the register file.  Fundamentally it's a personal
> > research hack project.
> 
> Aren't there already several open source FPGA CPUs available?  Anyone
> have a few links handy?
> 
> --
> 
> Rick "rickman" Collins
> 
> rick.collins@XYarius.com
> Ignore the reply address. To email me use the above address with the XY
> removed.
> 
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design      URL http://www.arius.com
> 4 King Ave                               301-682-7772 Voice
> Frederick, MD 21701-3110                 301-682-7666 FAX

Article: 62965
Subject: Re: Home grown CPU core legal?
From: Jim Granville <jim.granville@designtools.co.nz>
Date: Wed, 12 Nov 2003 10:24:44 +1300

H. Peter Anvin wrote:
> 
> Followup to:  <6Uasb.123195$mZ5.829826@attbi_s54>
> By author:    "Glen Herrmannsfeldt" <gah@ugcs.caltech.edu>
> In newsgroup: comp.arch.fpga
> >
> > The PDP-11 has a nice simple 16 bit architecture, not including the optional
> > instructions.  (FIS and EIS for example.)
> >
> 
> The PDP-11 is still very much a CISC architecture... I think it would
> require a lot more logic than necessary.
> 
> Below are my design notes for my hacked-up architecture, currently
> called "NanoRISC."
> 
> I have no way to know how this is turning out.  My current goal is to
> make sure it implements in < 1000 LEs on Cyclone, without using
> blockRAM for the register file.  Fundamentally it's a personal
> research hack project.
> 
>         -hpa
> 
> NanoRISC goals
>          - Minimal hardware consumption
>          - Technology independent
>          - Free licensing
> 
> -> 16-bit addressing, data width, instruction word

 In doing a 'clean slate' FPGA small core, there is merit in choosing
an opcode width that matches the FPGA Block RAM / Multiplier widths.
 ( eg I've seen 9 bit opcodes used )
 Did you look at that ?

-jg

Article: 62966
Subject: Re: Linux and FPGA compatibility
From: Petter Gustad <newsmailcomp6@gustad.com>
Date: 11 Nov 2003 22:26:03 +0100

sleepymish@hotmail.com (Michelle) writes:

> By the way, I have a related question....
> 
> I believe the "/proc/bus/pci/devices" file in Linux lists all the
> devices on the PCI bus. I'm wondering if anyone knows how this file is
> updated or maintained? How is this file updated with the devices.? Is
> it updated every time the system is turned on or does the system admin
> update with the appropriate information?

This is maintained by the PCI driver. It's not a real file, but data
which is collected by the driver when you try to read from this "file"
(there is a hook function you specify in a struct which you register
through a call to proc_register). Try to look for pci_proc_attach_device
in the Linux source tree:

find /usr/src/linux/ -name '*.[ch]' | xargs grep pci_proc_attach_device

Writing a Linux device driver is not extremely difficult since there's
a lot of documentation out there as well as source code for all (well
most) of the current device drivers in use.

You can also use the "lspci" command to list all PCI devices. You can
also specify a bus:slot.function number as well as verbose output and
a hex dump of the PCI config space, like: "lspci -s 2:2.0 -v -x"
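
To relate the "/proc/bus/pci/devices" entries to the bus:slot.function form that lspci uses, here is a minimal Python sketch that parses one line of that file. It assumes the first three whitespace-separated fields are, in hex, the bus number and devfn packed together, the vendor and device IDs packed together, and the IRQ; check this against your kernel's source before relying on it. The function names and the sample line (including its device ID) are made up for illustration.

```python
def parse_pci_device_line(line):
    """Parse the leading fields of one /proc/bus/pci/devices line."""
    fields = line.split()
    bus_devfn = int(fields[0], 16)     # assumed layout: bus (8 bits), devfn (8 bits)
    ids = int(fields[1], 16)           # assumed layout: vendor (16 bits), device (16 bits)
    devfn = bus_devfn & 0xFF
    return {
        "bus": bus_devfn >> 8,
        "slot": devfn >> 3,            # upper 5 bits of devfn
        "function": devfn & 0x7,       # lower 3 bits of devfn
        "vendor": ids >> 16,
        "device": ids & 0xFFFF,
        "irq": int(fields[2], 16),
    }


def lspci_address(dev):
    """Format a parsed entry as bus:slot.function, in hex like lspci."""
    return "%02x:%02x.%x" % (dev["bus"], dev["slot"], dev["function"])


# Hypothetical sample line; only the first three fields are used.
sample = "0210\t10ee0300\tb\t0\t0"
print(lspci_address(parse_pci_device_line(sample)))
```

On a real system the same function could be applied to each line read from the file; remaining fields (base addresses, sizes, driver name) are ignored here.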

Petter

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 62967
Subject: Re: Home grown CPU core legal?
From: H. Peter Anvin <hpa@zytor.com>
Date: 11 Nov 2003 13:29:47 -0800

Followup to:  <3FB15315.5D9F933E@xilinx.com>
By author:    Peter Alfke <peter@xilinx.com>
In newsgroup: comp.arch.fpga
>
> Your self-imposed limit of "1000 LEs without using BlockRAM for the
> register file" will put you at a distinct disadvantage against
> MicroBlaze which can use LUT-RAMs and SRL16s, something Altera does not have.
> :-(   or    :-)  depending on your affiliation.
> Peter Alfke, Xilinx
> 

Since my affiliation is "neither" (I just happen to own a Cyclone
board since that was the biggest FPGA I could get with free tools) I
guess it's more of a :-| than either of those :^)

Unless Xilinx' tools are complete crap, which I'd find unlikely, I
would expect that the tools would infer the use of LUT-RAMs for the
register file if synthesized for a Xilinx part.  It's all part of "no
vendor lock-in."

Also, this is mostly a project I'm doing for fun.  If it happens to be
useful at some point in the future, so much the better, if not, I've
still achieved my goal of grokking FPGA synthesis better.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

Article: 62968
Subject: Re: Home grown CPU core legal?
From: Petter Gustad <newsmailcomp6@gustad.com>
Date: 11 Nov 2003 22:30:28 +0100

"Erez Birenzwig" <erez_birenzwig@hotmail.com> writes:

> You should try www.opencores.org

Or 

http://www.fpgacpu.org/links.html


Petter

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Article: 62969
Subject: Re: Home grown CPU core legal?
From: H. Peter Anvin <hpa@zytor.com>
Date: 11 Nov 2003 13:31:51 -0800

Followup to:  <3FB1539C.160F@designtools.co.nz>
By author:    jim.granville@designtools.co.nz
In newsgroup: comp.arch.fpga
> 
>  In doing a 'clean slate' FPGA small core, there is merit in choosing
> an opcode width that matches the FPGA Block RAM / Multiplier widths.
>  ( eg I've seen 9 bit opcodes used )
>  Did you look at that ?
> 

Some vendors have 9/18-bit blockRAMs, some don't.  I'm trying to be as
generic as possible.  It also makes it easier to port tools like
gas/binutils/gcc.

	-hpa



Article: 62970
Subject: Re: Implementing a very fast counter in VirtexII
From: "Francisco Rodriguez" <prodrig@disca.upv.es>
Date: Tue, 11 Nov 2003 22:34:55 +0100
Links: << >> << T >> << A >>


"Erez Birenzwig" <erez_birenzwig@hotmail.com> wrote in message
news:x_bsb.1057$%o4.34282@news.xtra.co.nz...
> "Jim Granville" <jim.granville@designtools.co.nz> wrote in message
> news:3FB1470C.F22@designtools.co.nz...
> >  I think item 4) was to cover capture of the counter at any
> > instant, and to cover the carry ripple.
> >  I'm with Peter in questioning 4).
> > Carry ripple is certainly long, but this is on the .D side, and
> > determines the NEXT Clock delay. However any Capture is on the .Q side,
> > and all Q's will be fully sync (no ripple adders )
> >  Capture of both the prescaler, and long counter, can be clock
> > synchronous, and does not need any enables.
>
> 4) is to cover the fact that once you enable the +1 on the long carry chain
> you can't sample it at the next clock cycle (it won't be ready by then), so
> you have to latch the previous value, which is the new count value.
>
> The problem is that I must be able to sample the counter on every arbitrary
> clock cycle, therefore it must be glitch free.
>
Wouldn't it be easier to always latch the low part?
That is, you build a circuit with a latency of 1 cycle to load a, and then
you have a new a+1 result on every cycle.

Something like this:

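process (clk)
begin
   if rising_edge(clk) then
      -- low half counts on every cycle
      a_low <= a_low + 1;
      -- registered carry: high for the one cycle after a_low was x"FFFF"
      if (a_low = x"FFFF") then
         enable_high <= '1';
      else
         enable_high <= '0';
      end if;
      -- high half increments one cycle late, off the low carry chain
      if (enable_high = '1') then
         a_high <= a_high + 1;
      end if;
      -- delay the low half so it lines up with a_high
      late_a_low <= a_low;
   end if;
end process;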

And the output is
a <= a_high & late_a_low;

Of course, additional logic is needed to load the counter with a predefined
value.

Once you get a (correct) value on a (by the load operation, not shown above),
a_low maintains the low part of the _next_ count, while a_high has the high
part of the _current_ count.
That is why you need to delay a_low by a clock period.
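
As a sanity check of the timing argument, here is a minimal cycle-by-cycle
sketch in Python (not VHDL) of the same four registers. The class name, the
width parameters and in particular the load() behaviour are assumptions made
for illustration; the load logic is not shown in the VHDL above, and this is
only one way it could preset the registers.

```python
class SplitCounter:
    """Behavioural model of the split counter: a_low runs one count ahead,
    a_high lags by one cycle, and late_a_low realigns the two halves."""

    def __init__(self, low_bits=16, high_bits=16):
        self.low_bits = low_bits
        self.low_mask = (1 << low_bits) - 1
        self.high_mask = (1 << high_bits) - 1
        self.load(0)

    def load(self, value):
        # Assumed load scheme: preset every register so that the visible
        # output equals `value` immediately after the load.
        low = value & self.low_mask
        self.a_high = (value >> self.low_bits) & self.high_mask
        self.late_a_low = low
        self.a_low = (low + 1) & self.low_mask          # low part of the NEXT count
        self.enable_high = 1 if low == self.low_mask else 0

    def clock(self):
        # All next-state values are computed from the current state,
        # mirroring the signal assignments inside the clocked process.
        next_a_low = (self.a_low + 1) & self.low_mask
        next_enable = 1 if self.a_low == self.low_mask else 0
        next_a_high = (self.a_high + self.enable_high) & self.high_mask
        next_late = self.a_low
        self.a_low = next_a_low
        self.enable_high = next_enable
        self.a_high = next_a_high
        self.late_a_low = next_late

    @property
    def a(self):
        # Equivalent of: a <= a_high & late_a_low;
        return (self.a_high << self.low_bits) | self.late_a_low
```

With small widths (say 4 + 4 bits) it is easy to step the model through
several rollovers and confirm that a advances by exactly one per clock.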

I've tested a similar approach for a 64-bit counter running at >200 MHz on a
Virtex-II (-4 speed grade), although in my case the counter was built from
eight 8-bit blocks.


Regards
    Francisco

> The counter that I need doesn't require a clear but thanks for the thought,
> it needs a load though.
>
> >
> >  Capture and Clear (can be more application useful), can
> > be done in a single clock with a little more .D side logic.
> >
> >  Capture of fractional clocks, to push the time resolve better than
> > 1/clock speed, is challenging, but looks possible in modern FPGA.
> >
> > -jg
>
>

Article: 62971
Subject: Re: Arithmetics with carry
From: H. Peter Anvin <hpa@zytor.com>
Date: 11 Nov 2003 13:36:47 -0800
Links: << >> << T >> << A >>

Followup to:  <nZ%qb.101301$mZ5.680324@attbi_s54>
By author:    "Glen Herrmannsfeldt" <gah@ugcs.caltech.edu>
In newsgroup: comp.arch.fpga
> 
> This is true, except for generating the flags on the final add.  Well, you
> can either generate all the flags, or only the signed or unsigned flags.
> For the intermediate adds only the carry, or lack of carry, from the high
> bit is important.  To detect signed overflow or underflow (more negative
> than can be represented) requires comparing the carry into and out of the
> sign bit.
> 

It depends.  Some architectures define CF=0 to mean borrow-out from a
subtraction.  Under that definition (used by the PDP-11, for example),
SUB is equivalent to NEG + ADD (a desirable property in my opinion);
under the "other" definition (as used by among others Intel processors
ever since the 4004), SUB ends up producing the opposite carry from
NEG+ADD.
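
To make the two conventions concrete, here is a small Python sketch. The
function names and the width parameter are invented for illustration, and it
does not model the flags of any particular processor. It compares a
subtraction whose flag means "borrow", a subtraction built as a + ~b + 1
whose flag is the raw adder carry-out (so 0 means borrow), and NEG followed
by a plain ADD.

```python
def add_with_carry(a, b, carry_in, width=16):
    """Plain binary adder: returns (result, carry_out)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + carry_in
    return total & mask, total >> width

def sub_borrow_flag(a, b, width=16):
    """Subtraction where the flag is 1 when a borrow occurred (a < b unsigned)."""
    mask = (1 << width) - 1
    borrow = 1 if (a & mask) < (b & mask) else 0
    return (a - b) & mask, borrow

def sub_inverted_carry(a, b, width=16):
    """Subtraction as a + ~b + 1; the flag is the adder carry-out, 0 = borrow."""
    mask = (1 << width) - 1
    return add_with_carry(a, ~b & mask, 1, width)

def neg_then_add(a, b, width=16):
    """Negate b (two's complement), then do an ordinary ADD with no carry-in."""
    mask = (1 << width) - 1
    return add_with_carry(a, (-b) & mask, 0, width)
```

In this model the three routines always agree on the result, and the carry
from sub_inverted_carry is always the complement of the borrow flag.
neg_then_add yields the same carry as sub_inverted_carry for every nonzero
b; the one exception is b = 0, where the negated operand is 0 and the plain
ADD produces no carry-out.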

	-hpa

Article: 62972
Subject: Re: Home grown CPU core legal?
From: Jim Granville <jim.granville@designtools.co.nz>
Date: Wed, 12 Nov 2003 10:54:17 +1300
Links: << >> << T >> << A >>

H. Peter Anvin wrote:
> 
> Followup to:  <3FB1539C.160F@designtools.co.nz>
> By author:    jim.granville@designtools.co.nz
> In newsgroup: comp.arch.fpga
> >
> >  In doing a 'clean slate' FPGA small core, there is merit in choosing
> > an opcode width that matches the FPGA Block RAM / Multiplier widths.
> >  ( eg I've seen 9 bit opcodes used )
> >  Did you look at that ?
> >
> 
> Some vendors have 9/18-bit blockRAMs, some don't.  I'm trying to be as
> generic as possible.  It also makes it easier to port tools like
> gas/binutils/gcc.

 & Off chip memory is also easier....

 As FPGAs get ever cheaper, and Block RAM gets larger, and
factoring in relative speeds, there is scope to define a 
CPU that takes a coarse approach to cache, like :

- reserves a BlockRAM (or 2) for CODE for SW interrupt loops, 
  and Cache-locked code 
  This gives very fast responses, and lowers RFI and total Power
  (minimum off-chip BUS/external memory activity)

- uses another Block RAM for code cache, where it is allowed to pause
  while it loads from slower memory. Dual Port RAM would allow a FIFO
  style load. 
  External memory could be WORD, BYTE or even serial ( FPGA_Stamp :)

- Other Block RAMS are standard DATA rams, including fast context
  register switching for interrupts / param passing.

  Design ends up with a single CPU, but two distinct areas of FAST and
 SLOW code and data.
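
 A rough way to picture the FAST/SLOW split is a fetch-cost model like the
Python sketch below. Everything in it is an assumption made for
illustration: the class name, the region sizes, the miss penalty, and the
use of a simple direct-mapped line fill in place of the FIFO style load
mentioned above.

```python
class CoarseFetchModel:
    """Cycle-count model of code fetch: a locked on-chip region plus a
    small direct-mapped code cache in front of slow external memory."""

    def __init__(self, locked_words=512, cache_words=512,
                 line_words=16, miss_penalty=20):
        self.locked_words = locked_words      # hypothetical locked BlockRAM size
        self.line_words = line_words          # hypothetical cache line length
        self.miss_penalty = miss_penalty      # hypothetical stall cycles per refill
        self.tags = [None] * (cache_words // line_words)

    def fetch_cycles(self, addr):
        # FAST area: always resident, single-cycle fetch.
        if addr < self.locked_words:
            return 1
        # SLOW area: hit in the cache BlockRAM, or pause while a line loads.
        line = addr // self.line_words
        slot = line % len(self.tags)
        if self.tags[slot] == line:
            return 1
        self.tags[slot] = line
        return 1 + self.miss_penalty
```

 Interrupt loops placed below locked_words always cost one cycle per fetch
in this model, while code elsewhere pays the refill penalty on each miss.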

 Does anyone know of work using this HW focus on FPGA cores ?

- jg

Article: 62973
Subject: Re: Home grown CPU core legal?
From: Jim Granville <jim.granville@designtools.co.nz>
Date: Wed, 12 Nov 2003 10:59:15 +1300
Links: << >> << T >> << A >>

H. Peter Anvin wrote:
> 
> I have no way to know how this is turning out.  My current goal is to
> make sure it implements in < 1000 LEs on Cyclone, without using
> blockRAM for the register file.

 Isn't some form of BlockRAM a de facto standard on all 
'consider for new design' FPGAs - so not using that would 
restrict your options ?

 -jg

Article: 62974
Subject: Re: Home grown CPU core legal?
From: rickman <spamgoeshere4@yahoo.com>
Date: Tue, 11 Nov 2003 17:46:18 -0500
Links: << >> << T >> << A >>

Jim Granville wrote:
> 
> H. Peter Anvin wrote:
> >
> > I have no way to know how this is turning out.  My current goal is to
> > make sure it implements in < 1000 LEs on Cyclone, without using
> > blockRAM for the register file.
> 
>  Isn't some form of BlockRAM a de facto standard on all
> 'consider for new design' FPGAs - so not using that would
> restrict your options ?
> 
>  -jg

So are hardware multipliers these days.  I believe all the latest chips
have them as well as multi-standard IOs.  

-- 

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
