Messages from 138850

Article: 138850
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: LittleAlex <alex.louie@email.com>
Date: Thu, 12 Mar 2009 11:17:20 -0700 (PDT)

On Mar 11, 10:34 pm, "Antti.Luk...@googlemail.com"
<Antti.Luk...@googlemail.com> wrote:
>
> MAXII is a bad FPGA, as Altera made design mistakes (no distributed
> ram!)
>
> Antti

Perhaps that explains why Altera markets it as a CPLD, not an FPGA?

AL

Article: 138851
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: -jg <Jim.Granville@gmail.com>
Date: Thu, 12 Mar 2009 11:41:24 -0700 (PDT)

On Mar 13, 8:12 am, Walter Banks <wal...@bytecraft.com> wrote:
> Jim
>
> You beat me to the comment on serial processors.
>
> A long time ago I designed a number of bit serial processors
> they can be very hardware efficient. There are a number of
> very clever math algorithms that take advantage of bit serial.

The obvious next question, is do you have compilers
& maths libraries for such an animal ? ;)

-jg

Article: 138852
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: "Antti.Lukats@googlemail.com" <Antti.Lukats@googlemail.com>
Date: Thu, 12 Mar 2009 11:44:07 -0700 (PDT)

On Mar 12, 8:17 pm, LittleAlex <alex.lo...@email.com> wrote:
> On Mar 11, 10:34 pm, "Antti.Luk...@googlemail.com"
>
> <Antti.Luk...@googlemail.com> wrote:
>
> > MAXII is a bad FPGA, as Altera made design mistakes (no distributed
> > ram!)
>
> > Antti
>
> Perhaps that explains why Altera markets it as a CPLD, not an FPGA?
>
> AL

it's bad when marketed as a CPLD as well ;)

because it looks like a poor man's FPGA...

Antti

Article: 138853
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: Jacko <jackokring@gmail.com>
Date: Thu, 12 Mar 2009 11:49:59 -0700 (PDT)

Hi

12 bit (4K) 230 LEs 28 pin + Power 20-30MHz

nibz12-bit.vhd from download http://nibz.googlecode.com if you only
need a 4K address space of 12 bit memory locations. Full 12 bit
datapath. There is no benefit really going lower in the wide generic,
as an 8-bit only has 256 addressable locations for program and data.

cheers jacko

p.s. main reason for slowness is C% grade measure and use of carry
chain to reduce ALU area.

Article: 138854
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: "Antti.Lukats@googlemail.com" <Antti.Lukats@googlemail.com>
Date: Thu, 12 Mar 2009 11:56:59 -0700 (PDT)

On Mar 12, 12:31 pm, -jg <Jim.Granvi...@gmail.com> wrote:
> On Mar 12, 8:57 pm, rickman <gnu...@gmail.com> wrote:
>
> > I seem to recall that you were trying to find a bit serial CPU that
> > would be the smallest possible in an FPGA.  Did you ever find one you
> > liked?  Personally, I think that is a goal with a very low target
> > application size.  But certainly there are some apps where this could
> > be useful.
>
> Bit-serial (like Cop8) only makes size-sense to simplify bus routing.
> - but that's almost free in a FPGA, so other needs better drive Bit-
> Serial.
>
> Bit-serial Multiply/divide can save resource, but that's less a core
> than
> an algorithm trade-off, and Mul/Div are rare in the smallest cores
> anyway.
>
> One plus that appeals to me, is Execute from Serial FLASH, (and now
> Serial RAM)
> [does a nibble fetch from 4 bit SPI still count as Bit-serial ? ]
> as resource space. Saves MANY pins, and PCB space, but I'm not sure
> the
> core will be _smaller_ as a result - more likely slightly larger ?
>
> -jg

Jim,

this is almost the old discussion ;)

yes, a bit serial processor that executes in place either from SPI
flash or an SD card, and uses say the dual 512 data buffers of an Atmel
DataFlash as RAM, could be very low on resources

say for the lowest cost Xilinx FPGA, the S3A-50, resources are pretty
tight, so if a soft core can execute from the same SPI flash that is
used for config, using the flash as code memory and the flash buffers
as RAM, it would leave almost all the rest of the FPGA resources for
the user application

should be doable in <100 Xilinx S3A slices i think

Antti

Article: 138855
Subject: Re: How to initialize the Xilinx FIFO with predetermined value on
From: no_spa2005@yahoo.fr
Date: Thu, 12 Mar 2009 11:59:23 -0700 (PDT)

Hi,

Just do it yourself! It is not so complicated, and you can get some
help from XAPP258. Then you can modify the init value of the block RAM
(or replace this block RAM with a small distributed RAM).

Regards.

Article: 138856
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: Walter Banks <walter@bytecraft.com>
Date: Thu, 12 Mar 2009 14:12:20 -0500

Jim

You beat me to the comment on serial processors.

A long time ago I designed a number of bit serial processors;
they can be very hardware efficient. There are a number of
very clever math algorithms that take advantage of bit serial.

A second comment on this core: what thought has been given to
parallel partitioning problems, and how would that be handled?

Regards,

--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com



-jg wrote:

> On Mar 12, 8:57 pm, rickman <gnu...@gmail.com> wrote:
> > I seem to recall that you were trying to find a bit serial CPU that
> > would be the smallest possible in an FPGA.  Did you ever find one you
> > liked?  Personally, I think that is a goal with a very low target
> > application size.  But certainly there are some apps where this could
> > be useful.
>
> Bit-serial (like Cop8) only makes size-sense to simplify bus routing.
> - but that's almost free in a FPGA, so other needs better drive Bit-
> Serial.
>
> Bit-serial Multiply/divide can save resource, but that's less a core
> than
> an algorithm trade-off, and Mul/Div are rare in the smallest cores
> anyway.
>
> One plus that appeals to me, is Execute from Serial FLASH, (and now
> Serial RAM)
> [does a nibble fetch from 4 bit SPI still count as Bit-serial ? ]
>  as resource space. Saves MANY pins, and PCB space, but I'm not sure
> the
> core will be _smaller_ as a result - more likely slightly larger ?
>
> -jg

Article: 138857
Subject: Re: speeding hough tranformation in microblaze
From: Benjamin Couillard <benjamin.couillard@gmail.com>
Date: Thu, 12 Mar 2009 12:22:40 -0700 (PDT)

On 12 mar, 08:32, SUMAN <suman...@gmail.com> wrote:
> Hello!
>
> My team is doing real time machine vision project in Spartan3a dsp
> 1800 board. We took greyscale data from c3038 camera module and
> successfully performed sobel edge detection in hardware. Now we are
> detecting lines from the binary image fed to microblaze processor
> We have to perform iterations to determine the value of r from the
> following equation
> r = x*cos(t) + y*sin(t) for each (x,y) from t = -90 degrees to 90
> degrees
>
> We have thought some solutions:-
>
> I) USING FLOATING POINT UNIT OF MICROBLAZE 7 AND PERFORMING THE
> CALCULATION WITH SINE /COSINE LOOKUP TABLE KEPT IN MEMORY
>
> II) USING CORDIC/ SINECOSINE LUT CORE CONNECTED TO MICROBLAZE THROUGH
> FSL LINK
>
> CAN ANYBODY SUGGEST ANY OTHER SOLUTION FOR MY PROBLEM?
>
> THANK YOU

You could also use the CORDIC algorithm in software. It might be
simpler. Check this link on how to implement an efficient version of
the CORDIC algorithm in software.

http://www.embedded.com/design/embeddeddsp/210200583?_requestid=8108
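
For what it's worth, here is a minimal sketch of the idea in Python, not
the code from the linked article. It uses floating point for readability;
the function names, the 16 iterations and the table layout are my own
assumptions. On a MicroBlaze one would presumably use fixed point, where
each multiplication by 2^-i becomes a right shift. The trick is that
r = x*cos(t) + y*sin(t) is just the x component of (x, y) rotated by -t,
which CORDIC rotation mode delivers directly.

```python
import math

ITERATIONS = 16  # assumed; more iterations give more accuracy

# atan(2^-i) table; in an embedded version this is a small constant array.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# Combined gain of all micro-rotations; the result is divided by it once.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def cordic_rotate(x, y, angle):
    """Rotate (x, y) counter-clockwise by angle in radians.

    Converges for |angle| up to about 1.74 rad, which covers +/-90 degrees.
    """
    z = angle
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0.0 else -1.0   # direction of this micro-rotation
        shift = 2.0 ** -i               # a right shift by i in fixed point
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * ANGLES[i]
    return x / GAIN, y / GAIN


def hough_r(x, y, t):
    """r = x*cos(t) + y*sin(t), the x component of (x, y) rotated by -t."""
    return cordic_rotate(x, y, -t)[0]
```

For example, hough_r(3.0, 4.0, math.radians(30)) should agree with the
directly computed 3*cos(30 deg) + 4*sin(30 deg) to a few decimal places.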

Article: 138858
Subject: Re: FPGA LVDS for AC-decoupled transmit over CAT-5 cable
From: Stef <stef33d@yahooI-N-V-A-L-I-D.com.invalid>
Date: Thu, 12 Mar 2009 21:20:44 +0100

In comp.arch.fpga,
Antti.Lukats@googlemail.com <Antti.Lukats@googlemail.com> wrote:
>
> and.. i do not like any wires direct to FPGA or MCU either (going off
> board/cable),
> have seen a Atmel to bulk erase itself because the reset line was
> in 2 meter long cable parallel to wire carrying 12V (reed relay
> switched)
> (well Atmel claimed such bulk erase is impossible...
> but it happened twice and second time i had another
> guy to witness it, so i wasnt seeing ghosts)

A little off-topic perhaps, but now i'm curious. What atmel chip did
you experience this erase with? We have just experienced a few
spontaneous erases (over a couple of months) of the first sector of an
atmel dataflash on one of our boards. Another board in the same system
uses a dataflash for MCU code and one for FPGA configuration and we have
not had trouble with those.

-- 
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)

New York's got the ways and means;
Just won't let you be.
		-- The Grateful Dead

Article: 138859
Subject: Re: FPGA LVDS for AC-decoupled transmit over CAT-5 cable
From: "Antti.Lukats@googlemail.com" <Antti.Lukats@googlemail.com>
Date: Thu, 12 Mar 2009 13:49:57 -0700 (PDT)

On Mar 12, 10:20 pm, Stef <stef...@yahooI-N-V-A-L-I-D.com.invalid>
wrote:
> In comp.arch.fpga,
>
> Antti.Luk...@googlemail.com <Antti.Luk...@googlemail.com> wrote:
>
> > and.. i do not like any wires direct to FPGA or MCU either (going off
> > board/cable),
> > have seen a Atmel to bulk erase itself because the reset line was
> > in 2 meter long cable parallel to wire carrying 12V (reed relay
> > switched)
> > (well Atmel claimed such bulk erase is impossible...
> > but it happened twice and second time i had another
> > guy to witness it, so i wasnt seeing ghosts)
>
> A little off-topic perhaps, but now i'm curious. What atmel chip did
> you experience this erase with? We have just experienced a few
> spontaneous erases (over a couple of months) of the first sector of an
> atmel dataflash on one of our boards. Another board in the same system
> uses a dataflash for MCU code and one for FPGA configuration and we have
> not had trouble with those.
>
> --
> Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)
>
> New York's got the ways and means;
> Just won't let you be.
>                 -- The Grateful Dead


it was the VERY first samples of ATmega32

those samples had m32 written with pencil on top

no i am lying, it was atmega163 and we tried to program those m32

the erasure did erase ALL memory, application sector and boot sector
at the same time, and there is no IAP command to do that.. so it should
never happen

Antti

Article: 138860
Subject: Re: FPGA LVDS for AC-decoupled transmit over CAT-5 cable
From: Stef <stef33d@yahooI-N-V-A-L-I-D.com.invalid>
Date: Thu, 12 Mar 2009 22:04:14 +0100

In comp.arch.fpga,
Antti.Lukats@googlemail.com <Antti.Lukats@googlemail.com> wrote:
> On Mar 12, 10:20 pm, Stef <stef...@yahooI-N-V-A-L-I-D.com.invalid>
> wrote:
>> In comp.arch.fpga,
>>
>> Antti.Luk...@googlemail.com <Antti.Luk...@googlemail.com> wrote:
>>
>> > and.. i do not like any wires direct to FPGA or MCU either (going off
>> > board/cable),
>> > have seen a Atmel to bulk erase itself because the reset line was
>> > in 2 meter long cable parallel to wire carrying 12V (reed relay
>> > switched)
>> > (well Atmel claimed such bulk erase is impossible...
>> > but it happened twice and second time i had another
>> > guy to witness it, so i wasnt seeing ghosts)
>>
>> A little off-topic perhaps, but now i'm curious. What atmel chip did
>> you experience this erase with? We have just experienced a few
>> spontaneous erases (over a couple of months) of the first sector of an
>> atmel dataflash on one of our boards. Another board in the same system
>> uses a dataflash for MCU code and one for FPGA configuration and we have
>> not had trouble with those.
>>
>> --
>> Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)
>>
>> New York's got the ways and means;
>> Just won't let you be.
>>                 -- The Grateful Dead
>
>
> it was the VERY first samples of ATmega32
>
> those samples had m32 written with pencil on top
>
> no i am lying, it was atmega163 and we tried to program those m32
>
> the erasure did erase ALL memory, application sector and boot sector
> at same time, and there is no IAP command todo that.. so it should
> never happen

OK, thanks for the info. No obvious relation to our problem, we keep
searching.


-- 
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)

A programming language is low level when its programs require attention
to the irrelevant.

Article: 138861
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: "Antti.Lukats@googlemail.com" <Antti.Lukats@googlemail.com>
Date: Thu, 12 Mar 2009 14:32:31 -0700 (PDT)

On Mar 13, 12:18 am, Walter Banks <wal...@bytecraft.com> wrote:
> -jg wrote:
> > On Mar 13, 8:12 am, Walter Banks <wal...@bytecraft.com> wrote:
> > > Jim
>
> > > You beat me to the comment on serial processors.
>
> > > A long time ago I designed a number of bit serial processors
> > > they can be very hardware efficient. There are a number of
> > > very clever math algorithms that take advantage of bit serial.
>
> > The obvious next question, is do you have compilers
> > & maths libraries for such an animal ? ;)
>
> We did do a COP8 compiler where some of the features was
> a software / hardware solution.
>
> There were a lot of papers on this stuff at one point. Most I
> would expect if there are on the net will be available in image
> only.
>
> The COP8 is one of the well known bit serial processors. Many
> of the early 8051's were bit serial. Go back far enough and there is
> the PDP8S the S was jokingly referred to as slow. The IBM 1620
> was a serial nybble processor. The 1620 had one of the known
> serial processor advantages that of variable length numbers.
>
> All of which could be implemented with a SD card and some logic.
>
> Regards,
>
> --
> Walter Banks
> Byte Craft Limited
> http://www.bytecraft.com

for execute-in-place from SD a 4 bit cpu would possibly be best,
as SD fetches 4 bits per clock

eh, there is a 4 bit 8051 !!
Atom

i wonder if that would be very small in an FPGA or not
i didn't dig deep enough to see how much of the 8051 is there
and what is made 4 bits wide


Antti

Article: 138862
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: rickman <gnuarm@gmail.com>
Date: Thu, 12 Mar 2009 14:35:40 -0700 (PDT)

On Mar 12, 6:31 am, -jg <Jim.Granvi...@gmail.com> wrote:
> On Mar 12, 8:57 pm, rickman <gnu...@gmail.com> wrote:
>
> > I seem to recall that you were trying to find a bit serial CPU that
> > would be the smallest possible in an FPGA.  Did you ever find one you
> > liked?  Personally, I think that is a goal with a very low target
> > application size.  But certainly there are some apps where this could
> > be useful.
>
> Bit-serial (like Cop8) only makes size-sense to simplify bus routing.
> - but that's almost free in a FPGA, so other needs better drive Bit-
> Serial.

I don't agree with that.  An ALU can be serial and a register file can
be a single bit wide with random access to any bit of any register.  A
single LUT4 can hold two 8 bit registers so a CPU with four 8 bit
registers can hold them in two LUT4s.  The main memory likewise can be
implemented with a single bit data path.  So the data path of a small
CPU can be as little as half a dozen LUTs.  Of course there is a trade
off in complexity of the control logic, but still the CPU can be made
very small compared to even a PICO blaze.  With a little innovation,
the data size can be arbitrarily wide as well as independent of the
data path.
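
To make the idea concrete, here is a rough behavioural model in Python
(not HDL, and not my actual design). It assumes four 8 bit registers
stored as a flat 32 x 1 bit memory, which is what two 16 x 1 LUT RAMs
would give you, and a one bit full adder with a carry flip-flop stepped
once per bit. The names and the bit addressing (register * 8 + bit,
LSB first) are illustrative assumptions.

```python
REG_BITS = 8   # bits per register
NUM_REGS = 4   # four registers -> 32 x 1 bit of storage


def new_regfile():
    """One-bit-wide register file: NUM_REGS * REG_BITS single bits."""
    return [0] * (NUM_REGS * REG_BITS)


def write_reg(bits, reg, value):
    """Store value into register reg, one bit per address, LSB first."""
    for i in range(REG_BITS):
        bits[reg * REG_BITS + i] = (value >> i) & 1


def read_reg(bits, reg):
    """Reassemble a register from its single-bit locations."""
    return sum(bits[reg * REG_BITS + i] << i for i in range(REG_BITS))


def serial_add(bits, dst, src_a, src_b):
    """dst = src_a + src_b using a 1 bit full adder, one bit per 'clock'.

    Returns the final carry. The only datapath state is the carry bit.
    """
    carry = 0
    for i in range(REG_BITS):
        a = bits[src_a * REG_BITS + i]
        b = bits[src_b * REG_BITS + i]
        bits[dst * REG_BITS + i] = a ^ b ^ carry
        carry = (a & b) | (carry & (a ^ b))
    return carry
```

Making REG_BITS larger widens the data without touching the adder,
which is the sense in which the data size is independent of the data
path width. The cost is one clock per bit, plus the control logic.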

A stack machine can use a block ram as both the data and return stacks
with a bit of address work.  I am doing that with a 16 bit wide
machine and I expect the same architecture can be used with a 1 bit
data path to greatly reduce the amount of resources used.

> Bit-serial Multiply/divide can save resource, but that's less a core
> than
> an algorithm trade-off, and Mul/Div are rare in the smallest cores
> anyway.

You can call any of the data path sizings "algorithm" trade-offs,
but they can still be very efficient in resource usage.

> One plus that appeals to me, is Execute from Serial FLASH, (and now
> Serial RAM)
> [does a nibble fetch from 4 bit SPI still count as Bit-serial ? ]
> as resource space. Saves MANY pins, and PCB space, but I'm not sure
> the
> core will be _smaller_ as a result - more likely slightly larger ?

The control logic is likely larger.  If you can't figure out how to
make the data path structure smaller by reducing the data path width,
you need to go back to school!

Rick

Article: 138863
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: rickman <gnuarm@gmail.com>
Date: Thu, 12 Mar 2009 14:38:25 -0700 (PDT)

On Mar 12, 2:56 pm, "Antti.Luk...@googlemail.com"
<Antti.Luk...@googlemail.com> wrote:
> On Mar 12, 12:31 pm, -jg <Jim.Granvi...@gmail.com> wrote:
>
>
>
> > On Mar 12, 8:57 pm, rickman <gnu...@gmail.com> wrote:
>
> > > I seem to recall that you were trying to find a bit serial CPU that
> > > would be the smallest possible in an FPGA.  Did you ever find one you
> > > liked?  Personally, I think that is a goal with a very low target
> > > application size.  But certainly there are some apps where this could
> > > be useful.
>
> > Bit-serial (like Cop8) only makes size-sense to simplify bus routing.
> > - but that's almost free in a FPGA, so other needs better drive Bit-
> > Serial.
>
> > Bit-serial Multiply/divide can save resource, but that's less a core
> > than
> > an algorithm trade-off, and Mul/Div are rare in the smallest cores
> > anyway.
>
> > One plus that appeals to me, is Execute from Serial FLASH, (and now
> > Serial RAM)
> > [does a nibble fetch from 4 bit SPI still count as Bit-serial ? ]
> > as resource space. Saves MANY pins, and PCB space, but I'm not sure
> > the
> > core will be _smaller_ as a result - more likely slightly larger ?
>
> > -jg
>
> Jim,
>
> this almost the old discussion ;)
>
> yes, a bit serial processor that executes in place either from spi
> flash or SD card
> and uses say the dual 512 data buffers of atmel dataflash as ram could
> be
> very low resources
>
> say for lowest cost Xilinx FPGA S3A-50 resources are pretty tight, so
> if
> a soft core can execute from the same spi flash that is used for
> config
> using the flash as code memory and flash buffers as ram, it would
> retain almost all the rest of the FPGA resources for user application
>
> should be doable in <100 Xilinx s3a slices i think
>
> Antti

If you really mean 100 ***slices*** then you are not beating the
parallel processors.  The pico blaze and the Micro8 are both about 200
LUTs, IIRC.  I don't measure slices because only Xilinx and Lattice
have slices.  Most FPGAs have LUT4s (except for Actel and Atmel, but
nobody uses Atmel and not many use Actel).

Rick

Article: 138864
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: rickman <gnuarm@gmail.com>
Date: Thu, 12 Mar 2009 14:44:57 -0700 (PDT)

On Mar 12, 1:30 pm, Jacko <jackokr...@gmail.com> wrote:
> hi
>
> 292 LEs fully stripped, no ROM, no RAM no IO pins, 16 bit address, 16
> bit data Bus. Expected 20-30MHz, (36 pins plus power), About 10 MIPS
> at 20MHz.
>
> cheers jacko

What's a MIPS?  Native instructions?  Or something that can be
compared to other processors?

I have yet to find a good way to compare these small FPGA CPUs.  The
ZPU is pretty small, but the originator seems to still think in terms
of Dhrystones.  I can't begin to measure my processor in Dhrystones.
The original implementation was about 600 LUTs, 50 MHz, 50 MIPS in an
Altera ACEX 1K part (very old and pretty slow).  I am working to
update it for a more current FPGA.

Rick

Article: 138865
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: rickman <gnuarm@gmail.com>
Date: Thu, 12 Mar 2009 14:52:10 -0700 (PDT)

On Mar 12, 1:44 pm, "Antti.Luk...@googlemail.com"
<Antti.Luk...@googlemail.com> wrote:
> On Mar 12, 7:30 pm, Jacko <jackokr...@gmail.com> wrote:
>
> > hi
>
> > 292 LEs fully stripped, no ROM, no RAM no IO pins, 16 bit address, 16
> > bit data Bus. Expected 20-30MHz, (36 pins plus power), About 10 MIPS
> > at 20MHz.
>
> > cheers jacko
>
> without any io/ram/rom its kinda useless?
> and such small soft cores can usually run 200mhz+ in decent FPGA's ;)
> (150mhz in low cost FPGA's)
>
> ok, 292 or <570LE, it's ir-relevant as long as there are no tools to
> program it,
> and your forth-xxx whatever isnt useable yet?
>
> there are zillions of stack soft-cpu's but i fail to see nice and easy
> tools
> todo anything with them.. no compilers
> some forth xxx things that requires some xxx to be installed on your
> PC
> and then do something very awkward and some more to get some code
> actually executing...
>
> and yes I have programmed in Forth many decades ago, think used
> something called GraphForth for msdos
>
> ==
>
> compile_my_forth_to_bin.exe hello.forth > hello.bin
>
> if that creates a ready to use bin file to run with your nibz
> and you have tons of tested libraries.. someone may get interested..
>
> if you have something totally untested, not ready no demos
> no reference design ?
>
> ==
>
> ok ZPU (a stack machine also) has GCC toolchain kind of, but it isnt
> that small anymore the core despite being advertized as smallest 32
> bit core with GCC support
>
> Antti

The author claims one incarnation is around 400 LUTs.  I have not seen
any of the four versions actually in a form that can be compiled to
run a program of your choice without some work.  I did a block diagram
of the ZPU small and estimated around 600 LUTs.  They are all saying
it is a bit slow with a clock speed of under 50 MHz, sometimes very
far below 50 MHz, IIRC and under 10, maybe under 1 DMIPS, I can't
remember exactly.  The effort is poorly organized and I found it hard
to contribute anything useful other than my block diagram drawing
which I'm not sure anyone cared about.  Most of the participants are
hard core software guys who don't seem to understand how to optimize
an FPGA CPU for resources, speed and code density.  Code density is a
primary consideration which is one I share.  But the instruction set
is designed for "efficient" C coding which means the author doesn't
want to put too much effort into the compiler to produce instructions
that are easier to put in the FPGA.  He optimized the compiler back
end and now that tail is wagging the ZPU dog.

Still, it is a very interesting effort and I am watching the mailing
list and occasionally make a post.

Rick

Article: 138866
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: "Antti.Lukats@googlemail.com" <Antti.Lukats@googlemail.com>
Date: Thu, 12 Mar 2009 14:54:56 -0700 (PDT)

On Mar 12, 11:38 pm, rickman <gnu...@gmail.com> wrote:
> On Mar 12, 2:56 pm, "Antti.Luk...@googlemail.com"
>
>
>
> <Antti.Luk...@googlemail.com> wrote:
> > On Mar 12, 12:31 pm, -jg <Jim.Granvi...@gmail.com> wrote:
>
> > > On Mar 12, 8:57 pm, rickman <gnu...@gmail.com> wrote:
>
> > > > I seem to recall that you were trying to find a bit serial CPU that
> > > > would be the smallest possible in an FPGA.  Did you ever find one you
> > > > liked?  Personally, I think that is a goal with a very low target
> > > > application size.  But certainly there are some apps where this could
> > > > be useful.
>
> > > Bit-serial (like Cop8) only makes size-sense to simplify bus routing.
> > > - but that's almost free in a FPGA, so other needs better drive Bit-
> > > Serial.
>
> > > Bit-serial Multiply/divide can save resource, but that's less a core
> > > than
> > > an algorithm trade-off, and Mul/Div are rare in the smallest cores
> > > anyway.
>
> > > One plus that appeals to me, is Execute from Serial FLASH, (and now
> > > Serial RAM)
> > > [does a nibble fetch from 4 bit SPI still count as Bit-serial ? ]
> > > as resource space. Saves MANY pins, and PCB space, but I'm not sure
> > > the
> > > core will be _smaller_ as a result - more likely slightly larger ?
>
> > > -jg
>
> > Jim,
>
> > this almost the old discussion ;)
>
> > yes, a bit serial processor that executes in place either from spi
> > flash or SD card
> > and uses say the dual 512 data buffers of atmel dataflash as ram could
> > be
> > very low resources
>
> > say for lowest cost Xilinx FPGA S3A-50 resources are pretty tight, so
> > if
> > a soft core can execute from the same spi flash that is used for
> > config
> > using the flash as code memory and flash buffers as ram, it would
> > retain almost all the rest of the FPGA resources for user application
>
> > should be doable in <100 Xilinx s3a slices i think
>
> > Antti
>
> If you really mean 100 ***slices*** then you are not beating the
> parallel processors.  The pico blaze and the Micro8 are both about 200
> LUTs, IIRC.  I don't measure slices because only Xilinx and Lattice
> have slices.  Most FPGAs have LUT4s (except for Actel and Atmel, but
> nobody uses Atmel and not many use Actel).
>
> Rick

hm, i did mean <200 LUTs, and yes, i am aware that is about the size of
picoblaze, but if it works with a LARGE memory space (>=8GB) and does
not use FPGA RAM (but the SPI flash RAM buffers), then i would say that
if it fits in <200 LUTs it would be very nice already. I did assume the
LUT number for some minimal but fully functional system, not the bare
cpu only

Antti
PS I do have the Atmel FPSLIC board+dongle and some silicon samples,
also Actel ProAsic/ProAsic3/Fusion boards and programmers,
but you are right, i can not say i have used the Atmel.. it's just too
damn expensive compared to the features it has.

Article: 138867
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: hal-usenet@ip-64-139-1-69.sjc.megapath.net (Hal Murray)
Date: Thu, 12 Mar 2009 17:08:34 -0500

>
>The author claims one incarnation is around 400 LUTs.  I have not seen
>any of the four versions actually in a form that can be compiled to
>run a program of your choice without some work.  I did a block diagram
>of the ZPU small and estimated around 600 LUTs.  They are all saying
>it is a bit slow with a clock speed of under 50 MHz, sometimes very
>far below 50 MHz, IIRC and under 10, maybe under 1 DMIPS, I can't
>remember exactly.  The effort is poorly organized and I found it hard
>to contribute anything useful other than my block diagram drawing
>which I'm not sure anyone cared about.  Most of the participants are
>hard core software guys who don't seem to understand how to optimize
>an FPGA CPU for resources, speed and code density.  Code density is a
>primary consideration which is one I share.  But the instruction set
>is designed for "efficient" C coding which means the author doesn't
>want to put too much effort into the compiler to produce instructions
>that are easier to put in the FPGA.  He optimized the compiler back
>end and now that tail is wagging the ZPU dog.
>
>Still, it is a very interesting effort and I am watching the mailing
>list and occasionally make a post.

If the clock speed gets that slow, I'd consider making a
simple clean CPU and emulating the ZPU instruction set.  It
would need ROM for the microcode and somebody would have to
write the microcode.

-- 
These are my opinions, not necessarily my employer's.  I hate spam.

Article: 138868
Subject: Hidden debug print in ISE ( XIL_PROJNAV_FLOW_DEBUG_LEVEL)
From: "MM" <mbmsv@yahoo.com>
Date: Thu, 12 Mar 2009 18:08:42 -0400

Perhaps this can help someone. I've spent a day trying to understand why 
suddenly I couldn't build an archived  ISE/EDK 8.2 project anymore. The ISE 
would just stop for no reason producing no reports. I still don't know why 
this happened, but at least I found how to work around the problem. I found 
in the Xilinx knowledge base that there is an environment variable 
XIL_PROJNAV_FLOW_DEBUG_LEVEL one can set to 99 to enable debug prints. I did 
it and it actually pointed me to a problem with attaching two IP cores to 
the design. There is still no explanation of why it worked before, but as I 
said at least I was able to work around the issue.
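
In case it saves someone a few minutes, here is a small Python sketch
of setting the variable for a child process only, rather than
system-wide. The helper name is made up, and how you actually launch
ISE (the command passed to subprocess) depends on your install, so that
part is left to the reader as an assumption.

```python
import os


def ise_debug_env(base=None, level=99):
    """Return a copy of the environment with ISE flow debug prints enabled.

    base  -- environment to copy (defaults to the current one)
    level -- value for XIL_PROJNAV_FLOW_DEBUG_LEVEL (99 as described above)
    """
    env = dict(os.environ if base is None else base)
    env["XIL_PROJNAV_FLOW_DEBUG_LEVEL"] = str(level)
    return env
```

The returned dictionary can be handed to subprocess.run(..., env=...)
with whatever command starts Project Navigator on your machine.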


/Mikhail

Article: 138869
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: Walter Banks <walter@bytecraft.com>
Date: Thu, 12 Mar 2009 17:18:48 -0500

-jg wrote:

> On Mar 13, 8:12 am, Walter Banks <wal...@bytecraft.com> wrote:
> > Jim
> >
> > You beat me to the comment on serial processors.
> >
> > A long time ago I designed a number of bit serial processors
> > they can be very hardware efficient. There are a number of
> > very clever math algorithms that take advantage of bit serial.
>
> The obvious next question, is do you have compilers
> & maths libraries for such an animal ? ;)

We did do a COP8 compiler where some of the features were
a software / hardware solution.

There were a lot of papers on this stuff at one point. Most, I
would expect, if they are on the net, will be available as images
only.

The COP8 is one of the well known bit serial processors. Many
of the early 8051's were bit serial. Go back far enough and there is
the PDP8S; the S was jokingly said to stand for slow. The IBM 1620
was a serial nybble processor. The 1620 had one of the known
serial processor advantages, that of variable length numbers.

All of which could be implemented with a SD card and some logic.

Regards,

--
Walter Banks
Byte Craft Limited
http://www.bytecraft.com

Article: 138870
Subject: Re: FPGA LVDS for AC-decoupled transmit over CAT-5 cable
From: hal-usenet@ip-64-139-1-69.sjc.megapath.net (Hal Murray)
Date: Thu, 12 Mar 2009 17:25:32 -0500

>was just thinking of extending SPI like comms using cheapest
>and ready made cabling
>
>so one pair in each direction only. was hoping to get 50mbit/s?

How far do you want to go?

Ethernet gets a gigabit in each direction over 4 pairs.  That
takes a lot of DSP magic and 5 level (PAM-5) signaling.  Part of the
complication is to reduce EMI.

Ethernet uses transformers rather than capacitors.  I'm not sure
why.

You should also consider USB.

If you are using caps (or transformers) you have to do something
to make sure there are no long strings of 0s or 1s.  Manchester
encoding does that at the cost of 2x in bandwidth (which may
not be a problem for short distances).  4b/5b, 8b/10b type encoders
get back most of the bandwidth at the cost of some complexity
at each end.  Manchester is trivial to encode and pretty easy
to decode with a small state machine if you have an 8x clock
at the receiver.
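
As a rough illustration of why Manchester keeps the line DC-balanced,
here is a small Python model. It assumes the IEEE 802.3 convention
(1 = low then high, 0 = high then low); the other convention just swaps
the pairs. The decoder assumes the half-bit stream is already aligned
to bit boundaries, so the 8x-clock recovery state machine is not
modelled here. All names are made up for the example.

```python
def manchester_encode(bits):
    """Encode data bits as half-bit line symbols, two per data bit."""
    line = []
    for b in bits:
        line.extend((0, 1) if b else (1, 0))
    return line


def manchester_decode(line):
    """Decode an aligned half-bit stream back into data bits."""
    if len(line) % 2:
        raise ValueError("need an even number of half-bit symbols")
    bits = []
    for i in range(0, len(line), 2):
        pair = (line[i], line[i + 1])
        if pair == (0, 1):
            bits.append(1)
        elif pair == (1, 0):
            bits.append(0)
        else:
            raise ValueError("no mid-bit transition at symbol %d" % i)
    return bits


def longest_run(line):
    """Length of the longest run of identical line symbols."""
    best = run = 0
    prev = None
    for s in line:
        run = run + 1 if s == prev else 1
        best = max(best, run)
        prev = s
    return best
```

Whatever the data, longest_run(manchester_encode(data)) never exceeds
2, which is what the coupling caps or transformers need. The price is
the two line symbols per data bit mentioned above.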

-- 
These are my opinions, not necessarily my employer's.  I hate spam.

Article: 138871
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: -jg <Jim.Granville@gmail.com>
Date: Thu, 12 Mar 2009 15:55:31 -0700 (PDT)

On Mar 13, 10:32 am, "Antti.Luk...@googlemail.com" wrote:
> for execute in place for SD possible 4 bit cpu would be best
> as sd fetches 4 bit per clock
>
> eh, there is 4 bit 8051 !!
> Atom
>
> i wonder if that would be very small in FPGA or not
> didnt deep enough to see how much of 8051 is there
> and what is made 4 bit wide

Atom (4 bit '80C51') Data is here

http://www.coreriver.co.kr/data/manual/BM-ATOM1.1-V1.0.pdf

Atom moves to 4 bits, drops all registers,
Opcodes are 1 byte, 2 for calls
Direct memory index of 4 bits is supported.(no offset index?)
An 8 bit memory-index pointer exists
Calls are 12 bits
Some Boolean opcodes remain
Does have PUSH A, POP A, & an 8 bit Stack Pointer
No MUL/DIV, and no RETI

So not that well suited to FPGA morph, & too large for CPLDs
- FPGAs better suit 18 bit opcodes, and the dual-port RAM means
register-cores map better.

Width is almost free inside a FPGA, & a 36 bit fetch (9 clocks) from
nibble SPI, would fit two 18b opcodes - and move you into SoftCPU
space.
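
The arithmetic behind that, as a small Python sketch: 9 clocks x 4 bits
= 36 bits = two 18 bit opcodes. The nibble order (first nibble received
is most significant) and the function names are assumptions made for
illustration, not taken from any particular SD/SPI part.

```python
NIBBLES_PER_WORD = 9   # 9 clocks x 4 bits per clock = 36 bits
OPCODE_BITS = 18       # two opcodes per 36 bit fetch

def assemble_word(nibbles):
    """Pack 9 nibbles into one 36 bit word.

    Assumption: the first nibble received is the most significant.
    """
    if len(nibbles) != NIBBLES_PER_WORD:
        raise ValueError("expected exactly 9 nibbles")
    word = 0
    for n in nibbles:
        if not 0 <= n <= 0xF:
            raise ValueError("nibble out of range")
        word = (word << 4) | n
    return word

def split_opcodes(word):
    """Split a 36 bit word into (upper, lower) 18 bit opcodes."""
    mask = (1 << OPCODE_BITS) - 1
    return (word >> OPCODE_BITS) & mask, word & mask

nibbles = [0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9]
word = assemble_word(nibbles)
first_op, second_op = split_opcodes(word)
```

In hardware this is just a 36 bit shift register clocked 9 times, so the
wider opcode costs wiring rather than logic.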

-jg

Article: 138872
Subject: Re: speeding hough tranformation in microblaze
From: Kolja <ksulimma@googlemail.com>
Date: Thu, 12 Mar 2009 15:56:06 -0700 (PDT)
Links: << >> << T >> << A >>

On 12 Mar, 13:32, SUMAN <suman...@gmail.com> wrote:
> Now we are
> detecting lines from the binary image fed to microblaze processor
> We have to perform iterations to determine the value of r from the
> following equation
> r = x*cos(t) + y*sin(t) for each (x,y) from t = -90 degrees to 90
> degrees

I do not understand that specification. Are x, y and t independent, so
that you are iterating over a three-dimensional parameter space?
Do you loop over all x and y, or do you have a sparse set of x, y
points and only iterate over all angles for each of these?

It is strange that you call the result "r", as the equation corresponds
to the y coordinate of a rotation. And this seems to me the clue to
solving this problem efficiently: if you compute both the X and Y
result of the rotation in each step, you can obtain all your results by
incremental rotations of 1 degree at a cost of 4 multiplications per
result. You should be able to do somewhere around 60M iterations in a
Virtex-4, more if you do C-slow retiming (e.g. pipelining the iteration
and working on multiple X,Y pairs alternately).

You only need the constants sin(1°) and cos(1°).
Y' = X*sin(1°) + Y*cos(1°)
X' = X*cos(1°) - Y*sin(1°)

To match your alignment above and to start with -90° you need to swap
and/or negate X and Y accordingly.
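
A minimal Python sketch of the incremental scheme, checked against the
direct formula r = x*cos(t) + y*sin(t). It uses the standard rotation
X' = X*cos(d) - Y*sin(d), Y' = X*sin(d) + Y*cos(d) with d = 1 degree, and
starts from (X, Y) = (x, -y), which is the swapped and negated start that
makes Y equal r at t = -90 degrees. Floating point is used here for
clarity; an FPGA version would presumably use fixed point, and the
function names are made up for this example.

```python
import math

def hough_r_incremental(x, y):
    """r(t) for t = -90..90 degrees in 1 degree steps.

    Uses 4 multiplications per step and no trig calls inside the loop.
    """
    c1 = math.cos(math.radians(1.0))
    s1 = math.sin(math.radians(1.0))
    # At t = -90 degrees: r = x*cos(-90) + y*sin(-90) = -y.
    X, Y = float(x), float(-y)
    results = []
    for _ in range(181):
        results.append(Y)                      # Y holds r for this angle
        X, Y = X * c1 - Y * s1, X * s1 + Y * c1  # rotate by 1 degree
    return results

def hough_r_direct(x, y):
    """Reference: evaluate the formula with a cos and sin per angle."""
    return [x * math.cos(math.radians(t)) + y * math.sin(math.radians(t))
            for t in range(-90, 91)]

r_inc = hough_r_incremental(37, 12)
r_ref = hough_r_direct(37, 12)
max_err = max(abs(a - b) for a, b in zip(r_inc, r_ref))
```

In double precision the two agree to rounding error over the 180 steps.
With fixed point the error accumulates from step to step, so the word
width of the two constants and of X, Y needs checking against the
accumulator resolution.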

Have fun,

Kolja Sulimma

Article: 138873
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: rickman <gnuarm@gmail.com>
Date: Thu, 12 Mar 2009 18:46:48 -0700 (PDT)
Links: << >> << T >> << A >>

On Mar 12, 6:08 pm, hal-use...@ip-64-139-1-69.sjc.megapath.net (Hal
Murray) wrote:
> >The author claims one incarnation is around 400 LUTs.  I have not seen
> >any of the four versions actually in a form that can be compiled to
> >run a program of your choice without some work.  I did a block diagram
> >of the ZPU small and estimated around 600 LUTs.  They are all saying
> >it is a bit slow with a clock speed of under 50 MHz, sometimes very
> >far below 50 MHz, IIRC and under 10, maybe under 1 DMIPS, I can't
> >remember exactly.  The effort is poorly organized and I found it hard
> >to contribute anything useful other than my block diagram drawing
> >which I'm not sure anyone cared about.  Most of the participants are
> >hard core software guys who don't seem to understand how to optimize
> >an FPGA CPU for resources, speed and code density.  Code density is a
> >primary consideration which is one I share.  But the instruction set
> >is designed for "efficient" C coding which means the author doesn't
> >want to put too much effort into the compiler to produce instructions
> >that are easier to put in the FPGA.  He optimized the compiler back
> >end and now that tail is wagging the ZPU dog.
>
> >Still, it is a very interesting effort and I am watching the mailing
> >list and occasionally make a post.
>
> If the clock speed gets that slow, I'd consider making a
> simple clean CPU and emulating the ZPU instruction set.  It
> would need ROM for the microcode and somebody would have to
> write the microcode.

I don't think the clock speed is all that slow.  It just doesn't do a
lot in each clock cycle I believe.  It is stack based, but the stack
is in memory, not inside the CPU.  I don't know all the details.  I
looked at the code a bit, but that is a poor way to learn the
architecture... or at least a painful way to learn it.  I did draw a
diagram of the data paths.  It is surprisingly straightforward, but
each part of the CPU process is a separate clock cycle:
Fetch, Decode, and multiple Execute steps.  As a hardware designer it
is not anything like what I would have designed, but they did keep it
fairly small at 600 LUTs ballpark.  The idea is to have other
implementations that run identical code but much faster.

Like I said, it is interesting and I'll keep watching it.

Rick

Article: 138874
Subject: Re: Nibz processor @ <570 MAXII LEs (16 bit generic specified), 20MHz
From: Jacko <jackokring@gmail.com>
Date: Thu, 12 Mar 2009 20:22:08 -0700 (PDT)
Links: << >> << T >> << A >>

On 12 Mar, 21:44, rickman <gnu...@gmail.com> wrote:
> On Mar 12, 1:30 pm, Jacko <jackokr...@gmail.com> wrote:
>
> > hi
>
> > 292 LEs fully stripped, no ROM, no RAM no IO pins, 16 bit address, 16
> > bit data Bus. Expected 20-30MHz, (36 pins plus power), About 10 MIPS
> > at 20MHz.
>
> > cheers jacko
>
> What's a MIPS?  Native instructions?  Or something that can be
> compared to other processors?

Native instructions. Fetch/execute, and Fetch/execute/execute (SUm
instruction).
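
As a rough worked example in Python: two cycles per instruction at 20MHz
gives the 10 MIPS figure, and the three cycle instruction pulls it down
in proportion to how often it is used. The fraction of three cycle
instructions is a made-up parameter for illustration, not a measurement.

```python
def average_cycles(sum_fraction, normal_cycles=2, sum_cycles=3):
    """Mean cycles per instruction.

    sum_fraction is the (hypothetical) share of instructions that need
    Fetch/execute/execute instead of Fetch/execute.
    """
    return (1.0 - sum_fraction) * normal_cycles + sum_fraction * sum_cycles

def native_mips(clock_mhz, cycles_per_instruction):
    """Native instructions per microsecond at the given clock."""
    return clock_mhz / cycles_per_instruction

best_case = native_mips(20.0, average_cycles(0.0))   # no 3 cycle instructions
worst_case = native_mips(20.0, average_cycles(1.0))  # only 3 cycle instructions
```

So at 20MHz the native rate sits somewhere between 20/3 and 10 MIPS,
depending on the instruction mix.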

> I have yet to find a good way to compare these small, FPGA CPUs.  The
> ZPU is pretty small, but the originator seems to still think in terms
> of Dhrystones.  I can't begin to measure my processor in Dhrystones.

I do not yet have any C compiler, so Dhrystones are not measurable.

> The original implementation was about 600 LUTs, 50 MHz, 50 MIPS in an
> Altera ACEX 1K part (very old and pretty slow).  I am working to
> update it for a more current FPGA.

A re-implementation of the ALU is possible to improve clock rate, but
the area does increase as it is a 16 bit ALU with 4 operations, so 4
ops * 4-way multiplex. The reason for using a carry chain is that the
ALU shrinks, as some of the multiplex is merged with the ALU operations
(not much more complicated than just an add in LUTs). I understand from
Altera support that the two-LUT3 arithmetic mode is not really used,
and so fast carry propagation is not done; that is the critical
path, hence the extra cycle inserted.

A Harvard architecture is not used. Code density is reasonably high
using threaded subroutines, as no jump opcode has to prefix the jump
address. In full ASIC custom logic such carry propagation issues are
not as dominant. So considering each instruction is 16 bit data in
width, the processor is quite impressively small. If only the carry
chain was used effectively in LUT4 mode.

Of course all optimization was for area, and not speed, with no
retiming. Just the register duplication (2 LEs) for increased
routability. Yes lack of high level software tools is a real pain for
product design. But I'm slowly working on that. I do not have a major
development team. I am just me, and this is not paid work.

I think my offer of 1 free core per chip, for just a logo print,
documentation copyright recognition and a URL, is very good. Especially
as this offer, unlike the still standing BSD offer, allows derived
products without revealing the derived source.

Cheers jacko
