
Messages from 155000

Article: 155000
Subject: Xilinx tools for XC3020???
From: Mike Butts <mbuttspdx@gmail.com>
Date: Mon, 25 Mar 2013 17:04:38 -0700 (PDT)
I've got a 20-year-old Xilinx XC3020 development board. I think it would be fun to fire it up and bring it to the 20th anniversary FCCM in Seattle next month. (http://fccm.org/2013/)

I don't see XC3000-series supported on even the oldest archived ISE at xilinx.com. Anyone know where I can find some tools for this old chip? It has 64 CLBs and 256 flip-flops! Maybe one of you folks at Xilinx? Thanks!

  --Mike
 

Article: 155001
Subject: Re: Xilinx tools for XC3020???
From: Gabor <gabor@szakacs.org>
Date: Mon, 25 Mar 2013 21:27:11 -0500
On 3/25/2013 7:04 PM, Mike Butts wrote:
> I've got a 20-year-old Xilinx XC3020 development board. I think it would be fun to fire it up and bring it to the 20th anniversary FCCM in Seattle next month. (http://fccm.org/2013/)
>
> I don't see XC3000-series supported on even the oldest archived ISE at xilinx.com. Anyone know where I can find some tools for this old chip? It has 64 CLBs and 256 flip-flops! Maybe one of you folks at Xilinx? Thanks!
>
>    --Mike
>
>

A major problem with the really old Xilinx tools (Alliance and
Foundation) is that Xilinx did not provide the front-end
(synthesis or schematics).  In the Alliance version you
had to buy a third-party tool like ViewLogic to get a
front end.  Foundation added an Aldec front-end, but then
there was a falling out between Xilinx and Aldec so Xilinx
is no longer able to supply those tools (although for
people who already had them they still run - I have Foundation
4.1i to support really old designs, but not as old as the XC3000
series).

I'm not sure what you could get running in a chip that small
that might impress anyone at FCCM, unless it's just the
antiquity of the thing.  In any case, unless you find
someone who still has a running copy of the old tools,
you'll spend way too much to get the development environment.

-- Gabor

Article: 155002
Subject: Biggest Fake Conference in Computer Science
From: ronaldwilliams356@gmail.com
Date: Tue, 26 Mar 2013 07:16:12 -0700 (PDT)
Biggest Fake Conference in Computer Science

I graduated from University of Florida and am currently running a computer
firm in Florida. I have attended WORLDCOMP conference (see
http://sites.google.com/site/worlddump1 for details) in 2010. Except for a
few keynote speeches and presentations, the conference was very disappointing
due to a large number of poor quality papers and cancellation of some
sessions. I was instantly suspicious of this conference.

Me and my friends started a study on WORLDCOMP. We submitted a fake paper to
WORLDCOMP 2011 and again (the same paper with a modified title) to WORLDCOMP
2012. This paper had numerous fundamental mistakes. Sample statements from
that paper include:
(1) Binary logic is fuzzy logic and vice versa
(2) Pascal developed fuzzy logic
(3) Object oriented languages do not exhibit any polymorphism or inheritance
(4) TCP and IP are synonyms and are part of OSI model
(5) Distributed systems deal with only one computer
(6) Laptop is an example for a super computer
(7) Operating system is an example for computer hardware

Also, our paper did not express any conceptual meaning. However, it was
accepted both times without any modifications (and without any reviews) and
we were invited to submit the final paper and a payment of a $500+ fee to
present the paper. We decided to use the fee for better purposes than making
Prof. Hamid Arabnia (Chairman of WORLDCOMP) rich. After that, we received a
few reminders from WORLDCOMP to pay the fee but we never responded.

We MUST say that you should look at the website
http://sites.google.com/site/worlddump1 if you have any thoughts of
submitting a paper to WORLDCOMP. DBLP and other indexing agencies have
stopped indexing WORLDCOMP's proceedings since 2011 due to its fakeness.

The status of your WORLDCOMP papers can be changed from "scientific" to
"other" (i.e., junk or non-technical) at any time. See the comments
http://www.mail-archive.com/tccc@lists.cs.columbia.edu/msg05168.html of a
respected researcher on this. Better not to have a paper than to have it in
WORLDCOMP and spoil your resume and peace of mind forever!

Our study revealed that WORLDCOMP is a money making business, using the
University of Georgia as a mask, for Prof. Hamid Arabnia. He is throwing out
a small chunk of that money (around 20 dollars per paper published in
WORLDCOMP's proceedings) to his puppet who publicizes WORLDCOMP and also
defends it at various forums, using fake/anonymous names. The puppet uses
fake names and defames other conferences/people to divert traffic to
WORLDCOMP. That is, the puppet does his best to get a maximum number of
papers published at WORLDCOMP to get more money into his (and Prof. Hamid
Arabnia's) pockets.

Monte Carlo Resort (the venue of WORLDCOMP until 2012) has refused to provide
the venue for WORLDCOMP'13 because of fears of their image being tarnished by
WORLDCOMP's fraudulent activities.

WORLDCOMP will not be held after 2013.

The paper submission deadline for WORLDCOMP'13 was March 18 and it has been
extended to April 6 (it will be extended many times, as usual) but still
there are no committee members, no reviewers, and there is no conference
Chairman. The only contact detail available on WORLDCOMP's website is just an
email address!

What bothers us the most is that Prof. Hamid Arabnia never posted an apology
for the damage he has done to the research community. He is still trying to
defend WORLDCOMP. Let us make a direct request to him: publish all reviews
for all the papers (after blocking identifiable details) since the 2000
conference. Reveal the names and affiliations of all the reviewers (for each
year) and how many papers each reviewer reviewed on average. We also request
him to look at the Open Challenge at http://sites.google.com/site/dumpconf

We think that it is our professional obligation to spread this message to
alert the computer science community. Sorry for posting to multiple lists.
Spreading the word is the only way to stop this bogus conference. Please
forward this message to other mailing lists and people.

We are shocked by Prof. Hamid Arabnia and his puppet's activities
http://worldcomp-fake-bogus.blogspot.com - search Google using the keywords
"worldcomp, fake" for additional links.


Sincerely,
Ronald

Article: 155003
Subject: Re: Where to move for an embedded software engineer.
From: John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com>
Date: Tue, 26 Mar 2013 08:13:25 -0700
On Tue, 11 Dec 2012 22:52:17 -0600, no one <notaclue@gmail.com> wrote:

>In article <ka3e4a$vab$1@dont-email.me>,
> hamilton <hamilton@nothere.com> wrote:
>
>> On 12/9/2012 6:04 PM, no one wrote:
>> > I assume the bay area is number one for embedded software engineers,
>> > but where else are the big markets, as companies run from California taxes.
>> >
>> > Denver, CO - Does big population mean high tech?
>> > Phoenix, AZ - Sun birds.
>> > Albuquerque, NM - Sun birds, balloon festival.
>> > Salt Lake City, UT - Mormons, big population.
>> > Portland, OR - Big population.
>> > Seattle, WA - All those ex-Microsofties starting companies.
>> >
>> > Which of these are go, or no-go?
>> >
>> > And if the bay area is it, where in the bay area?
>
>I find I must specify the California bay area, picky, picky, picky. ;)
>
>> I see by your list, you are not going East of the Miss.
>
>Correct. ;)


There are plenty of jobs on the SF peninsula, but housing is insanely expensive.
East Bay has lots of tech companies too, and housing is cheaper.


>
>> Embedded Software Engineers is no longer a term of embedded processor 
>> engineers.
>> 
>> Everyone uses it anymore, so you really need to be specific about _your_ 
>> definition of embedded engineer.
>> 
>> As this is an FPGA newsgroup, do you mean Embedded FPGA engineer ?
>
>No, I just happen to lurk here as the posts are interesting.
>
>> Do you mean assembly language / C language Embedded Engineer ?
>
>Correct, and this group seemed to cover software as well as hardware,
>though a few think not. If anyone would nominate a non-dead more software
>embedded newsgroup I will gladly go take a look.
>
>With the exception of C++ groups, fringe freaks debating broken ideas
>do not excite me.
>
>> PS: Don't Come to Denver, we have too many UN-employed engineers already.
>
>I have scoped out Denver a little and each of the suburb cities seems to
>have a major high tech company. On the downside last time I drove 
>through I found the traffic to be horrible. Denver is so big I would 
>have to pick a sub-city as the commute is too long, same as California.
>
>All those places I listed (except the bay area) would allow me to engage
>my geology hobby on the weekends.

We have rocks here too.




-- 

John Larkin                  Highland Technology Inc
www.highlandtechnology.com   jlarkin at highlandtechnology dot com   

Precision electronic instrumentation
Picosecond-resolution Digital Delay and Pulse generators
Custom timing and laser controllers
Photonics and fiberoptic TTL data links
VME  analog, thermocouple, LVDT, synchro, tachometer
Multichannel arbitrary waveform generators

Article: 155004
Subject: What a Xilinx fpga could do in 1988
From: Bill Sloman <bill.sloman@gmail.com>
Date: Tue, 26 Mar 2013 13:17:42 -0700 (PDT)
I'm writing up a project that ran from 1988 to 1991. It involved
building an ECL-based digital processor that ran at 25MHz, working
into a couple of ECL SRAMs. Debugging the hardware was tedious and
took more than a year.

There's evidence that there was an alternative proposal which would
have dropped the processing speed to 10MHz, and I suspect that
engineers involved might have planned on doing the digital processing
in a re-programmable Xilinx fpga.

What I'd like to find out is whether the Xilinx parts that were
available back then could have supported what we needed to do.

We were sampling a sequential process, at 256 discrete points.

We had two 16-bit lists of data in SRAM. One list represented the data
we were collecting (Data), and the other was a 15-bit representation
of what we thought we were seeing (Results).

Every 100nsec we would have got a 7-bit A/D converter output and would
have had to add it to a 16-bit number pulled out of the Data SRAM,
and write the sum back into the same SRAM address (though the first
version of what we built wrote the data into a second Data SRAM and
ping-ponged between two Data SRAMs on successive passes).

After we'd done enough Accumulation passes through our 256-element
list of Data points, we'd make an Update pass, and up-shift the
accumulated data by eight or fewer bits (depending on how many
Accumulation passes we'd done - never more than 2^8 (256)), and
subtract the 15-bit Result representation of what we thought we
had from the shifted accumulated data.

We could then down-shift the difference by anything up to 15 bits
(depending on how reliable we thought the Results were) and
add it back onto our 15-bit Result representation of what we
thought we had, to improve the reliability of that particular number,
and write this improved number back into the Result store.

Obviously, we had to back-fill the most significant bits of the down-
shifted number with the sign bit of the difference (and of course that
got overlooked in the first version of the ECL-based system).

In practice, 15 bits was overkill, and the final version of the
ECL-based system settled for a maximum down-shift of 7 bits - with the
arithmetic for longer accumulations being handed off to the system
processor, which wasn't as fast, but didn't have to do the job often
enough for it to matter.

The Update pass could run quite a bit slower than 10MHz, but we would
have liked to use a barrel-shifter, which could set up any shift from
+8 to -7 (or -15) as we did in the ECL system, rather than a shift-
register shifter.
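
In modern terms, the accumulate/update loop above can be sketched like this
(a hypothetical illustration only; the names, the clamping, and the shift
bookkeeping are my assumptions, not the actual ECL design):

```python
N_POINTS = 256

data = [0] * N_POINTS      # 16-bit accumulation store (the Data SRAM)
results = [0] * N_POINTS   # 15-bit running estimate (the Results SRAM)

def accumulate_pass(adc_samples):
    """One Accumulation pass: add a 7-bit ADC sample into each 16-bit Data word."""
    for i in range(N_POINTS):
        data[i] = (data[i] + adc_samples[i]) & 0xFFFF

def update_pass(up_shift, down_shift):
    """One Update pass: fold the normalised accumulated data into Results.

    up_shift   = 8 - log2(number of accumulation passes), i.e. 0..8
    down_shift = confidence weighting, 0..15
    """
    for i in range(N_POINTS):
        diff = (data[i] << up_shift) - results[i]
        # Python's >> is an arithmetic shift, so the sign bit of `diff`
        # is back-filled automatically - the step the first ECL build missed.
        results[i] = max(0, min(0x7FFF, results[i] + (diff >> down_shift)))
        data[i] = 0   # clear the accumulator for the next run
```

With 2^k accumulation passes of 7-bit samples, up-shifting by 8-k keeps the
normalised sum within the 15-bit Result range, which is presumably why the
up-shift was capped at eight bits.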

Would this have been practical in Xilinx fpga's back in 1988?

--
Bill Sloman, Sydney


Article: 155005
Subject: Re: What a Xilinx fpga could do in 1988
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Tue, 26 Mar 2013 22:26:01 +0000 (UTC)
Bill Sloman <bill.sloman@gmail.com> wrote:

> I'm writing up a project that ran from 1988 to 1991. It involved
> building an ECL-based digital processor that ran at 25MHz, working
> into a couple of ECL SRAMs. Debugging the hardware was tedious and
> took more than a year.
 
> There's evidence that there was an alternative proposal which would
> have dropped the processing speed to 10MHz, and I suspect that
> engineers involved might have planned on doing the digital processing
> in a re-programmable Xilinx fpga.
 
> What I'd like to find out is whether the Xilinx parts that were
> available back then could have supported what we needed to do.

I did some Xilinx designs (but never got to implement them) in
about 1995, I believe at 33MHz on an XC4013. It took some careful
pipelining and such to get that speed.

I believe the XC4000 series goes back to 1991, though that might be
close to the beginning.

The problem with your question is that you didn't specify an
implementation. The FPGA might, for example, have two logic blocks
that run at 5MHz used alternately, for a 10MHz throughput.
(Or three, or four, or more, if needed.)

TTL easily runs at 10MHz, so some external TTL latches could get
the clock rate down, while keeping the 10MHz, or even 25MHz 
throughput.

-- glen


Article: 155006
Subject: Re: Xilinx tools for XC3020???
From: rickman <gnuarm@gmail.com>
Date: Tue, 26 Mar 2013 20:00:51 -0400
On 3/25/2013 8:04 PM, Mike Butts wrote:
> I've got a 20-year-old Xilinx XC3020 development board. I think it would be fun to fire it up and bring it to the 20th anniversary FCCM in Seattle next month. (http://fccm.org/2013/)
>
> I don't see XC3000-series supported on even the oldest archived ISE at xilinx.com. Anyone know where I can find some tools for this old chip? It has 64 CLBs and 256 flip-flops! Maybe one of you folks at Xilinx? Thanks!
>
>    --Mike

I have a set of tools from the 1999 time frame that might just do what 
you want.  It has a parallel port key which is still around somewhere. 
I don't have the nerve to try to get this working on anything remotely 
current.  If you have a copy of Windows 98 running on an ISA-bus machine 
somewhere it might do it for you.  But that will be the problem, finding 
a machine that will still run the software.  I guess it is possible it 
might run on a current machine... if you have the floppy drives for 
it... lol  I'm not sure if the media is floppy or CD; at that time it 
was a toss-up, even for larger distributions I think.

If you want it, I might just box it up and send it all to you.  I am a 
junk collector, but I know I'll never fire that up again.

-- 

Rick

Article: 155007
Subject: Re: Xilinx tools for XC3020???
From: Mike Butts <mbuttspdx@gmail.com>
Date: Tue, 26 Mar 2013 22:24:52 -0700 (PDT)
Thanks Rick and Gabor,

You reminded me to dig through my dusty old CD-ROMs. I found two sets of Xilinx Student Edition (predecessor to WebPack), one from 1999, the other from 2001, both with notes that might include the codes required. I'll see if they work on my PC.

As for schematic or synthesis tools, no need - back then I got quite comfortable writing xnf files from scratch and running map/par/bitgen from the command line.

Of course all anybody could fit in 64 LUTs is a demo toy like random dice or the infamous traffic light controller. It's a reminder of how FPGAs have grown with Moore's Law in 20 years, from 1K gates to 10M gates!

Article: 155008
Subject: Re: Xilinx tools for XC3020???
From: Philip Herzog <ph1a@arcor.de>
Date: Wed, 27 Mar 2013 10:58:26 +0100
On 27.03.2013 01:00, rickman wrote:
> I don't have the nerve to try to get this working on anything remotely 
> current.  

Virtual machines do a very good job here. I have ISE 4.1 running on a
Windows XP virtual machine here to support Spartan XL.

It's a good thing nobody asked for any changes on the XC5206 design that
we still sell. It's not even VHDL or Verilog. Rumor has it someone here
has a Windows 3.1 machine (hardware, not virtual) with the tools to
support it somewhere. But I'm not going to wake sleeping dogs here.

-   Philip
-- 
Rule 15: Know your way out (Zombieland)




Article: 155009
Subject: Re: What a Xilinx fpga could do in 1988
From: Bill Sloman <bill.sloman@gmail.com>
Date: Wed, 27 Mar 2013 10:02:43 -0700 (PDT)
On Mar 27, 11:26 am, glen herrmannsfeldt <g...@ugcs.caltech.edu>
wrote:
> Bill Sloman <bill.slo...@gmail.com> wrote:
> > I'm writing up a project that ran from 1988 to 1991. It involved
> > building an ECL-based digital processor that ran at 25MHz, working
> > into a couple of ECL SRAMs. Debugging the hardware was tedious and
> > took more than a year.
> > There's evidence that there was an alternative proposal which would
> > have dropped the processing speed to 10MHz, and I suspect that
> > engineers involved might have planned on doing the digital processing
> > in a a re-programmable Xilinx fpga.
> > What I'd like to find out is whether the Xilinx parts that were
> > availlable back then could have supported what we needed to do.
>
> I did some Xilinx designs (but never got to implement them) in
> about 1995, I believe at 33MHz on an XC4013. It took some careful
> pipelining and such to get that speed.
>
> I believe the XC4000 series goes back to 1991, though that might be
> close to the beginning.

Not that close. Cambridge Instruments used Xilinx parts to rework the
digital logic in their Quantimet Image Analysis gear around 1987. It
saved quite a lot of board space and money - the TTL msi logic that
had been doing the job before (since at least 1975 in various
manifestations) took up a lot of printed board space.

> The problem with your question is that you didn't specify an
> implementation. The FPGA might, for example, have two logic blocks
> that run at 5MHz used alternately, for a 10MHz throughput.
> (Or three, or four, or more, if needed.)

If that could have been made to work, I've got my answer.

> TTL easily runs at 10MHz, so some external TTL latches could get
> the clock rate down, while keeping the 10MHz, or even 25MHz
> throughput.

The data would have been held in TTL-compatible SRAM. We only needed a
couple of banks of  256 16-bit words. Extra latches for intermediate
results wouldn't have been a problem, but I'd expect that the
buffering on the periphery of the fpga to drive the relatively high
capacitance of external tracks would have added significant extra
propagation delay.

The ECL-based variant that we did build at the time - slowly, because
it was a pig to debug - used half a dozen 1024 x 8 ECL SRAMs because
they were the smallest parts I could buy that ran fast enough. Ten
years later, the smallest fast SRAMs were closer to 1M x8.

--
Bill Sloman, Sydney


Article: 155010
Subject: Re: What a Xilinx fpga could do in 1988
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Wed, 27 Mar 2013 17:29:45 +0000 (UTC)
Bill Sloman <bill.sloman@gmail.com> wrote:

(snip, I wrote)
>> I believe the XC4000 series goes back to 1991, though that might be
>> close to the beginning.
 
> Not that close. Cambridge Instruments used Xilinx parts to rework the
> digital logic in their Quantimet Image Analysis gear around 1987. It
> saved quite a lot of board space and money - the TTL msi logic that
> had been doing the job before (since at least 1975 in various
> manifestations) took up a lot of printed board space.

I meant that, as well as I remember, the XC4000 series dates to
about 1991. Earlier series were earlier than that.

-- glen

Article: 155011
Subject: Re: What a Xilinx fpga could do in 1988
From: rickman <gnuarm@gmail.com>
Date: Wed, 27 Mar 2013 14:11:38 -0400
On 3/27/2013 1:29 PM, glen herrmannsfeldt wrote:
> Bill Sloman<bill.sloman@gmail.com>  wrote:
>
> (snip, I wrote)
>>> I believe the XC4000 series goes back to 1991, though that might be
>>> close to the beginning.
>
>> Not that close. Cambridge Instruments used Xilinx parts to rework the
>> digital logic in their Quantimet Image Analysis gear around 1987. It
>> saved quite a lot of board space and money - the TTL msi logic that
>> had been doing the job before (since at least 1975 in various
>> manifestations) took up a lot of printed board space.
>
> I meant that, as well as I remember, the XC4000 series dates to
> about 1991. Earlier series were earlier than that.

I think the 3000 series was the one that took off and made Xilinx a 
household name... well, if you lived in a household of rabid digital 
design guys.

Before that was the 2000 series which was a rather limited device I 
believe.  It was more than a CPLD, but not by tons.  I think it had 64 
or maybe 100 LUT/FFs.  I never knew the details of the architecture 
because it was already obsolete by the time I got a chance to work with 
FPGAs.  My understanding is that it was an ok device for a first 
product, but was not found in so many designs.  This is possibly because 
it was replaced fairly quickly by the 3000 or because the tools were so 
arcane that not just anyone could work with it.

I think my first FPGA design was with the then new 4000 series part 
which had some significant improvements over the 3000 series.  Just like 
now, there aren't too many product starts with the older series.  So 
while the 3000 series had lots of design wins and continued to be made 
for some 15 years I believe, the 4000 started the trend of each new 
generation being the marketing focus in order to continue getting design 
wins.

-- 

Rick

Article: 155012
Subject: Re: What a Xilinx fpga could do in 1988
From: Richard Damon <Richard@Damon-Family.org>
Date: Wed, 27 Mar 2013 22:02:48 -0400
On 3/26/13 4:17 PM, Bill Sloman wrote:
> I'm writing up a project that ran from 1988 to 1991. It involved
> building an ECL-based digital processor that ran at 25MHz, working
> into a couple of ECL SRAMs. Debugging the hardware was tedious and
> took more than a year.
...

> 
> Would this have been practical in Xilinx fpga's back in 1988?
> 
> --
> Bill Sloman, Sydney
> 

While I wasn't using "Brand X" at the time, I was using "Brand A" and we
were delivering a product doing a comparable (actually somewhat higher)
level of processing at the time. I think we were using ping-ponging
of memory accesses (read memory A, write the results a few clocks later
into memory B, on next pass, read memory B, write results into memory
A). I am fairly sure that the Xilinx parts available at the time were
comparable.
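
The ping-pong access pattern described above (read one bank while writing the
other, then swap roles each pass) can be sketched as follows; this is a
hypothetical illustration with made-up names, not the actual "Brand A" design:

```python
def process(word):
    # Stand-in for the real per-word pipeline (which took a few clocks);
    # here it just increments modulo 2^16.
    return (word + 1) & 0xFFFF

BANK_SIZE = 8                  # tiny for illustration; the real banks held 256 words
bank_a = list(range(BANK_SIZE))
bank_b = [0] * BANK_SIZE
read_bank, write_bank = bank_a, bank_b

def one_pass():
    """Read every word from the read bank, write results into the write bank,
    then swap the banks' roles for the next pass."""
    global read_bank, write_bank
    for i in range(BANK_SIZE):
        write_bank[i] = process(read_bank[i])
    read_bank, write_bank = write_bank, read_bank
```

The point of the scheme is that the read and the (pipeline-delayed) write
never target the same memory, so neither SRAM needs a read-modify-write cycle.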

Article: 155013
Subject: Re: What a Xilinx fpga could do in 1988
From: Mike Butts <mbuttspdx@gmail.com>
Date: Wed, 27 Mar 2013 21:05:59 -0700 (PDT)
I've got a 1989 Xilinx "Programmable Gate Array Data Book" in my hand.
Before the Internet, kids, you had to get a data book from the chip maker's
local rep. If you were a hardware engineer, your office had a prominent
shelf (or shelves) with all your data books lined up. Depending on how many,
how current and how cool your data books were, that's how cool you were.

It has the XC3000 series, including the (infamous) XC3090, with 320 CLBs,
allegedly good for 9000 gates(!). That part really got people's attention.
The Splash I reconfigurable supercomputer was an array of XC3090s. Even the
XC3020 with 64 CLBs and 64 IOBs was pretty useful. The databook has an app
note for a 100 MHz 8-digit frequency counter in 51 CLBs: timebase (8 CLBs),
BCD counter (16), five shift registers (20), 2 control CLBs and 1 CLB to
suppress leading zeros.

Looking at what you were doing, Bill, I'm confident a 3000-series FPGA with
an SRAM attached could have done the job. Maybe even an XC3020.

  --Mike

On Tuesday, March 26, 2013 1:17:42 PM UTC-7, Bill Sloman wrote:
> I'm writing up a project that ran from 1988 to 1991. It involved
> building an ECL-based digital processor that ran at 25MHz, working
> into a couple of ECL SRAMs. Debugging the hardware was tedious and
> took more than a year.
>
> There's evidence that there was an alternative proposal which would
> have dropped the processing speed to 10MHz, and I suspect that
> engineers involved might have planned on doing the digital processing
> in a re-programmable Xilinx fpga.
>
> What I'd like to find out is whether the Xilinx parts that were
> available back then could have supported what we needed to do.
>
> We were sampling a sequential process, at 256 discrete points.
>
> We had two 16-bit lists of data in SRAM. One list represented the data
> we were collecting (Data), and the other was a 15-bit representation
> of what we thought we were seeing (Results).
>
> Every 100nsec we would have got a 7-bit A/D converter output and would
> have had to add it to a 16-bit number pulled out of the Data SRAM,
> and write the sum back into the same SRAM address (though the first
> version of what we built wrote the data into a second Data SRAM and
> ping-ponged between two Data SRAMs on successive passes).
>
> After we'd done enough Accumulation passes through our 256-element
> list of Data points, we'd make an Update pass, and up-shift the
> accumulated data by eight or fewer bits (depending on how many
> Accumulation passes we'd done - never more than 2^8 (256)), and
> subtract the 15-bit Result representation of what we thought we
> had from the shifted accumulated data.
>
> We could then down-shift the difference by anything up to 15 bits
> (depending on how reliable we thought the Results were) and
> add it back onto our 15-bit Result representation of what we
> thought we had, to improve the reliability of that particular number,
> and write this improved number back into the Result store.
>
> Obviously, we had to back-fill the most significant bits of the down-
> shifted number with the sign bit of the difference (and of course that
> got overlooked in the first version of the ECL-based system).
>
> In practice, 15 bits was overkill, and the final version of the
> ECL-based system settled for a maximum down-shift of 7 bits - with the
> arithmetic for longer accumulations being handed off to the system
> processor, which wasn't as fast, but didn't have to do the job often
> enough for it to matter.
>
> The Update pass could run quite a bit slower than 10MHz, but we would
> have liked to use a barrel-shifter, which could set up any shift from
> +8 to -7 (or -15) as we did in the ECL system, rather than a shift-
> register shifter.
>
> Would this have been practical in Xilinx fpga's back in 1988?
>
> --
> Bill Sloman, Sydney


Article: 155014
Subject: Re: What a Xilinx fpga could do in 1988
From: Bill Sloman <bill.sloman@gmail.com>
Date: Thu, 28 Mar 2013 03:33:24 -0700 (PDT)
On Mar 28, 5:05 pm, Mike Butts <mbutts...@gmail.com> wrote:
> I've got a 1989 Xilinx "Programmable Gate Array Data Book" in my hand.
> Before the Internet, kids, you had to get a data book from the chip
> maker's local rep. If you were a hardware engineer, your office had a
> prominent shelf (or shelves) with all your data books lined up. Depending
> on how many, how current and how cool your data books were, that's how
> cool you were.

Been there, done that. I've got a 1986 National Linear Applications
databook on my bookshelf at the moment - though the bulk of my
databooks are still in a box in my wife's office, on the last leg of
the journey from Cambridge via the Netherlands to Sydney. Sadly the
1989 Xilinx databook isn't one of them.

> It has the XC3000 series, including the (infamous) XC3090, with 320 CLBs,
> allegedly good for 9000 gates(!). That part really got people's attention.
> The Splash I reconfigurable supercomputer was an array of XC3090s. Even
> the XC3020 with 64 CLBs and 64 IOBs was pretty useful. The databook has an
> app note for a 100 MHz 8-digit frequency counter in 51 CLBs: timebase
> (8 CLBs), BCD counter (16), five shift registers (20), 2 control CLBs and
> 1 CLB to suppress leading zeros.

> Looking at what you were doing, Bill, I'm confident a 3000-series FPGA
> with an SRAM attached could have done the job. Maybe even an XC3020.

I've found a data sheet for the XC3000 series on the web dated 1988,
and thought that I'd downloaded it. What I can find now are all dated
1998, which isn't helpful.

The reference to the Splash 1 reconfigurable super-computer is more
helpful since it gives me a 1991 reference

Custom Integrated Circuits Conference, 1991., Proceedings of the IEEE
1991
Date of Conference: 12-15 May 1991
Author(s): Waugh, Thomas C. Xilinx Inc., San Jose, CA, USA

which suggests that the parts (as opposed to the datasheet) had been
available back when we were glueing chunks of ECL together. Back in 1989 I
had a datasheet for a really fast Sony ECL counter, but my
unsuccessful attempts to buy a couple suggested that it was nothing
more than vapourware. Not the first time I'd run into market-research
by data-sheet publication, nor the last. Even the AMD Taxichip - which
we did use - wasn't quite the device that the early issue datasheets
described, but once AMD had worked out how to make it right, what they
did make was fine (if not quite what they'd initially had in mind).

--
Bill Sloman, Sydney




Article: 155015
Subject: Re: What a Xilinx fpga could do in 1988
From: rickman <gnuarm@gmail.com>
Date: Thu, 28 Mar 2013 15:48:51 -0400
On 3/28/2013 6:33 AM, Bill Sloman wrote:
>
> The reference to the Splash 1 reconfigurable super-computer is more
> helpful since it gives me a 1991 reference
>
> Custom Integrated Circuits Conference, 1991., Proceedings of the IEEE
> 1991
> Date of Conference: 12-15 May 1991
> Author(s): Waugh, Thomas C. Xilinx Inc., San Jose, CA, USA
>
> which suggests that the parts (as opposed to the datasheet) had been
> available back when we were glueing chunks of ECL together. Back in 1989 I
> had a datasheet for a really fast Sony ECL counter, but my
> unsuccessful attempts to buy a couple suggested that it was nothing
> more than vapourware. Not the first time I'd run into market-research
> by data-sheet publication, nor the last. Even the AMD Taxichip - which
> we did use - wasn't quite the device that the early issue datasheets
> described, but once AMD had worked out how to make it right, what they
> did make was fine (if not quite what they'd initially had in mind).

Can you tell us why this is being researched so many years after the 
project?

-- 

Rick

Article: 155016
Subject: Re: What a Xilinx fpga could do in 1988
From: hal-usenet@ip-64-139-1-69.sjc.megapath.net (Hal Murray)
Date: Thu, 28 Mar 2013 14:53:29 -0500
Links: << >>  << T >>  << A >>
In article <df1103e8-6e9c-42ae-abef-942ea83ba23d@ve4g2000pbc.googlegroups.com>,
 Bill Sloman <bill.sloman@gmail.com> writes:

>I've found a data sheet for the XC3000 series on the web dated 1988,
>and thought that I'd downloaded it. What I can find now are all dated
>1998, which isn't helpful.

I've got a board in front of me with a 3020 on it.  Date code is 8845.
We had a 3090 in another part of that project.

-- 
These are my opinions.  I hate spam.


Article: 155017
Subject: Re: What a Xilinx fpga could do in 1988
From: bill.sloman@gmail.com
Date: Thu, 28 Mar 2013 19:21:41 -0700 (PDT)
Links: << >>  << T >>  << A >>
On Friday, 29 March 2013 06:48:51 UTC+11, rickman wrote:
> On 3/28/2013 6:33 AM, Bill Sloman wrote:
> >
> > The reference to the Splash 1 reconfigurable super-computer is more
> > helpful since it gives me a 1991 reference
> >
> > Custom Integrated Circuits Conference, 1991, Proceedings of the IEEE 1991
> > Date of Conference: 12-15 May 1991
> > Author(s): Waugh, Thomas C. Xilinx Inc., San Jose, CA, USA
> >
> > which suggests that the parts (as opposed to the datasheet) had been
> > available back when we were glueing chunks of ECL together. Back in 1989 I
> > had a datasheet for a really fast Sony ECL counter, but my
> > unsuccessful attempts to buy a couple suggested that it was nothing
> > more than vapourware. Not the first time I'd run into market-research
> > by data-sheet publication, nor the last. Even the AMD Taxichip - which
> > we did use - wasn't quite the device that the early issue datasheets
> > described, but once AMD had worked out how to make it right, what they
> > did make was fine (if not quite what they'd initially had in mind).
>
> Can you tell us why this is being researched so many years after the
> project?

Sure. When the project was cancelled - at the end of 1991 - and I got made
redundant with most of the rest of the project team, I got the computer
manager to print out my weekly reports for the previous four years, which
covered the entire history of the project - or at least the part that I
was directly involved in - and I took them home with me.

This was - of course - totally illegal, so I didn't do anything with them
at the time.

After I ran out of employers - in June 2003 - I started scanning and
OCRing my way through this pile of paper. I'd got a fair way through it by
2009 and swapped a few e-mails with interested parties back then, then got
distracted by getting a new aortic valve.

I finally finished the job a few months ago and started writing a sort of
history of the project, which involved making sense of stuff that had been
going on before I got involved. It has come to seem clear that there had
been a fairly well-worked-out plan for a less ambitious machine, which the
guy who would have been selling the machine - and, at that stage, getting
most of the profits from the sales - didn't like very much. There's no
actual evidence that the less ambitious plan involved Xilinx fpga's, but
it's plausible.

As a speculation, it adds spice to an otherwise bland and uninteresting
tale.

--
Bill Sloman, Sydney


Article: 155018
Subject: xcv800 free design tools
From: xcv800 <loveimmigr@yahoo.com>
Date: Fri, 29 Mar 2013 07:58:10 -0700 (PDT)
Links: << >>  << T >>  << A >>
Hi,

I have an old Xilinx board called "XCV800". So my question is:
what Xilinx software version supports programming and bitstream
implementation for this kind of FPGA board (XCV800)?

Was there ever a totally free Xilinx software version that supports the XCV800?

Thanks a lot for helping me

Article: 155019
Subject: Re: xcv800 free design tools
From: GaborSzakacs <gabor@alacron.com>
Date: Fri, 29 Mar 2013 15:46:48 -0400
Links: << >>  << T >>  << A >>
xcv800 wrote:
> Hi,
> 
> I have an old Xilinx board called "XCV800". So my question is:
> what Xilinx software version supports programming and bitstream
> implementation for this kind of FPGA board (XCV800)?
> 
> Was there ever a totally free Xilinx software version that supports the XCV800?
> 
> Thanks a lot for helping me

This is the original Virtex series, supported up to revision 10.1.03i
of ISE.  However, being a larger part, it is not covered by the free
WebPack.  You should contact your Xilinx sales rep to see if they
can "lend" you the software, because as I recall Xilinx was pretty
loose about licenses on the older versions (10.1 and earlier), which
only require a "key" and not a license file.

-- Gabor

Article: 155020
Subject: MISC - Stack Based vs. Register Based
From: rickman <gnuarm@gmail.com>
Date: Fri, 29 Mar 2013 17:00:50 -0400
Links: << >>  << T >>  << A >>
I have been working with stack based MISC designs in FPGAs for some 
years.  All along I have been comparing my work to the work of others. 
These others were the conventional RISC type processors supplied by the 
FPGA vendors as well as the many processor designs done by individuals 
or groups as open source.

So far my CPUs have always ranked reasonably well in terms of speed, but 
more importantly to me, very well in terms of size and code density.  My 
efforts have shown it is hard to improve on code density by a significant 
degree while simultaneously minimizing the resources used by the design. 
  Careful selection of the instruction set can both improve code density 
and minimize logic used if measured together, but there is always a 
tradeoff.  One can always be improved at the expense of the other.

The last couple of days I was looking at some code I plan to use and 
realized that it could be a lot more efficient if I could find a way to 
use more parallelism inside the CPU and use fewer instructions.  So I 
started looking at defining separate opcodes for the two primary 
function units in the design, the data stack and the return stack.  Each 
has its own ALU.  The data stack has a full complement of capabilities 
while the return stack can only add, subtract and compare.  The return 
stack is actually intended to be an "address" processing unit.

While trying to figure out how to maximize the parallel capabilities of 
these units, I realized that many operations were just stack 
manipulations.  Then I read the thread about the relative "cost" of 
stack ops vs memory accesses and I realized these were what I needed to 
optimize.  I needed to find a way to not use an instruction and a clock 
cycle for moving data around on the stack.

In the thread on stack ops it was pointed out repeatedly that very often 
the stack operands would be optimized to register operands, meaning they 
wouldn't need to do the stack ops at all really.  So I took a look at a 
register based MISC design.  Guess what, I don't see the disadvantage! 
I have pushed this around for a couple of days and although I haven't 
done a detailed design, I think I have looked at it enough to realize 
that I can design a register oriented MISC CPU that will run as fast, if 
not faster than my stack based design and it will use fewer 
instructions.  I still need to add some features like support for a 
stack in memory, in other words, pre-increment/post-decrement (or the 
other way around...), but I don't see where this is a bad design.  It 
may end up using *less* logic as well.  My stack design provides access 
to the stack pointers which require logic for both the pointers and 
muxing them into the data stack for reading.

I guess looking at other peoples designs (such as Chuck's) has changed 
my perspective over the years so that I am willing and able to do 
optimizations in ways I would not have wanted to do in the past.  But I 
am a bit surprised that there has been so much emphasis on stack 
oriented MISC machines when it may well be that register based MISC 
designs are also very efficient, at least if you aren't building them to 
service a C compiler or trying to match some ideal RISC model.

-- 

Rick

Article: 155021
Subject: Re: MISC - Stack Based vs. Register Based
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Fri, 29 Mar 2013 21:43:36 +0000 (UTC)
Links: << >>  << T >>  << A >>
In comp.arch.fpga rickman <gnuarm@gmail.com> wrote:
> I have been working with stack based MISC designs in FPGAs for some 
> years.  All along I have been comparing my work to the work of others. 
> These others were the conventional RISC type processors supplied by the 
> FPGA vendors as well as the many processor designs done by individuals 
> or groups as open source.

You might also look at some VLIW designs. Not that the designs
themselves will be useful, but maybe some of the ideas that were used
would help. 

Well, much of the idea of RISC is that code density isn't very
important, and that many of the more complicated instructions made
assembly language programming easier, but compilers didn't use them.
 
> So far my CPUs have always ranked reasonably well in terms of speed, but 
> more importantly to me, very well in terms of size and code density.  

Do you mean the size of CPU (lines of verilog) or size of the programs
that run on it?

> My 
> efforts have shown it hard to improve on code density by a significant 
> degree while simultaneously minimizing the resources used by the design. 
>  Careful selection of the instruction set can both improve code density 
> and minimize logic used if measured together, but there is always a 
> tradeoff.  One can always be improved at the expense of the other.

Seems to me that much of the design of VAX was to improve code 
density when main memory was still a very significant part of the
cost of a machine. The large number of addressing modes allowed for
efficient use of bits. It also greatly complicated making an efficient
pipelined processor. With little overlap and a microprogrammed CPU,
it is easy to go sequentially through the instruction bytes and process
them in order. Most of the complication is in the microcode itself.

But every level of logic adds delay. Using a wide bus to fast memory
is more efficient than a complicated decoder. But sometimes RISC went
too far. In early RISC, there was the idea of one cycle per instruction.
They couldn't do that for multiply, so they added multiply-step, an
instruction that you execute many times for each multiply operation.
(And maybe no divide at all.) 

For VLIW, a very wide instruction word allows for specifying many 
different operations at the same time. It relies on complicated
compilers to optimally pack the multiple operations into the
instruction stream.

> The last couple of days I was looking at some code I plan to use and 
> realized that it could be a lot more efficient if I could find a way to 
> use more parallelism inside the CPU and use fewer instructions.  So I 
> started looking at defining separate opcodes for the two primary 
> function units in the design, the data stack and the return stack.  Each 
> has its own ALU.  The data stack has a full complement of capabilities 
> while the return stack can only add, subtract and compare.  The return 
> stack is actually intended to be an "address" processing unit.

This kind of thought is what lead to the branch delay slot on many
RISC processors. There is work to be done for branches, especially
for branch to or return from subroutine. Letting the processor know
early allows for more overlap. So, on many early (maybe not now) RISC
machines the instruction after a branch instruction is executed before
the branch is taken. (It might be a NOOP, though.) Again, reduced code
density (if it is a NOOP) in exchange for a faster instruction cycle.
 
> While trying to figure out how to maximize the parallel capabilities of 
> these units, I realized that many operations were just stack 
> manipulations.  Then I read the thread about the relative "cost" of 
> stack ops vs memory accesses and I realized these were what I needed to 
> optimize.  I needed to find a way to not use an instruction and a clock 
> cycle for moving data around on the stack.

Well, consider the x87 stack instructions. Some operations exist in
forms that do and don't pop something off the stack. A small increase
in the number of opcodes used allows for avoiding many POP instructions.
 
> In the thread on stack ops it was pointed out repeatedly that very often 
> the stack operands would be optimized to register operands, meaning they 
> wouldn't need to do the stack ops at all really.  So I took a look at a 
> register based MISC design.  Guess what, I don't see the disadvantage! 

Well, actually the 8087 was designed to be either a register or stack
machine, as many instructions can index into the stack. If you keep
track of the current stack depth, you can find data in the stack like
in registers. Well, it was supposed to be able to do that.

> I have pushed this around for a couple of days and although I haven't 
> done a detailed design, I think I have looked at it enough to realize 
> that I can design a register oriented MISC CPU that will run as fast, if 
> not faster than my stack based design and it will use fewer 
> instructions.  I still need to add some features like support for a 
> stack in memory, in other words, pre-increment/post-decrement (or the 
> other way around...), but I don't see where this is a bad design.  It 
> may end up using *less* logic as well.  My stack design provides access 
> to the stack pointers which require logic for both the pointers and 
> muxing them into the data stack for reading.

Well, the stack design has the advantage that you can use instruction
bits either for a memory address or for the operation, allowing for much
smaller instructions. But that only works as long as everything is in
the right place on the stack.
 
> I guess looking at other peoples designs (such as Chuck's) has changed 
> my perspective over the years so that I am willing and able to do 
> optimizations in ways I would not have wanted to do in the past.  But I 
> am a bit surprised that there has been so much emphasis on stack 
> oriented MISC machines which it may well be that register based MISC 
> designs are also very efficient, at least if you aren't building them to 
> service a C compiler or trying to match some ideal RISC model.

Seems to me that there hasn't been much done in stack machines since
the B5500, the first computer I ever used when I was about nine.

-- glen


Article: 155022
Subject: Re: MISC - Stack Based vs. Register Based
From: rickman <gnuarm@gmail.com>
Date: Fri, 29 Mar 2013 20:28:45 -0400
Links: << >>  << T >>  << A >>
On 3/29/2013 5:43 PM, glen herrmannsfeldt wrote:
> In comp.arch.fpga rickman<gnuarm@gmail.com>  wrote:
>> I have been working with stack based MISC designs in FPGAs for some
>> years.  All along I have been comparing my work to the work of others.
>> These others were the conventional RISC type processors supplied by the
>> FPGA vendors as well as the many processor designs done by individuals
>> or groups as open source.
>
> You might also look at some VLIW designs. Not that the designs
> themselves will be useful, but maybe some of the ideas that were used
> would help.
>
> Well, much of the idea of RISC is that code density isn't very
> important, and that many of the more complicated instructions made
> assembly language programming easier, but compilers didn't use them.

I am somewhat familiar with VLIW.  I am also very familiar with 
microcode which is the extreme VLIW.  I have coded microcode for an I/O 
processor on an attached array processor.  That's like saying I coded a 
DMA controller in a DSP chip, but before DSP chips were around.


>> So far my CPUs have always ranked reasonably well in terms of speed, but
>> more importantly to me, very well in terms of size and code density.
>
> Do you mean the size of CPU (lines of verilog) or size of the programs
> that run on it?

I don't count lines of HDL code.  I can't see that as a useful metric 
for much.  I am referring to code density of the compiled code for the 
target CPU.  All of my work is in FPGAs and memory is limited inside the 
chip... well, at least in the small ones... lol   Some FPGAs now have 
literally Mbits of memory on chip.  Still, nearly all of my work is 
miniature and low power, sometimes *very* low power.  So there may only 
be 6 block memories in the FPGA for example.


>> My
>> efforts have shown it hard to improve on code density by a significant
>> degree while simultaneously minimizing the resources used by the design.
>>   Careful selection of the instruction set can both improve code density
>> and minimize logic used if measured together, but there is always a
>> tradeoff.  One can always be improved at the expense of the other.
>
> Seems to me that much of the design of VAX was to improve code
> density when main memory was still a very significant part of the
> cost of a machine. The large number of addressing modes allowed for
> efficient use of bits. It also greatly complicated making an efficient
> pipelined processor. With little overlap and a microprogrammed CPU,
> it is easy to go sequentially through the instruction bytes and process
> them in order. Most of the complication is in the microcode itself.

One of the big differences is that they were not much limited in the 
size of the machine itself.  They could throw lots of gates at the 
problem and not worry too much other than how it impacted speed.  I am 
again more limited in the small FPGAs I use and I expect I will achieve 
higher clock speeds by keeping the machine simple as well.  Also, as an 
architectural decision, there is no microcode and all instructions are 
one clock cycle.  This helps with simplification and also makes 
interrupt processing very simple.


> But every level of logic adds delay. Using a wide bus to fast memory
> is more efficient that a complicated decoder. But sometimes RISC went
> too far. In early RISC, there was the idea of one cycle per instruction.
> They couldn't do that for multiply, so they added multiply-step, an
> instruction that you execute many times for each multiply operation.
> (And maybe no divide at all.)

I'm not sure what your point is.  What part of this is "too far"?  This 
is exactly the type of design I am doing, but to a greater extent.


> For VLIW, a very wide instruction word allows for specifying many
> different operations at the same time. It relies on complicated
> compilers to optimally pack the multiple operations into the
> instruction stream.

Yes, in theory that is what VLIW is.  This is just one step removed from 
microcode where the only limitation to how parallel operations can be is 
the data/address paths themselves.  The primary application of VLIW I 
have seen is in the TI 6000 series DSP chips.  But in reality this is 
not what I consider VLIW.  This design uses eight CPUs which are mostly 
similar, but not quite identical with two sets of four CPU units sharing 
a register file, IIRC.  In reality each of the eight CPUs gets its own 
32 bit instruction stream.  They all operate in lock step, but you can't 
do eight FIR filters.  I think of the four in a set, two are set up to 
do full math, etc., and two are able to generate addresses.  So this is 
really two CPUs with dual MACs and two address generators as it ends up 
being used most of the time.  But then they make it run at clocks of 
over 1 GHz so it is a damn fast DSP and handles most of the cell phone 
calls as part of the base station.

Regardless of whether it is a "good" example of VLIW, it was a marketing 
success.


>> The last couple of days I was looking at some code I plan to use and
>> realized that it could be a lot more efficient if I could find a way to
>> use more parallelism inside the CPU and use fewer instructions.  So I
>> started looking at defining separate opcodes for the two primary
>> function units in the design, the data stack and the return stack.  Each
>> has its own ALU.  The data stack has a full complement of capabilities
>> while the return stack can only add, subtract and compare.  The return
>> stack is actually intended to be an "address" processing unit.
>
> This kind of thought is what lead to the branch delay slot on many
> RISC processors. There is work to be done for branches, especially
> for branch to or return from subroutine. Letting the processor know
> early allows for more overlap. So, on many early (maybe not now) RISC
> machines the instruction after a branch instruction is executed before
> the branch is taken. (It might be a NOOP, though.) Again, reduced code
> density (if it is NOOP) in exchange for a faster instruction cycle.)

That is a result of the heavy pipelining that is being done.  I like to 
say my design is not pipelined, but someone here finally convinced me 
that my design *is* pipelined with the execution in parallel with the 
next instruction fetch, but there is never a need to stall or flush 
because there are no conflicts.

I want a design to be fast, but not at the expense of complexity.  That 
is one way I think like Chuck Moore.  Keep it simple and that gives you 
speed.


>> While trying to figure out how to maximize the parallel capabilities of
>> these units, I realized that many operations were just stack
>> manipulations.  Then I read the thread about the relative "cost" of
>> stack ops vs memory accesses and I realized these were what I needed to
>> optimize.  I needed to find a way to not use an instruction and a clock
>> cycle for moving data around on the stack.
>
> Well, consider the x87 stack instructions. Some operations exist in
> forms that do and don't pop something off the stack. A small increase
> in the number of opcodes used allows for avoiding many POP instructions.

In Forth speak a POP would be a DROP.  That is not often used in Forth 
really, or in my apps.  I just wrote some code for my stack CPU and I 
think there were maybe two DROPs in just over 100 instructions.  I am 
talking about the DUPs, SWAPs, OVERs and such.  They end up being needed 
enough that it makes the register design look good... at least at first 
blush.  I am looking at how to organize a register based instruction set 
without expanding the size of the instructions.  I'm realizing that this is 
one issue with registers: you have to specify them.  But I don't need to 
make the machine totally general like the goal for RISC.  That is so 
writing compilers is easier.  I don't need to consider that.


>> In the thread on stack ops it was pointed out repeatedly that very often
>> the stack operands would be optimized to register operands, meaning they
>> wouldn't need to do the stack ops at all really.  So I took a look at a
>> register based MISC design.  Guess what, I don't see the disadvantage!
>
> Well, actually the 8087 was designed to be either a register or stack
> machine, as many instructions can index into the stack. If you keep
> track of the current stack depth, you can find data in the stack like
> in registers. Well, it was supposed to be able to do that.
>
>> I have pushed this around for a couple of days and although I haven't
>> done a detailed design, I think I have looked at it enough to realize
>> that I can design a register oriented MISC CPU that will run as fast, if
>> not faster than my stack based design and it will use fewer
>> instructions.  I still need to add some features like support for a
>> stack in memory, in other words, pre-increment/post-decrement (or the
>> other way around...), but I don't see where this is a bad design.  It
>> may end up using *less* logic as well.  My stack design provides access
>> to the stack pointers which require logic for both the pointers and
>> muxing them into the data stack for reading.
>
> Well, the stack design has the advantage that you can use instruction
> bits either for a memory address or for the operation, allowing for much
> smaller instructions. But that only works as long as everything is in
> the right place on the stack.

Yes, it is a tradeoff between instruction size and the number of ops 
needed to get a job done.  I'm looking at trimming the instruction size 
down to give a workable subset for register operations.


>> I guess looking at other peoples designs (such as Chuck's) has changed
>> my perspective over the years so that I am willing and able to do
>> optimizations in ways I would not have wanted to do in the past.  But I
>> am a bit surprised that there has been so much emphasis on stack
>> oriented MISC machines which it may well be that register based MISC
>> designs are also very efficient, at least if you aren't building them to
>> service a C compiler or trying to match some ideal RISC model.
>
> Seems to me that there hasn't been much done in stack machines since
> the B5500, the first computer I ever used when I was about nine.

I wouldn't say that.  There may not be many commercial designs out 
there, but there have been a few stack based CPU chips (mostly from the 
work of Chuck Moore) and there are a number of stack CPUs internal to 
chips.  Just ask Bernd Paysan.  He has done several I believe.  Mine has 
only been run in FPGAs.

-- 

Rick


Article: 155023
Subject: Re: MISC - Stack Based vs. Register Based
From: Arlet Ottens <usenet+5@c-scape.nl>
Date: Sat, 30 Mar 2013 08:20:42 +0100
Links: << >>  << T >>  << A >>
On 03/29/2013 10:00 PM, rickman wrote:
> I have been working with stack based MISC designs in FPGAs for some
> years.  All along I have been comparing my work to the work of others.
> These others were the conventional RISC type processors supplied by the
> FPGA vendors as well as the many processor designs done by individuals
> or groups as open source.
>
> So far my CPUs have always ranked reasonably well in terms of speed, but
> more importantly to me, very well in terms of size and code density.  My
> efforts have shown it hard to improve on code density by a significant
> degree while simultaneously minimizing the resources used by the design.
>   Careful selection of the instruction set can both improve code density
> and minimize logic used if measured together, but there is always a
> tradeoff.  One can always be improved at the expense of the other.
>

I once made a CPU design for an FPGA that had multiple stacks. There was 
a general purpose stack "A", two index stacks "X" and "Y", and a return 
stack "R". ALU operations worked between A and any other stack, so they 
only required 2 bits in the opcode. There was also a move instruction 
that could move data from a source to a destination stack.

Having access to multiple stacks means you spend less time shuffling 
data on the stack. There's no more need for swap, over, rot and similar 
stack manipulation instructions. The only primitive operations you need 
are push and pop.

For instance, I had a load instruction that could load from memory using 
the address in the X stack, and push the result on the A stack. The cool 
part is that the X stack itself isn't changed by this operation, so the 
same address can be used multiple times. So, you could do a

  LOAD (X)  ; load from (X) and push on A
  1         ; push literal on A
  ADD       ; add top two elements of A
  STORE (X) ; pop A, and store in (X)

to increment a location in memory.

And if you wanted to increment X to access the next memory location, 
you'd do:

  1         ; push literal on A
  ADD X     ; pop X, pop A, add, and push result on A.
  MOVE A, X ; pop A, and push on X

It was an 8 bit architecture with 9 bit instructions (to match the FPGA 
block RAM + parity bit). Having 9 bit instructions allows an 8 bit 
literal push to be encoded in 1 instruction.

Feel free to e-mail if you want more details.




Article: 155024
Subject: Re: MISC - Stack Based vs. Register Based
From: "Rod Pemberton" <do_not_have@notemailnotq.cpm>
Date: Sat, 30 Mar 2013 17:54:57 -0400
Links: << >>  << T >>  << A >>
"rickman" <gnuarm@gmail.com> wrote in message
news:kj4vae$msi$1@dont-email.me...

> I have been working with stack based MISC designs in FPGAs for
> some years.  All along I have been comparing my work to the work
> of others.  These others were the conventional RISC type
> processors supplied by the FPGA vendors as well as the many
> processor designs done by individuals or groups as open source.
>
> So far my CPUs have always ranked reasonably well in terms of
> speed, but more importantly to me, very well in terms of size
> and code density.  My efforts have shown it hard to improve on
> code density by a significant degree while simultaneously
> minimizing the resources used by the design.  Careful selection
> of the instruction set can both improve code density and
> minimize logic used if measured together, but there is always a
> tradeoff.  One can always be improved at the expense of the
> other.
>
> The last couple of days I was looking at some code I plan to use
> and realized that it could be a lot more efficient if I could
> find a way to use more parallelism inside the CPU and use fewer
> instructions. So I started looking at defining separate opcodes
> for the two primary function units in the design, the data stack
> and the return stack.  Each has its own ALU.  The data stack has
> a full complement of capabilities while the return stack can
> only add, subtract and compare.  The return stack is actually
> intended to be an "address" processing unit.
>
> While trying to figure out how to maximize the parallel
> capabilities of these units, I realized that many operations
> were just stack manipulations.  Then I read the thread about the
> relative "cost" of stack ops vs memory accesses and I realized
> these were what I needed to optimize.  I needed to find a way to
> not use an instruction and a clock cycle for moving data around
> on the stack.
>
> In the thread on stack ops it was pointed out repeatedly that
> very often the stack operands would be optimized to register
> operands, meaning they wouldn't need to do the stack ops at all
> really.  So I took a look at a register based MISC design.
> Guess what, I don't see the disadvantage!  I have pushed this
> around for a couple of days and although I haven't done a
> detailed design, I think I have looked at it enough to realize
> that I can design a register oriented MISC CPU that will run as
> fast, if not faster than my stack based design and it will use
> fewer instructions.  I still need to add some features like
> support for a stack in memory, in other words,
> pre-increment/post-decrement (or the other way around...), but I
> don't see where this is a bad design.  It may end up using
> *less* logic as well.  My stack design provides access to the
> stack pointers which require logic for both the pointers and
> muxing them into the data stack for reading.
>
> I guess looking at other peoples designs (such as Chuck's) has
> changed my perspective over the years so that I am willing and
> able to do optimizations in ways I would not have wanted to do
> in the past. But I am a bit surprised that there has been so
> much emphasis on stack-oriented MISC machines, when it may well
> be that register-based MISC designs are just as efficient,
> at least if you aren't building them to service a C compiler or
> trying to match some ideal RISC model.
>

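The two-stack split described in the quoted post can be sketched in a few lines. This is a rough illustration only, not the poster's actual design; the class, the method names, and the choice of ALU operations are all invented for the sketch:

```python
# Minimal model of a two-stack MISC machine: a data stack backed by a
# full ALU, and a return stack whose ALU can only add, subtract, and
# compare (the "address" processing unit from the quoted post).

class TwoStackMachine:
    def __init__(self):
        self.data = []      # data stack: full ALU
        self.ret = []       # return stack: address arithmetic only

    # --- data-stack ALU (full complement of operations) ---
    def push(self, v):       self.data.append(v)
    def add(self):           self._bin(lambda a, b: a + b)
    def sub(self):           self._bin(lambda a, b: a - b)
    def and_(self):          self._bin(lambda a, b: a & b)
    def xor(self):           self._bin(lambda a, b: a ^ b)

    def _bin(self, op):
        b, a = self.data.pop(), self.data.pop()
        self.data.append(op(a, b))

    # --- return-stack "address" ALU: add, subtract, compare only ---
    def rpush(self, v):      self.ret.append(v)
    def radd(self, n):       self.ret.append(self.ret.pop() + n)
    def rsub(self, n):       self.ret.append(self.ret.pop() - n)
    def rcmp(self, n):       return self.ret[-1] == n

m = TwoStackMachine()
m.push(3); m.push(4); m.add()       # data-stack ALU: 3 + 4
m.rpush(0x100); m.radd(4)           # return stack stepping an address
print(m.data[-1], hex(m.ret[-1]))   # -> 7 0x104
```

Because the two units are independent, separate opcode fields for each (as the post proposes) would let a data-stack operation and an address step issue in the same cycle.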
Are those your actual results or did you just reiterate what is on
Wikipedia?  Yes, that's a serious question.  Read the MISC page:
http://en.wikipedia.org/wiki/Minimal_instruction_set_computer

See ... ?!

Code density is a CISC concept.  I don't see how it applies to
your MISC project.  Increasing code density for a MISC processor
means implementing more powerful instructions, i.e., those that do
more work, while minimizing bytes in the instruction opcode
encoding.  Even if you implement CISC-like instructions, you can't
forgo the MISC instructions you already have in order to add the
CISC-like instructions.  So, to do that, you'll need to increase
the size of the instruction set, as well as implement a more
complicated instruction decoder.  I.e., that means the processor
will no longer be MISC, but MISC+minimal CISC hybrid, or pure
CISC...
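As a back-of-the-envelope illustration of that density trade-off (the instruction names below are invented, not taken from any real ISA or from either poster's design):

```python
# "Code density" in miniature: the same high-level operation
# (mem[x] = mem[x] + 1) takes several minimal instructions but a
# single CISC-style one.  Counting opcode slots makes the gap visible.

misc_seq = ["LIT x", "FETCH", "LIT 1", "ADD", "LIT x", "STORE"]  # 6 slots
cisc_seq = ["INC [x]"]                                           # 1 slot

print(len(misc_seq), len(cisc_seq))   # -> 6 1
```

The denser encoding buys fewer fetches at the cost of a larger instruction set and a more complicated decoder, which is exactly the tension described above.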

No offense, but you seem to be "reinventing the wheel" in terms of
microprocessor design.  You're coming to the same conclusions that
were found in the 1980's, e.g., concluding a register based
machine can perform better than a stack based machine, except
you've applied it to MISC in an FPGA package...  How is that a new
conclusion?

Also, please read up on CISC and RISC:
http://en.wikipedia.org/wiki/Reduced_instruction_set_computing
http://en.wikipedia.org/wiki/Complex_instruction_set_computing

You posted to two groups.  Which group was the " ... thread about
the relative 'cost' of stack ops vs memory accesses ..." posted
on?  comp.lang.forth? comp.arch.fpga?  I can look it up, but I'd
rather not.

Also, you cross-posted to comp.arch.fpga.  While they'll likely be
familiar with FPGAs, most there are not going to be familiar with
the features of stack-based processors or Forth processors that
you discuss indirectly within your post.  They might not be
familiar with ancient CISC concepts such as "code density" either,
or understand why it was important at one point in time.  E.g., I
suspect this Forth related stuff from above won't be widely
understood on c.a.f. without clarification:

"peoples[sic] designs (such as Chuck's)"
- various Forth processors by Charles Moore

"the data stack and the return stack."
- interpreted Forth machine model


Rod Pemberton




