Messages from 112250

Article: 112250
Subject: Re: memory init in Altera bitfiles, (like data2mem) is it possible?
From: "Antti" <Antti.Lukats@xilant.com>
Date: 18 Nov 2006 10:53:28 -0800

Dennis Ruffer schrieb:

> On 2006-11-18 07:11:36 -0700, "Antti" <Antti.Lukats@xilant.com> said:
>
> > here is dump of the SOF parser:
> >
> > Version: Quartus II Compiler Version 6.0 Build 202 06/20/2006 Service
> > Pack 1 SJ Full Version
> > Device: EP1C6Q240C6
> > OCP: (V6AF7P00A2;V6AF7PBCEC;V6AF7PBCE1;)
> > Option 18: FF00FFFF
> > Option :19 (?) 16
> > Option :17 (logic) 139900
> > Option :21 (ram) 433
> > Option 29: 20006E00,CF19B6F1
> > CRC16: FE54
> >
> > Antti
>
> I suspect that there is another layer of parsing involved.  Here's what
> I'm getting from a relatively complex model that we are using:
>
> Quartus II Compiler Version 5.0 Build 171 11/03/2005
>  Service Pack 2 SJ Full Version
> Device: EP2S60F1020C3
> OCP: (V6AF7PBCEC;V6AF7PBCE1;)
> Option 18: FF00FFFF
> OPT:19 16
>  1073124: 00 00 00 00  20 00 20 00 -
> 		  00 00 01 00  FF FF FF FF  .... . .........
>
> OPT:17 1923240
>  1073124: 00 00 00 00  00 00 DC C4 -
> 		  EA 00 01 00  00 00 00 00  ................
>
> OPT:21 6928
>  1073124: 00 00 00 00  00 00 1C D8 -
> 		  00 00 01 00  00 00 00 81  ................
>
> Option 23: 2
> CRC16: 83F8  ok
>
> However, thanks for the start.
>
> Now, to dig into the RAM
>
> DaR

Hmm, you are already calculating the CRC16?

If you can shed some light on the details, either here or privately, I
won't mind ;)

I'll keep digging too, of course.

Antti

Article: 112251
Subject: master support for OPB device
From: get2venu@gmail.com
Date: 18 Nov 2006 11:10:17 -0800

Hi,

There seems to be an error in the MASTER_CNTL_STATE_MACHINE process of
the "user logic master support" code that Xilinx EDK 8.2 generates in
<project>/pcores/<core>/hdl/vhdl/user_logic.vhd.

My IP is a Master/Slave on the OPB Bus. I have defined 4 32-bit slave
registers ( slv0 , slv1 , slv2 , slv3 ) with address 30000000 to
3000000C , and am attempting to read data from memory location 20000000
to 2000000C and write it into the slave registers.

I initialised my Master Model registers with the following values:
Control Register : 80
ip2ip_addr       : 0x30000000
ip2bus_addr      : 0x20000000
length           : 0x0010 (attempting to transfer 16 bytes of data)

When I ran the Bus Functional Model simulation, only the slv0 register
was written; none of the remaining three were.

I checked the state transitions in ModelSim and noted that the "master
model control state machine" was going through the following set of
states:

Idle -> Last_Burst -> Check_Burst_done -> Idle

I feel the error is in the following lines (part of the process
MASTER_CNTL_STATE_MACHINE):

          when LAST_BURST =>
            if ( Bus2IP_MstLastAck = '1' ) then
              mst_cntl_state       <= CHK_BURST_DONE;
              mst_sm_bus_lock      <= mst_cntl_bus_lock;
              -- ***** error starts *****
              mst_xfer_count       <= mst_xfer_count-((mst_xfer_count/4)*4);
              mst_bus_addr_count   <= mst_bus_addr_count+(mst_xfer_count/4)*4;
              mst_ip_addr_count    <= mst_ip_addr_count+(mst_xfer_count/4)*4;
              -- ***** error ends *****

mst_xfer_count is set to its remainder modulo 4, which is zero for any
multiple of 4, even though the 16-byte transfer might not be complete.
Because of this, when the FSM enters the CHK_BURST_DONE state it has no
option but to go to the IDLE state and stop any further transfer. Here
is the FSM code I am referring to:

          when CHK_BURST_DONE =>
            if ( mst_xfer_count = 0 ) then
              -- transfer done
              mst_cntl_state       <= IDLE;
              mst_sm_set_done      <= '1';
              mst_sm_busy          <= '0';
            elsif ( mst_xfer_count <= 4 ) then
              -- need single beat transfer
              mst_cntl_state       <= SINGLE;
              mst_sm_bus_lock      <= mst_cntl_bus_lock;
            elsif ( mst_xfer_count < 32 ) then
              -- need burst transfer less than 32 bytes
              mst_cntl_state       <= LAST_BURST;
              mst_sm_bus_lock      <= mst_cntl_bus_lock;
            else
              -- need burst transfer greater than 32 bytes
              mst_cntl_state       <= BURST_8;
              mst_sm_bus_lock      <= mst_cntl_bus_lock;
            end if;

For my particular case, I modified the LAST_BURST code as follows:

              mst_xfer_count       <= mst_xfer_count-4;
              mst_bus_addr_count   <= mst_bus_addr_count+4;
              mst_ip_addr_count    <= mst_ip_addr_count+4;

Now the State Machine goes through the following states:

Idle -> Last_Burst -> Check_Burst_done -> Last_Burst ->
Check_Burst_done -> Last_Burst -> Check_Burst_done -> Single -> Idle.

and writes data into all 4 slave registers perfectly.
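
The two state sequences above can be reproduced with a small Python
model of the byte counter. This is only a sketch of the bookkeeping, not
of the generated VHDL: the CHK_BURST_DONE branch order is taken from the
post, while the behaviour of IDLE, SINGLE and BURST_8 (not shown in the
post) is assumed and marked as such in the comments.

```python
def trace(xfer_count, last_burst_update):
    """Return the list of states visited for a transfer of xfer_count bytes."""

    def dispatch(count):
        # Same branch order as the CHK_BURST_DONE code quoted in the post.
        if count == 0:
            return "IDLE"
        if count <= 4:
            return "SINGLE"
        if count < 32:
            return "LAST_BURST"
        return "BURST_8"

    states = ["IDLE"]
    # Assumption: IDLE picks the first transfer type the same way
    # CHK_BURST_DONE does.
    state = dispatch(xfer_count)
    while state != "IDLE":
        states.append(state)
        if state == "SINGLE":
            # Assumption: SINGLE moves the remaining bytes and returns to IDLE.
            xfer_count = 0
            state = "IDLE"
        else:
            if state == "LAST_BURST":
                xfer_count = last_burst_update(xfer_count)
            else:
                # Assumption: BURST_8 moves 8 words (32 bytes).
                xfer_count -= 32
            states.append("CHK_BURST_DONE")
            state = dispatch(xfer_count)
    states.append("IDLE")
    return states


def generated_update(count):
    # Counter update from the generated LAST_BURST code: count mod 4.
    return count - (count // 4) * 4


def modified_update(count):
    # Counter update from the modified LAST_BURST code: one word per pass.
    return count - 4


if __name__ == "__main__":
    print(" -> ".join(trace(16, generated_update)))
    print(" -> ".join(trace(16, modified_update)))
```

For a 16-byte transfer the first call yields Idle -> Last_Burst ->
Check_Burst_done -> Idle and the second yields the longer sequence that
ends in Single -> Idle, matching what ModelSim showed.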

Since I am working with 32-bit data transfers, this works fine for me,
but it might not work elsewhere.

Has anyone else faced this problem?

Thanks 
Venu

Article: 112252
Subject: Re: memory init in Altera bitfiles, (like data2mem) is it possible?
From: already5chosen@yahoo.com
Date: 18 Nov 2006 11:22:51 -0800


>
> this is something that is really interesting for some purposes, but
> currently I really am
> looking for solution to have NO compile, only sof and elf merging...
>
> here is dump of the SOF parser:
>

Antti,
You appear to care too much about what things are called.
Does it matter if the thing is called "Smart Recompile" or "No
compile"?
Does it matter if the thing is called "merge" or "assemble"?
Does it matter if the command line is "data2mem" or "quartus_cdb"?
IMHO, the answer to all these questions is "No, as long as they do the
same thing".

Now, do they do the same thing?
From the time perspective, yes. The Altera assembler is too slow for my
liking, but it is not really slow. On a modern PC it takes about 1 min
for the big Stratix2-130, and many times less for smaller chips. The
time seems to be independent of the complexity of the design.
From the perspective of not having a Nios2 license on the software
development machine, I fear the answer is "No". You need the license.
[rant on]
Altera Nios tools are really stupid in that regard. There are too many
dependencies between the Nios2 software development tools and the
Quartus/SOPC hardware development tools, starting with the fact that you
can't even *install* the Nios2 tools on a machine without Quartus. It
seems Altera hasn't realized yet that hardware and software development
are more often than not done by different people, and the latter
couldn't care less about the tools of the former. Actually, I suspect
that their treatment of the Nios2 tools is pretty close to a violation
of the GPL, but I am no lawyer.
[rant off]

Article: 112253
Subject: EDK 8.2 Block RAM error
From: "Antti" <Antti.Lukats@xilant.com>
Date: 18 Nov 2006 11:27:49 -0800

Hi,

it seems that either EDK 8.2 or ISE 8.2 still has some issues with
BRAM inits. If an EDK system has more than one BRAM block, the generated
BMM file looks good and makes sense, but when it passes through
NGCbuild only the first memory block gets its "PLACED" locations
assigned. The second one stays as if not located, so DATA2MEM is
not able to initialize the data in the second BRAM block.
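
A quick way to see which memory blocks were left without a location is
to scan the back-annotated BMM for bit lanes that carry no PLACED (or
LOC) keyword. The Python below is a rough sketch only; it assumes the
usual BMM layout of an "ADDRESS_SPACE name type [start:end]" header
followed by lane lines of the form "instance [msb:lsb] PLACED = XnYm;",
and the function name is made up for illustration.

```python
import re

# A bit-lane line: instance name, [msb:lsb], optional placement text, ';'
LANE = re.compile(r"^\s*(\S+)\s+\[(\d+):(\d+)\]\s*(.*?);\s*$")


def unplaced_lanes(bmm_text):
    """Return (address_space, instance) pairs that have no placement."""
    missing = []
    space = None
    for raw in bmm_text.splitlines():
        line = raw.split("//")[0]  # drop trailing comments
        words = line.split()
        if len(words) > 1 and words[0] == "ADDRESS_SPACE":
            space = words[1]  # remember which address space we are in
            continue
        match = LANE.match(line)
        if match is None:
            continue
        placement = match.group(4)
        if "PLACED" not in placement and "LOC" not in placement:
            missing.append((space, match.group(1)))
    return missing


if __name__ == "__main__":
    sample = """
ADDRESS_SPACE bram_a RAMB16 [0x00000000:0x00007FFF]
    BUS_BLOCK
        a/ramb16_0 [31:16] PLACED = X0Y3;
        a/ramb16_1 [15:0] PLACED = X0Y4;
    END_BUS_BLOCK;
END_ADDRESS_SPACE;
ADDRESS_SPACE bram_b RAMB16 [0x00008000:0x00009FFF]
    BUS_BLOCK
        b/ramb16_0 [31:0];
    END_BUS_BLOCK;
END_ADDRESS_SPACE;
"""
    for space, instance in unplaced_lanes(sample):
        print(f"{space}: {instance} has no PLACED location")
```

The sample text is invented; with a real design the input would be the
BMM that the tool flow writes back after placement.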

I wonder if there is some trick or fix, or is it required to wait for
9.1 + SPx?

There might be a solution: OpenFire (a MicroBlaze clone) includes a
Perl script that processes the .LL file and regenerates the BMM with
proper PLACED constraints _after_ the Xilinx tool flow.

Using tools from OpenFire to fix up the bugs in EDK is (what is the
word?) ridiculous, but maybe it is the only option currently.

Antti

Article: 112254
Subject: Re: memory init in Altera bitfiles, (like data2mem) is it possible?
From: "Antti" <Antti.Lukats@xilant.com>
Date: 18 Nov 2006 11:38:45 -0800

already5chosen@yahoo.com schrieb:

> >
> > this is something that is really interesting for some purposes, but
> > currently I really am
> > looking for solution to have NO compile, only sof and elf merging...
> >
> > here is dump of the SOF parser:
> >
>
> Antti,
> You appear to care too much about how things called.
> Does it matter if the thing is called "Smart Recompile" or "No
> compile"?
> Does it matter if the thing is called "merge" or "assemble"?
> Does it matter if the command line is "data2mem" or "quartus_cdb"?
> IMHO, the answer to all questions is "No, as well as they do the same
> thing".
>
> Now, does they do the same thing?
> >From the time perspective, yes. Altera assembler is too slow to my
> liking, but it is not really slow. On the modern PC it takes about 1min
> for the big Stratix2-130; many times faster for smaller chips. The time
> seems to be independent of complexity of design.
> >From the perspective of not having Nios2 license on software
> development machine I fear the answer is "No". You need the license.
> [rant on]
> Altera Nios tools are really stupid in that regard. There are too many
> dependencies between Nios2 software development tools and Quartus/SOPC
> hardware development tools starting from the fact that you can't even
> *install* Nios2 tools on the machine without Quartus. Seem Altera
> didn't realize yet that hardware and software development more often
> than not is done by different persons and the later can't care less
> about the tools of the former. Actually, I suspect that their treatment
> of Nios2 tools is pretty close to violation of GPL, but I am no lawyer.
> [rant off]

;) I don't really care about naming.

But with Xilinx I have a huge set of BIT files (aka SOF) plus a BMM file
(RAM location info). Now I can compile C code with GCC and run data2mem;
it completes within seconds, and the result is a ready-to-load bit file
for any Xilinx FPGA that includes the software. This flow EXCLUDES all
FPGA tools: it really takes a ready-made bitfile (aka Altera SOF) and
merges the ELF or object into it. During that process the only files
required are the BIT file itself and the RAM location info file; no
other FPGA-tool-generated files are required.

The path with Altera asm or cdb or whatever isn't really the same, or am
I mistaken here?

the process must be simple:

1) compile C code to ELF or binary
2) merge ELF with fpga hardware (BIT or SOF)
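
On the Xilinx side, step 2 is a single data2mem call. As a rough sketch,
the Python helper below only builds the argument list using the
documented -bm (BMM), -bd (ELF), -bt (input BIT) and -o b (output BIT)
options; the helper name and all file names are made up for
illustration, and the BMM is assumed to be the back-annotated one that
already contains PLACED locations.

```python
def data2mem_command(bmm_path, elf_path, bit_in, bit_out):
    """Build the data2mem argument list that merges an ELF into a BIT file."""
    return [
        "data2mem",
        "-bm", bmm_path,    # BRAM memory map with placement info
        "-bd", elf_path,    # data to load into the block RAMs
        "-bt", bit_in,      # existing bitstream, used as the template
        "-o", "b", bit_out  # write the result as a new bitstream
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = data2mem_command("system_bd.bmm", "app.elf",
                           "system.bit", "download.bit")
    print(" ".join(cmd))
```

The list can be handed to subprocess.run() on a machine where data2mem
is on the PATH; no other FPGA tool needs to run for this step.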

It works really nicely with Xilinx tools, and it also works in the
Lattice flow, but with Altera it seems to be really impossible. I think
Altera has/had it for Mercury, but for soft cores the SOF file update
function is missing.

Antti

Article: 112255
Subject: Re: board - T562.jpg
From: John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com>
Date: Sat, 18 Nov 2006 12:24:16 -0800

On Sat, 18 Nov 2006 17:58:50 GMT, PeteS <peter.smith8380@ntlworld.com>
wrote:

>John Larkin wrote:
>> On Sat, 18 Nov 2006 11:30:03 GMT, PeteS <peter.smith8380@ntlworld.com>
>> wrote:
>> 
>>> John Larkin wrote:
>>>> This is the thing I'm working on this month. It's a delay generator,
>>>> with an MC68332 uP on the back side that manages things. One of my
>>>> guys quit, leaving behind about 14 klines of nasty, buggy code, so I
>>>> thought it over for about 18 seconds and tossed it and started over
>>>> from scratch. I'm workin with another guy who is re-doing the nasty,
>>>> buggy FPGA design, ditto. He says bad things about the V8.2/SP3 Xilinx
>>>> WebPack software.
>>>>
>>>> The application program is in flash, soldered down, and we're going to
>>>> include a flash boot-block thing that lets you reflash the app code
>>>> through the serial or ethernet ports, to upgrade the firmware. That's
>>>> sort of mind-boggling, since the flash which holds this boot program
>>>> disappears from the uP bus while it's being erased or programmed.
>>>>
>>>> John
>>>>
>>>>
>>>>
>>> That's an interesting non-ortho arrangement :)
>>>
>>> On Webpack 8.2 SP3, I am afraid he's right. There have been some 
>>> rumblings on comp.arch.fpga about it recently, and I 'upgraded' to it 
>>> for my latest design, and then it would not process 3 previous designs, 
>>> although those have no errors I can see. Xilinx claimed it had 
>>> 'tightened up' certain things, but it caused me some grief because those 
>>> designs are the basis for a number of things where the FPGA code is 
>>> designed to in-system upgradeable should some new feature be requested. 
>>> I eventually re-installed (from an old full download) 7.1 for those 
>>> projects.
>> 
>> Yeah, I wish people would stop breaking things, and stay absolutely
>> backwards-compatible to existing designs. FPGA were supposed to make
>> hardware design easier, and then they sent in an army of programmers
>> to replace hardware problems with software nightmares.
>> 
>> The *service pack* is a 300 megabyte download.
>> 
>>> I've done the reprogrammable flash thing myself and I definitely concur 
>>> it _can_ get a little hairy.
>>>
>>> Wish you the best of luck getting the design fully operational. What are 
>>> the specs?
>> 
>> Here it is. Fortunately, the hardware is in good shape, so all we have
>> to do is pound the firmware and fpga into submission. Soon.
>> 
>> http://www.highlandtechnology.com/DSS/T560DS.html
>> 
>> 
>> John
>> 
>Nice specs indeed.
>
>If I had the money, I'd want one ;)
>
>Cheers
>
>PeteS

But hey, this is a revelation:

Hardware design has always been comforting because it is direct,
simple, visible, wysiwyg, physical, and generally reliable. The tools,
oscilloscopes and such, are approachable and dependable. I can use a
30-year old tube-type TEK oscilloscope to debug the most modern analog
or digital circuits, without downloading and installing service packs.

Software is abstract, indirect, bizarre, and unreliable. The tools are
buggy, bloated, always changing, unpredictable, pig slow, and seldom
backwards-compatible. I can't use current-gen tools to edit a 2-year
old FPGA design, and I'm lucky if I can somehow still find and run the
older tools.

So, FPGAs, VHDL, and the associated software tools are the Trojan
horse that's finally letting the software people get revenge, finally
allowing them to force us hardware designers to depend on (and endlessly
pay for) their byzantine and unreliable methodologies, to trap us in
the gotta-upgrade-but-every-generation-has-more-new-bugs loop.

And the new Windows-based scopes and logic analyzers, of course...
same idea.

John

Article: 112256
Subject: Re: memory init in Altera bitfiles, (like data2mem) is it possible?
From: already5chosen@yahoo.com
Date: 18 Nov 2006 12:26:04 -0800

Antti wrote:
>
> ;) I dont really care about naming.
>
> but with Xilinx I have huge set of of BIT files (aka SOF) + BMM file
> (ram loc info) now I can compile C code with GCC, and run data2mem, it
> completes withing seconds and a result is ready to load bit file for
> any Xilinx FPGA that includes the software. This flow EXCLUDES all FPGA
> tools, it really takes ready made bitfile (aka Altera SOF) and merges
> the elf or object into it. During that process the only files required
> are the BIT file itself and ram location info file, no FPGA tool
> generated files are required.
>
> the path with altera asm or cdb or whatever isnt really the same, or I
> am mistaken here?
>
> the process must be simple:
>
> 1) compile C code to ELF or binary
> 2) merge ELF with fpga hardware (BIT or SOF)
>
> it works really nice with xilinx tools, it works also in Lattice flow,
> but with Altera it seems to be really impossible. I think altera
> has/had it for mercury but for softcore the sof file update function is
> missing.
>
> Antti

The Altera SOF file simply doesn't contain enough information to do the
merge the way you want. A .sof is really just a simple binary image,
almost identical to an .rbf. You need additional knowledge in order to
know where within the SOF to put the information contained in the .elf
or .hex file produced by the Nios2 tools. The location information is
hidden in undocumented intermediate files in the DB directory, whose
format likely isn't even compatible between different versions of
Quartus.

I still don't understand why you care. From a practical perspective
the Altera solution is adequate.

Moreover, if you generate the Nios2 core with JTAG debug support and
your on-chip program memory is connected to the data port of the core,
then during the development/debugging phase you don't need to merge
anything. The Nios2 tools are smart enough to load the ELF image
directly into the required on-chip RAM location on the running FPGA, in
exactly the same way they do it for external RAM.

Article: 112257
Subject: Re: memory init in Altera bitfiles, (like data2mem) is it possible?
From: Jim Granville <no.spam@designtools.maps.co.nz>
Date: Sun, 19 Nov 2006 09:30:10 +1300

Antti wrote:
> 
> it works really nice with xilinx tools, it works also in Lattice flow,
> but with Altera it seems to be really impossible. I think altera
> has/had it for mercury but for softcore the sof file update function is
> missing.

  Seems it would be very much in Altera's interest to assist these
efforts ?

  I'm reminded of AutoCAD's DWF effort, where they place a Library
of free usable modules ( IIRC as DLL?) and then users can create
their own DWF with some calls.

  Altera should have a web page for this type of work ?

-jg

Article: 112258
Subject: Re: Synthesis size of Circuits?
From: "Thomas Stanka" <usenet_10@stanka-web.de>
Date: 18 Nov 2006 12:49:07 -0800

Hi,

olson_ord@yahoo.it schrieb:
> > The number of gates _will_ increase, if you add gates!
> > I guess you even didn't care about signal buffering. A signal driving
> > 10k gates needs roughly 1000 gates for signal buffering.
>
> 	I found this comment useful - because I could not get this information
> in the manuals or other tutorials. (I never imagined so many gates
> would be required for buffering.)

This number depends on the maximum fanout allowed for the buffers of
your library. If your technology allows a max fanout of 10 per gate, you
need more than 1k buffers to reach an overall fanout of 10k for a
signal. If your technology allows a maximum fanout of 32, you need only
324 buffers.
But you typically have massive timing problems when your actual fanout
reaches the max fanout.
In some FPGAs you need to use special routing resources (clock nets) for
a large fanout. Those nets are typically strongly limited in number.
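
The buffer counts above follow from building a tree of buffers level by
level until a single driver remains. A minimal Python sketch of that
arithmetic (the function name is made up, and real libraries add
load-dependent limits that are ignored here):

```python
def buffer_count(loads, max_fanout):
    """Count the buffers in a tree that fans one signal out to `loads` inputs."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    total = 0
    drivers_needed = loads
    while drivers_needed > 1:
        # Each level needs ceil(n / max_fanout) buffers to drive the level below.
        drivers_needed = -(-drivers_needed // max_fanout)
        total += drivers_needed
    return total


if __name__ == "__main__":
    print(buffer_count(10_000, 10))  # 1000 + 100 + 10 + 1
    print(buffer_count(10_000, 32))  # 313 + 10 + 1
```

With a max fanout of 10 this gives 1111 buffers, and with a max fanout
of 32 it gives the 324 quoted above.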

> > This code seems to introduce combinatorial paths with length of more
> > than 10k gates.
> > Don't do this. I guess even 100 gates per path will kill your tool.
>
>
> 	I did not understand this. Why would long combinatorial paths kill my
> tool (i.e. why is it difficult to synthesize - Routing Memory??)
> 	Do you (or anyone else reading this) have any suggestions for
> decreasing the combinatorial path length?

Synthesis is an NP-hard optimisation problem, meaning that in the worst
case your effort rises exponentially with the number of variables
involved. I would consider each gate input a variable for synthesis
(with redundant inputs hopefully merged).
The second problem is place and route to meet timing. The more gates
drive a logic cone, the more inputs have to be considered to place and
route this logic cone.
 
bye Thomas

Article: 112259
Subject: Re: EDK 8.2 Block RAM error
From: "MM" <mbmsv@yahoo.com>
Date: Sat, 18 Nov 2006 16:16:48 -0500

Antti,

I am not sure if I correctly understand what your problem is, but I do some
massaging to all of the bmm files, which include more than one address
space. Just delete all of the END_ADDRESS_SPACE lines except for the very
last one and all of the ADDRESS_SPACE lines except for the very first one.
Then correct the address range to include all the blocks and save the file.
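
The manual edit described above could be scripted roughly as follows.
This is a sketch only, assuming the plain "ADDRESS_SPACE name type
[0xSTART:0xEND]" header form with hexadecimal addresses; the function
name is made up, and the merged range simply spans from the lowest start
address to the highest end address.

```python
import re

HEADER = re.compile(
    r"^(\s*ADDRESS_SPACE\s+\S+\s+\S+\s+)"
    r"\[(0x[0-9A-Fa-f]+):(0x[0-9A-Fa-f]+)\](.*)$"
)


def merge_address_spaces(bmm_text):
    """Collapse all address spaces of a BMM file into the first one."""
    lines = bmm_text.splitlines()
    headers = [(i, HEADER.match(line)) for i, line in enumerate(lines)]
    headers = [(i, m) for i, m in headers if m]
    footers = [i for i, line in enumerate(lines)
               if line.strip().startswith("END_ADDRESS_SPACE")]
    if len(headers) < 2:
        return bmm_text  # nothing to merge

    low = min(int(m.group(2), 16) for _, m in headers)
    high = max(int(m.group(3), 16) for _, m in headers)

    # Keep the first header (with the widened range) and the last footer.
    first_index, first = headers[0]
    lines[first_index] = (
        f"{first.group(1)}[0x{low:08X}:0x{high:08X}]{first.group(4)}"
    )
    drop = {i for i, _ in headers[1:]} | set(footers[:-1])
    return "\n".join(line for i, line in enumerate(lines) if i not in drop)
```

The bus blocks themselves are left untouched, so the result should still
be reviewed by hand before it is passed back to the tools.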

/Mikhail




"Antti" <Antti.Lukats@xilant.com> wrote in message
news:1163878069.748187.262430@h54g2000cwb.googlegroups.com...
> hi,
>
> it seems that either EDK 8.2 or ISE 8.2 has still some issues with the
> BRAM inits, if an EDK system has more than 1 BRAM block, then the BMM
> file is generated "looking good", makes sense but when it passes
> NGCbuild then only the first memory block gets the "PLACED" locations
> assigned, the second one stays af if not located, those DATA2MEM would
> not be able to initialize the data into second BRAM block.
>
> I wonder if there is some trick or fix or is it required to wait for
> 9.1 + SPx ?
>
> there might be a solution, namly the OpenFire (microblaze clone)
> includes a perl script that processes the .LL file and re-generates the
> BMM with proper PLACED constraint _after_ the xilinx tool flow.
>
> Using tools from OpenFire to fix up the BUGS in EDK is how is that word
> - ridicilous? but maybe its the only option currently
>
> Antti
>

Article: 112260
Subject: Re: EDK 8.2 Block RAM error
From: "Antti" <Antti.Lukats@xilant.com>
Date: 18 Nov 2006 14:26:32 -0800

MM schrieb:

> Antti,
>
> I am not sure if I correctly understand what your problem is, but I do some
> massaging to all of the bmm files, which include more than one address
> space. Just delete all of the END_ADDRESS_SPACE lines except for the very
> last one and all of the ADDRESS_SPACE lines except for the very first one.
> Then correct the address range to include all the blocks and save the file.
>
> /Mikhail
>

I tried that too, but it did not seem to make a difference: the second
block does not get any PLACED values during ngdbuild :(

Two blocks only work in the case of a 128kb space that is made of 2
times 64kb, but if I have, say, 32kb + 8kb, then the second one is not
working :(

Antti

Article: 112261
Subject: Re: board - T562.jpg
From: PeteS <peter.smith8380@ntlworld.com>
Date: Sat, 18 Nov 2006 23:15:50 GMT

John Larkin wrote:
> On Sat, 18 Nov 2006 17:58:50 GMT, PeteS <peter.smith8380@ntlworld.com>
> wrote:
> 
>> John Larkin wrote:
>>> On Sat, 18 Nov 2006 11:30:03 GMT, PeteS <peter.smith8380@ntlworld.com>
>>> wrote:
>>>
>>>> John Larkin wrote:
>>>>> This is the thing I'm working on this month. It's a delay generator,
>>>>> with an MC68332 uP on the back side that manages things. One of my
>>>>> guys quit, leaving behind about 14 klines of nasty, buggy code, so I
>>>>> thought it over for about 18 seconds and tossed it and started over
>>>>> from scratch. I'm workin with another guy who is re-doing the nasty,
>>>>> buggy FPGA design, ditto. He says bad things about the V8.2/SP3 Xilinx
>>>>> WebPack software.
>>>>>
>>>>> The application program is in flash, soldered down, and we're going to
>>>>> include a flash boot-block thing that lets you reflash the app code
>>>>> through the serial or ethernet ports, to upgrade the firmware. That's
>>>>> sort of mind-boggling, since the flash which holds this boot program
>>>>> disappears from the uP bus while it's being erased or programmed.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>> That's an interesting non-ortho arrangement :)
>>>>
>>>> On Webpack 8.2 SP3, I am afraid he's right. There have been some 
>>>> rumblings on comp.arch.fpga about it recently, and I 'upgraded' to it 
>>>> for my latest design, and then it would not process 3 previous designs, 
>>>> although those have no errors I can see. Xilinx claimed it had 
>>>> 'tightened up' certain things, but it caused me some grief because those 
>>>> designs are the basis for a number of things where the FPGA code is 
>>>> designed to in-system upgradeable should some new feature be requested. 
>>>> I eventually re-installed (from an old full download) 7.1 for those 
>>>> projects.
>>> Yeah, I wish people would stop breaking things, and stay absolutely
>>> backwards-compatible to existing designs. FPGA were supposed to make
>>> hardware design easier, and then they sent in an army of programmers
>>> to replace hardware problems with software nightmares.
>>>
>>> The *service pack* is a 300 megabyte download.
>>>
>>>> I've done the reprogrammable flash thing myself and I definitely concur 
>>>> it _can_ get a little hairy.
>>>>
>>>> Wish you the best of luck getting the design fully operational. What are 
>>>> the specs?
>>> Here it is. Fortunately, the hardware is in good shape, so all we have
>>> to do is pound the firmware and fpga into submission. Soon.
>>>
>>> http://www.highlandtechnology.com/DSS/T560DS.html
>>>
>>>
>>> John
>>>
>> Nice specs indeed.
>>
>> If I had the money, I'd want one ;)
>>
>> Cheers
>>
>> PeteS
> 
> But hey, this is a revelation:
> 
> Hardware design has always been comforting because it is direct,
> simple, visible, wysiwyg, physical, and generally reliable. The tools,
> oscilloscopes and such, are approachable and dependable. I can use a
> 30-year old tube-type TEK oscilloscope to debug the most modern analog
> or digital circuits, without downloading and installing service packs.
> 
> Software is abstract, indirect, bizarre, and unreliable. The tools are
> buggy, bloated, always changing, unpredictable, pig slow, and seldom
> backwards-compatible. I can't use current-gen tools to edit a 2-year
> old FPGA design, and I'm lucky if I can somehow still find and run the
> older tools.
> 
> So, FPFAs, VHDL, and the associated software tools are the trojan
> horse that's finally letting the software people get revenge, finally
> allowing them to force us hardware designers depend on (and endlessly
> pay for) their bizantine and unreliable methodologies, to trap us in
> the gotta-upgrade-but-every-generation-has-more-new-bugs loop.
> 
> And the new Windows-based scopes and logic analyzers, of course...
> same idea.
> 
> John
> 
> 
Well, this could fill a book.

One of my real pet peeves is that I don't have control over the
synthesis and PAR process such as I have with C (for example; I've
written many hundreds of KLines of C).

As an example, there are times I _deliberately_ want to delay a signal 
by a prop delay or two, but the damn tools optimise it out. If I had a 
'volatile' keyword, or perhaps a 'pragma' to tell the tool to *leave this 
damn code alone and don't think of optimising*, it might be better.

This snippet, for example:

wire SigA; // input signal
wire SigB; // intermediate signal
wire SigC; // output signal

// BUF stands for a library buffer cell with input port a and output port b
BUF buf1 (.a(SigA), .b(SigB));
BUF buf2 (.a(SigB), .b(SigC));

..

The tools will optimise them out when I _specifically_ want them there 
to add a delay. Of course, you can always say

assign #10 SigC = SigA;

but I prefer to use buffers, because their timings are specified far
better and the tools otherwise will not do what I want.

Windows based scopes - I have always said that a smart power switch on a 
windows machine is the dumbest thing I ever heard of.

No - I agree with you. Real hardware is properly parameterisable and not 
subject to the latest 'thoughts' of a software engineer who doesn't have 
a F***ing clue about hardware.

Cheers

PeteS

Article: 112262
Subject: Re: board - T562.jpg
From: PeteS <peter.smith8380@ntlworld.com>
Date: Sat, 18 Nov 2006 23:23:30 GMT

PeteS wrote:
> John Larkin wrote:
>> On Sat, 18 Nov 2006 17:58:50 GMT, PeteS <peter.smith8380@ntlworld.com>
>> wrote:
>>
>>> John Larkin wrote:
>>>> On Sat, 18 Nov 2006 11:30:03 GMT, PeteS <peter.smith8380@ntlworld.com>
>>>> wrote:
>>>>
>>>>> John Larkin wrote:
>>>>>> This is the thing I'm working on this month. It's a delay generator,
>>>>>> with an MC68332 uP on the back side that manages things. One of my
>>>>>> guys quit, leaving behind about 14 klines of nasty, buggy code, so I
>>>>>> thought it over for about 18 seconds and tossed it and started over
>>>>>> from scratch. I'm workin with another guy who is re-doing the nasty,
>>>>>> buggy FPGA design, ditto. He says bad things about the V8.2/SP3 
>>>>>> Xilinx
>>>>>> WebPack software.
>>>>>>
>>>>>> The application program is in flash, soldered down, and we're 
>>>>>> going to
>>>>>> include a flash boot-block thing that lets you reflash the app code
>>>>>> through the serial or ethernet ports, to upgrade the firmware. That's
>>>>>> sort of mind-boggling, since the flash which holds this boot program
>>>>>> disappears from the uP bus while it's being erased or programmed.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>>
>>>>> That's an interesting non-ortho arrangement :)
>>>>>
>>>>> On Webpack 8.2 SP3, I am afraid he's right. There have been some 
>>>>> rumblings on comp.arch.fpga about it recently, and I 'upgraded' to 
>>>>> it for my latest design, and then it would not process 3 previous 
>>>>> designs, although those have no errors I can see. Xilinx claimed it 
>>>>> had 'tightened up' certain things, but it caused me some grief 
>>>>> because those designs are the basis for a number of things where 
>>>>> the FPGA code is designed to in-system upgradeable should some new 
>>>>> feature be requested. I eventually re-installed (from an old full 
>>>>> download) 7.1 for those projects.
>>>> Yeah, I wish people would stop breaking things, and stay absolutely
>>>> backwards-compatible to existing designs. FPGA were supposed to make
>>>> hardware design easier, and then they sent in an army of programmers
>>>> to replace hardware problems with software nightmares.
>>>>
>>>> The *service pack* is a 300 megabyte download.
>>>>
>>>>> I've done the reprogrammable flash thing myself and I definitely 
>>>>> concur it _can_ get a little hairy.
>>>>>
>>>>> Wish you the best of luck getting the design fully operational. 
>>>>> What are the specs?
>>>> Here it is. Fortunately, the hardware is in good shape, so all we have
>>>> to do is pound the firmware and fpga into submission. Soon.
>>>>
>>>> http://www.highlandtechnology.com/DSS/T560DS.html
>>>>
>>>>
>>>> John
>>>>
>>> Nice specs indeed.
>>>
>>> If I had the money, I'd want one ;)
>>>
>>> Cheers
>>>
>>> PeteS
>>
>> But hey, this is a revelation:
>>
>> Hardware design has always been comforting because it is direct,
>> simple, visible, wysiwyg, physical, and generally reliable. The tools,
>> oscilloscopes and such, are approachable and dependable. I can use a
>> 30-year old tube-type TEK oscilloscope to debug the most modern analog
>> or digital circuits, without downloading and installing service packs.
>>
>> Software is abstract, indirect, bizarre, and unreliable. The tools are
>> buggy, bloated, always changing, unpredictable, pig slow, and seldom
>> backwards-compatible. I can't use current-gen tools to edit a 2-year
>> old FPGA design, and I'm lucky if I can somehow still find and run the
>> older tools.
>>
>> So, FPFAs, VHDL, and the associated software tools are the trojan
>> horse that's finally letting the software people get revenge, finally
>> allowing them to force us hardware designers depend on (and endlessly
>> pay for) their bizantine and unreliable methodologies, to trap us in
>> the gotta-upgrade-but-every-generation-has-more-new-bugs loop.
>>
>> And the new Windows-based scopes and logic analyzers, of course...
>> same idea.
>>
>> John
>>
>>
> Well, this could fill a book.
> 
> One of my real pet peeves is I don't have control over the synthesis and 
> PAR process such as have with C (for example - I've written many 
> hundreds of KLines of C).
> 
> As an example, there are times I _deliberately_ want to delay a dignal 
> by a prop delay or two, but the damn tools optimise it out. If I had a 
> 'volatile' keyword, or perhaps 'pragma' to tell the tool to *leave this 
> damn code alone and don't think of optimising*, it might be better.
> 
> This snippet for example;
> 
> wire sigA; // input sig
> wire sigB; // intermediate signal
> wire SigC; // output signal
> 
> BUF (a.sigA, b.SigB);
> BUF (a.SigB, SigC);
> 
> ..
> 
> the tools will optimise them out when I _specifically_ want them there 
> to add a delay. Of course, you can always say
> 
> SigC = #10 SigA;
> 
> but I prefer to use buffers because the timings are specified far better 
> than the tools will do what I want.
> 
> Windows based scopes - I have always said that a smart power switch on a 
> windows machine is the dumbest thing I ever heard of.
> 
> No - I agree with you. Real hardware is properly parameterisable and not 
> subject to the latest 'thoughts' of a software engineer who doesn't have 
> a F***ing clue about hardware.
> 
> Cheers
> 
> PeteS

and further (yeah, I know I had typos in the previous rant)

When I do pure hardware I do not have to try and figure out what the 
hell was done to implement my statements.

This was a major issue on a design I did about 4 years ago where I 
interfaced an upstream bus to the busses on 6 devices (with a lot of 
other stuff) and the synthesis / PAR etc kept optimising away certain 
things that were there to maintain the timing. The response I got was 
'well, use pure synchronous design' but in this case it was simply not 
possible (an issue I am sure you'll understand).

I once deliberately did a DeMorgan transform by hand because I did not 
trust the tool to do it right. (Code available on request ;) )

Cheers

PeteS

Article: 112263
Subject: Re: memory init in Altera bitfiles, (like data2mem) is it possible?
From: Jim Granville <no.spam@designtools.maps.co.nz>
Date: Sun, 19 Nov 2006 12:49:46 +1300

already5chosen@yahoo.com wrote:

> Antti wrote:
> 
>>;) I dont really care about naming.
>>
>>but with Xilinx I have huge set of of BIT files (aka SOF) + BMM file
>>(ram loc info) now I can compile C code with GCC, and run data2mem, it
>>completes withing seconds and a result is ready to load bit file for
>>any Xilinx FPGA that includes the software. This flow EXCLUDES all FPGA
>>tools, it really takes ready made bitfile (aka Altera SOF) and merges
>>the elf or object into it. During that process the only files required
>>are the BIT file itself and ram location info file, no FPGA tool
>>generated files are required.
>>
>>the path with altera asm or cdb or whatever isnt really the same, or I
>>am mistaken here?
>>
>>the process must be simple:
>>
>>1) compile C code to ELF or binary
>>2) merge ELF with fpga hardware (BIT or SOF)
>>
>>it works really nice with xilinx tools, it works also in Lattice flow,
>>but with Altera it seems to be really impossible. I think altera
>>has/had it for mercury but for softcore the sof file update function is
>>missing.
>>
>>Antti
> 
> 
> Altera SOF file simply doesn't contain enough information to do the
> merge like you want. .sof is really just a simple binary image almost
> identical to .rbf. You need additional knowledge in order to know where
> within the SOF to put the information contained in the .elf or .hex
> file produced by the Nios2 tools. The location information is hidden in
> undocumented intermediate files in DB directory which format likely
> isn't even compatible between different versions of Quartus.
> 
> I still don't understand why do you care? From practical perspective
> Altera solution is adequate.

No, it has blindspots, see below.

> 
> More so, if you generate the Nios2 core with JTAG debug support and
> your on-chip program memory is connected to data port of the core then
> during development/debugging phase you don't need to merge anything.
> Nios2 tools are smart enough to load elf image directly into required
> on-chip RAM location on the running FPGA in exactly the same way they
> do it for a case of external RAM.

There are two parts of a design flow.
The first I'll call first-pass, which is a full NIOS+Peripherals+Code 
build and that must use Altera tools.

The _subsequent_ passes, only need to insert the code, they do NOT 
require, and should not need any Altera tools, especially any licensed 
ones. This second area is Antti's focus.

Yes, the _subsequent_ pass insert tool may need some simple help to tell 
it which RAM blocks take which code segments
[Code, Initialised data, Sizes for error checking  ]
but that is easily provided in an ASCII config file the tools read.

OR, the first build could place careful signatures (not code) into the 
used segments, and the insert-tool could scan for those, and replace 
them with the user's code. That makes the tool usage simpler still, and
the tool might even be able to auto-sense Xilinx/Altera/Lattice streams.
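
A minimal sketch of the scan-and-replace idea, under a strong
simplifying assumption: that the RAM contents sit in the image as one
contiguous run of bytes. Real bitstreams may interleave or scramble RAM
bits and carry a checksum that has to be recomputed, so this only
illustrates the principle. The function name insert_code and the
signature value are hypothetical.

```python
def insert_code(image: bytes, signature: bytes, code: bytes) -> bytes:
    """Replace the single occurrence of `signature` in `image` with `code`.

    The code is zero-padded to the size of the reserved segment, so the
    image length never changes.
    """
    if len(code) > len(signature):
        raise ValueError("code does not fit in the reserved segment")
    start = image.find(signature)
    if start < 0:
        raise ValueError("signature not found in image")
    if image.find(signature, start + 1) >= 0:
        raise ValueError("signature occurs more than once")
    padded = code + bytes(len(signature) - len(code))
    return image[:start] + padded + image[start + len(signature):]


# Hypothetical 32-byte signature placed by the first-pass build.
SIGNATURE = bytes(range(0xA0, 0xC0))
image = b"\xff" * 16 + SIGNATURE + b"\xff" * 16
patched = insert_code(image, SIGNATURE, b"\x01\x02\x03")
```

Requiring exactly one match is a deliberate choice: a signature that
also appears by chance in the logic configuration would otherwise be
overwritten silently.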

-jg

Article: 112264
Subject: Re: board - T562.jpg
From: John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com>
Date: Sat, 18 Nov 2006 16:14:21 -0800

On Sat, 18 Nov 2006 23:23:30 GMT, PeteS <peter.smith8380@ntlworld.com>
wrote:

>
>When I do pure hardware I do not have to try and figure out what the 
>hell was done to implement my statements.
>
>This was a major issue on a design I did about 4 years ago where I 
>interfaced an upstream bus to the busses on 6 devices (with a lot of 
>other stuff) and the synthesis / PAR etc kept optimising away certain 
>things that were there to maintain the timing. The response I got was 
>'well, use pure synchronous design' but in this case it was simply not 
>possible (am issue I am sure you'll understand).

Yup, this *is* the real world. We recently had to do a clock-edge
deglitcher, using delay elements. It couldn't be synchronous, because
we were, well, deglitching the clock! Ditto stuff like charge-pump
phase detectors, where you really need exactly what you need, delays
and all.

>I once deliberately did a DeMorgan transform by hand because I did nto 
>trust the tool to do it right. (Code available on request ;) )
>
>Cheers
>
>PeteS

One thing you can do is add a pulldown to a pin (or ground it) and
call that signal ZERO or something. Then just OR it with things to
create new, buffered, delayed things. If you need more, run it through
a shift register and create ZERO1, ZERO2, etc. The compiler can't
optimize them out!

So the FPGA software people ought to provide us an irreducible ZERO
without wasting a pin, or a buffer that stays a buffer always.

So, is there a block of logic so complex that the compiler can't
figure out that it indeed will always output a zero? Maybe the MSB of
a thousand-year counter, but that wastes flops. Maybe some small but
clever state machine that always makes zero but is too tricky to be
optimized?

John

Article: 112265
Subject: Re: board - T562.jpg
From: PeteS <peter.smith8380@ntlworld.com>
Date: Sun, 19 Nov 2006 00:24:34 GMT

John Larkin wrote:
> On Sat, 18 Nov 2006 23:23:30 GMT, PeteS <peter.smith8380@ntlworld.com>
> wrote:
> 
>> When I do pure hardware I do not have to try and figure out what the 
>> hell was done to implement my statements.
>>
>> This was a major issue on a design I did about 4 years ago where I 
>> interfaced an upstream bus to the busses on 6 devices (with a lot of 
>> other stuff) and the synthesis / PAR etc kept optimising away certain 
>> things that were there to maintain the timing. The response I got was 
>> 'well, use pure synchronous design' but in this case it was simply not 
>> possible (am issue I am sure you'll understand).
> 
> Yup, this *is* the real world. We recently had to do a clock-edge
> deglitcher, using delay elements. It couldn't be synchronous, because
> we were, well, deglitching the clock! Ditto stuff like charge-pump
> phase detectors, where you really need exactly what you need, delays
> and all.
> 
> 
>> I once deliberately did a DeMorgan transform by hand because I did nto 
>> trust the tool to do it right. (Code available on request ;) )
>>
>> Cheers
>>
>> PeteS
> 
> 
> One thing you can do is add a pulldown to a pin (or ground it) and
> call that signal ZERO or something. Then just OR it with things to
> create new, buffered, delayed things. If you need more, run it through
> a shift register and create ZERO1, ZERO2, etc. The compiler can't
> optimize them out!
> 
> So the FPGA software people ought to provide us an irreducible ZERO
> without wasting a pin, or a buffer that stays a buffer always.
> 
> So, is there a block of logic so complex that the compiler can't
> figure out that it indeed will always output a zero? Maybe the MSB of
> a thousand-year counter, but that wastes flops. Maybe some small but
> clever state machine that always makes zero but is too tricky to be
> optimized?
> 
> John
> 

As noted, I once did a DeMorgan transform by hand (simple one too) and 
it materially changed the compiled output. (For those who understand the 
transform, don't get upset at the comments; they were there for people 
who _didn't_).

Here it is:

******************************

	else begin
		cs0 <= cs_l;	// make a direct copy of the cs signal
		cs1 <= (cs0 | cs_l);	// Note - this uses a DeMorgan transform
		// the formula really is this : cs1 <= !(!cs_l & !cs0)
		// rather than cs1 <= (!cs_l & !cs0)
		// Note the extra output inversion, which renders an
		// inversion unnecessary
		// DeMorgan's theorem is this
		// <y = !a & !b> === <!y = a | b>
		// i.e. invert all signals and change and to or and vice
		// versa. Note this works only on basic functions
		// ( &, |, ! ) (AND, OR, INVERT)
		// for a valid select, we need two consecutive samples of
		// cs_l low. Latch a low, then look at the latch and the
		// signal on the pin. If both are low, cs1 goes low. If
		// either are high (glitch, runt pulse) then cs1 stays high
		// We trigger on internal cs (cs1) going low
		// Why did I use it? Because it requires no inverters and
		// therefore saves me a gate delay by simply using a LUT
		// without inversion
		//
		// Of course, the tools *might* do this, but I can't
		// guarantee it, so I'll *****ing do it myself.
	end
*****************************************
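
For anyone who wants to check the claims in those comments without a
simulator, here is a small Python model. It is only a sketch: it assumes
cs0 and cs1 are registers updated on the same clock edge (the clocking
and reset branch are not shown above), and the function name filter_cs
and the reset values of 1 are my own assumptions.

```python
from itertools import product

# De Morgan check: (cs0 | cs_l) is the same function as !(!cs_l & !cs0).
for cs0, cs_l in product((0, 1), repeat=2):
    assert (cs0 | cs_l) == 1 - ((1 - cs_l) & (1 - cs0))


def filter_cs(samples, cs0=1, cs1=1):
    """Model the two non-blocking assignments, one clock edge per sample."""
    out = []
    for cs_l in samples:
        # Both right-hand sides see the OLD cs0, as with "<=" in Verilog.
        cs0, cs1 = cs_l, cs0 | cs_l
        out.append(cs1)
    return out


# Single low samples (glitches) are ignored; two in a row pull cs1 low.
trace = filter_cs([1, 0, 1, 0, 0, 1])
print(trace)  # [1, 1, 1, 1, 0, 1]
```

This says nothing about what the synthesis tool actually builds, only
that the written equation matches the intent stated in the comments.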

Cheers

PeteS

Article: 112266
Subject: Re: Static Timing Analysis vs Dinamic Timing Analysis
From: "Matthew Hicks" <mdhicks2@uiuc.edu>
Date: Sat, 18 Nov 2006 21:17:52 -0600

One is static and the other is dynamic ... pretty simple.  Everyone uses the 
static version because of its properties versus dynamic timing analysis. 
For information about the properties of both look in your textbook.

---Matthew Hicks


"jajo" <jmunir@gmail.com> wrote in message 
news:1163861153.117305.220690@m7g2000cwm.googlegroups.com...
> Hi!,
>
> Could anybody explain me these concepts and their differences?. And
> what is done in foundation ISE tool of xilinx?.
>
> Thanx
>
> Jajo
>

Article: 112267
Subject: Re: pulse jitter due to clock
From: "Matthew Hicks" <mdhicks2@uiuc.edu>
Date: Sat, 18 Nov 2006 21:25:45 -0600

I thought I just read something in one of Xilinx's "techXclusives" saying 
that one of the improvements of the Virtex 4 over the Virtex 2 series was 
that the global clock routing went from single ended to differential.


---Matthew Hicks


"Austin Lesea" <austin@xilinx.com> wrote in message 
news:455F2F5C.4030705@xilinx.com...
> Symon,
>
> Well, yes they are differential across the chip.
>
> And, what they accomplish is less jitter than if they had been single 
> ended.
>
> It is quite a battle:  voltage goes down, distances get longer (for 
> smaller wires), more stuff is switching, etc.  Gains made may not appear 
> to be substantial, yet without them, the result would have been far worse 
> (no small gain, but a huge loss of performance).
>
> Austin
>
>
> Symon wrote:
>
>> "Austin Lesea" <austin@xilinx.com> wrote in message 
>> news:ejkus8$kkk1@cnn.xsj.xilinx.com...
>>
>>>>So, that's a cool thing. Did you guys do any measurements on the jitter
>>>>performance of this? I.e. how much jitter is added to a differential data
>>>>signal coming out of an IOB clocked by a BUFIO driven from a differential
>>>>clock coming to the FPGA 'Clock Capable' pins.?
>>>
>>>Yes, we have performed a great deal of characterization.  And the clock
>>>capable pins, or even a plain IOB has no real difference in jitter
>>>performance.
>>>
>>
>> Hi Austin,
>> Thanks for getting back! Your reply surprised me; I now wonder just what 
>> does the diff clock routing bring to the party if not better jitter 
>> performance? BTW, are the regular global clock networks differential?
>> Thanks, Syms.

Article: 112268
Subject: Re: board - T562.jpg
From: John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com>
Date: Sat, 18 Nov 2006 19:30:20 -0800

On Sun, 19 Nov 2006 00:24:34 GMT, PeteS <peter.smith8380@ntlworld.com>
wrote:

>John Larkin wrote:
>> On Sat, 18 Nov 2006 23:23:30 GMT, PeteS <peter.smith8380@ntlworld.com>
>> wrote:
>> 
>>> When I do pure hardware I do not have to try and figure out what the 
>>> hell was done to implement my statements.
>>>
>>> This was a major issue on a design I did about 4 years ago where I 
>>> interfaced an upstream bus to the busses on 6 devices (with a lot of 
>>> other stuff) and the synthesis / PAR etc kept optimising away certain 
>>> things that were there to maintain the timing. The response I got was 
>>> 'well, use pure synchronous design' but in this case it was simply not 
>>> possible (am issue I am sure you'll understand).
>> 
>> Yup, this *is* the real world. We recently had to do a clock-edge
>> deglitcher, using delay elements. It couldn't be synchronous, because
>> we were, well, deglitching the clock! Ditto stuff like charge-pump
>> phase detectors, where you really need exactly what you need, delays
>> and all.
>> 
>> 
>>> I once deliberately did a DeMorgan transform by hand because I did nto 
>>> trust the tool to do it right. (Code available on request ;) )
>>>
>>> Cheers
>>>
>>> PeteS
>> 
>> 
>> One thing you can do is add a pulldown to a pin (or ground it) and
>> call that signal ZERO or something. Then just OR it with things to
>> create new, buffered, delayed things. If you need more, run it through
>> a shift register and create ZERO1, ZERO2, etc. The compiler can't
>> optimize them out!
>> 
>> So the FPGA software people ought to provide us an irreducible ZERO
>> without wasting a pin, or a buffer that stays a buffer always.
>> 
>> So, is there a block of logic so complex that the compiler can't
>> figure out that it indeed will always output a zero? Maybe the MSB of
>> a thousand-year counter, but that wastes flops. Maybe some small but
>> clever state machine that always makes zero but is too tricky to be
>> optimized?
>> 
>> John
>> 
>
>As noted, I once did a DeMorgan transform by hand (simple one too) and 
>it materially changed the compiled output. (For those who understand the 
>transform, don't get upset at the comments; they were there for people 
>who _didn't_).
>
>Here it is:
>
>******************************
>
>	else begin
>		cs0 <= cs_l;	// make a direct copy of the cs signal
>		cs1 <= (cs0 | cs_l);	// Note - this uses a DeMorgan transform
>		// the formula really is this : cs1 <= !(!cs_l & !cs0)
>		// rather than cs1 <= (!cs_l & !cs0)
>		// Note the extra output inversion, which renders an
>		// inversion unnecessary
>		// DeMorgan's theorem is this
>		// <y = !a & !b> === <!y = a | b>
>		// i.e. invert all signals and change and to or and vice
>		// versa. Note this works only on basic functions
>		// ( &, |, ! ) (AND, OR, INVERT)
>		// for a valid select, we need two consecutive samples of
>		// cs_l low. Latch a low, then look at the latch and the
>		// signal on the pin. If both are low, cs1 goes low. If
>		// either are high (glitch, runt pulse) then cs1 stays high
>		// We trigger on internal cs (cs1) going low
>		// Why did I use it? Because it requires no inverters and
>		// therefore saves me a gate delay by simply using a LUT
>		// without inversion
>		//
>		// Of course, the tools *might* do this, but I can't
>		// guarantee it, so I'll *****ing so it myself.
>	end
>*****************************************
>
>Cheers
>
>PeteS

They should back off this religious devotion to fully synchronous
logic and give us a couple of dozen programmable true delay elements,
scattered about the chip. But they won't because it's not politically
correct, and because they figure that we're so dumb that we'd get into
trouble using them.


John

Article: 112269
Subject: spartan-3e starter kit and ethernet
From: hakan.aydin@gmail.com
Date: 18 Nov 2006 21:08:15 -0800

Hello,

I recently purchased a spartan-3e starter kit from Xilinx. (I have so
far been unimpressed with the reference designs supplied with the kit,
but that's another subject...)

I am trying to use the ethernet connector on the board to communicate
with a PC, and so far I have been unsuccessful. I do not want to use a
soft-core like picoblaze, I just want to be able to communicate using
the ethernet port. I looked for a reference design on the internet and
the closest I could come up with was the fpga4fun site here:

http://www.fpga4fun.com/10BASE-T.html

however, I couldn't get that to work. I also tried using the ethernet
ip core from opencores.org, but that doesn't even fit on the device on
the starter kit (xc3s500e)

If anyone successfully used the ethernet connector with the spartan-3e
starter kit for data communication, I would be grateful if you could
provide an example or at least some guidance on how to proceed.

Best regards,

Article: 112270
Subject: Re: board - T562.jpg
From: "rickman" <gnuarm@gmail.com>
Date: 18 Nov 2006 21:52:32 -0800

PeteS wrote:
> John Larkin wrote:
> > On Sat, 18 Nov 2006 23:23:30 GMT, PeteS <peter.smith8380@ntlworld.com>
> > wrote:
> >
> >> When I do pure hardware I do not have to try and figure out what the
> >> hell was done to implement my statements.
> >>
> >> This was a major issue on a design I did about 4 years ago where I
> >> interfaced an upstream bus to the busses on 6 devices (with a lot of
> >> other stuff) and the synthesis / PAR etc kept optimising away certain
> >> things that were there to maintain the timing. The response I got was
> >> 'well, use pure synchronous design' but in this case it was simply not
> >> possible (am issue I am sure you'll understand).
> >
> > Yup, this *is* the real world. We recently had to do a clock-edge
> > deglitcher, using delay elements. It couldn't be synchronous, because
> > we were, well, deglitching the clock! Ditto stuff like charge-pump
> > phase detectors, where you really need exactly what you need, delays
> > and all.
> >
> >
> >> I once deliberately did a DeMorgan transform by hand because I did nto
> >> trust the tool to do it right. (Code available on request ;) )
> >>
> >> Cheers
> >>
> >> PeteS
> >
> >
> > One thing you can do is add a pulldown to a pin (or ground it) and
> > call that signal ZERO or something. Then just OR it with things to
> > create new, buffered, delayed things. If you need more, run it through
> > a shift register and create ZERO1, ZERO2, etc. The compiler can't
> > optimize them out!
> >
> > So the FPGA software people ought to provide us an irreducible ZERO
> > without wasting a pin, or a buffer that stays a buffer always.
> >
> > So, is there a block of logic so complex that the compiler can't
> > figure out that it indeed will always output a zero? Maybe the MSB of
> > a thousand-year counter, but that wastes flops. Maybe some small but
> > clever state machine that always makes zero but is too tricky to be
> > optimized?
> >
> > John
> >
>
> As noted, I once did a DeMorgan transform by hand (simple one too) and
> it materially changed the compiled output. (For those who understand the
> transform, don't get upset at the comments; they were there for people
> who _didn't_).
>
> Here it is:
>
> ******************************
>
> 	else begin
> 		cs0 <= cs_l;	// make a direct copy of the cs signal
> 		cs1 <= (cs0 | cs_l);	// Note - this uses a DeMorgan transform
> 		// the formula really is this : cs1 <= !(!cs_l & !cs0)
> 		// rather than cs1 <= (!cs_l & !cs0)
> 		// Note the extra output inversion, which renders an
> 		// inversion unnecessary
> 		// DeMorgan's theorem is this
> 		// <y = !a & !b> === <!y = a | b>
> 		// i.e. invert all signals and change and to or and vice
> 		// versa. Note this works only on basic functions
> 		// ( &, |, ! ) (AND, OR, INVERT)
> 		// for a valid select, we need two consecutive samples of
> 		// cs_l low. Latch a low, then look at the latch and the
> 		// signal on the pin. If both are low, cs1 goes low. If
> 		// either are high (glitch, runt pulse) then cs1 stays high
> 		// We trigger on internal cs (cs1) going low
> 		// Why did I use it? Because it requires no inverters and
> 		// therefore saves me a gate delay by simply using a LUT
> 		// without inversion
> 		//
> 		// Of course, the tools *might* do this, but I can't
> 		// guarantee it, so I'll *****ing so it myself.
> 	end
> *****************************************

What exactly did the tool do with this design?  I don't see anything
that will tell the tool not to optimize the logic and eliminate your
gates entirely.  Logically your equation is equivalent to the original
input signal.  There are attributes you can give a signal to prevent
the tool from optimizing it away, did you use one on cs0?

I also don't see what advantage your DeMorgan transform has.  If you
lock the cs0 signal so that it is not optimized away, the tool will use
a LUT to produce cs0 from cs_l.  It will also use a LUT to do the OR.

There are delay elements built into the IOBs of Xilinx parts.  I don't
recall just what you can and can't do with them, but if you run a
signal through the delay and also run it without the delay, you can
then OR the two in just the way you intended above.  This may require
two input pins however and it may not be usable unless you are running
the signal into the input FF, I'm not sure.

Article: 112271
Subject: Re: spartan-3e starter kit and ethernet
From: "Antti" <Antti.Lukats@xilant.com>
Date: 18 Nov 2006 23:16:11 -0800

hakan.aydin@gmail.com schrieb:

> Hello,
>
> I recently purchased a spartan-3e starter kit from Xilinx. (I have so
> far been unimpressed with the reference designs supplied with the kit,
> but that's another subject...)
>
> I am trying to use the ethernet connector on the board to communicate
> with a PC, and so far I have been unsuccessful. I do not want to use a
> soft-core like picoblaze, I just want to be able to communicate using
> the ethernet port. I looked for a reference design on the internet and
> the closest I could come up with was the fpga4fun site here:
>
> http://www.fpga4fun.com/10BASE-T.html
>
> however, I couldn't get that to work. I also tried using the ethernet
> ip core from opencores.org, but that doesn't even fit on the device on
> the starter kit (xc3s500e)
>
> If anyone successfully used the ethernet connector with the spartan-3e
> starter kit for data communication, I would be grateful if you could
> provide an example or at least some guidance on how to proceed.
>
> Best regards,

You won't like my answer, but:

if you want to use ethernet then you have almost no choice other than
to also use some softcore processor.

The only softcore-ethernet solution that will fit into the s3-500e is
MicroBlaze + OPB ethernet (or ethernet lite); there is a ready-to-go
system that can load uclinux and works fully with ethernet.

If you are trying to use ethernet without EDK then the time you spend
(waste) costs way more than the price of EDK, unless of course you just
want to play around and your goal is wasting time.

I am surprised by your comments regarding the reference designs. I
think especially the designs from Ken Chapman provided for this board
are really cool. Just yesterday I looked up the frequency counter design
from the s3e starter kit reference designs, well, only to get the super
cool SPAR_DCM_TEST thing to work. OK, I agree the online reference
design was for ISE 8.1, so I had some trouble getting it working on
ISE 8.2, but still there are plenty of designs to learn from.


Antti

Article: 112272
Subject: Re: memory init in Altera bitfiles, (like data2mem) is it possible?
From: "Antti Lukats" <antti@openchip.org>
Date: Sun, 19 Nov 2006 08:21:29 +0100

"Jim Granville" <no.spam@designtools.maps.co.nz> schrieb im Newsbeitrag 
news:455f9bd0$1@clear.net.nz...
> already5chosen@yahoo.com wrote:
>
>> Antti wrote:
>>
>>>;) I dont really care about naming.
>>>
>>>but with Xilinx I have huge set of of BIT files (aka SOF) + BMM file
>>>(ram loc info) now I can compile C code with GCC, and run data2mem, it
>>>completes withing seconds and a result is ready to load bit file for
>>>any Xilinx FPGA that includes the software. This flow EXCLUDES all FPGA
>>>tools, it really takes ready made bitfile (aka Altera SOF) and merges
>>>the elf or object into it. During that process the only files required
>>>are the BIT file itself and ram location info file, no FPGA tool
>>>generated files are required.
>>>
>>>the path with altera asm or cdb or whatever isnt really the same, or I
>>>am mistaken here?
>>>
>>>the process must be simple:
>>>
>>>1) compile C code to ELF or binary
>>>2) merge ELF with fpga hardware (BIT or SOF)
>>>
>>>it works really nice with xilinx tools, it works also in Lattice flow,
>>>but with Altera it seems to be really impossible. I think altera
>>>has/had it for mercury but for softcore the sof file update function is
>>>missing.
>>>
>>>Antti
>>
>>
>> Altera SOF file simply doesn't contain enough information to do the
>> merge like you want. .sof is really just a simple binary image almost
>> identical to .rbf. You need additional knowledge in order to know where
>> within the SOF to put the information contained in the .elf or .hex
>> file produced by the Nios2 tools. The location information is hidden in
>> undocumented intermediate files in DB directory which format likely
>> isn't even compatible between different versions of Quartus.
>>
>> I still don't understand why do you care? From practical perspective
>> Altera solution is adequate.
>
> No, it has blindspots, see below.
>
>>
>> More so, if you generate the Nios2 core with JTAG debug support and
>> your on-chip program memory is connected to data port of the core then
>> during development/debugging phase you don't need to merge anything.
>> Nios2 tools are smart enough to load elf image directly into required
>> on-chip RAM location on the running FPGA in exactly the same way they
>> do it for a case of external RAM.
>
> There are two parts of a design flow.
> The first I'll call first-pass, which is a full NIOS+Peripherals+Code 
> build and that must use Altera tools.
>
> The _subsequent_ passes, only need to insert the code, they do NOT 
> require, and should not need any Altera tools, especially any licensed 
> ones. This second area is Antti's focus.
>
> Yes, the _subsequent_ pass insert tool may need some simple help to tell 
> it which RAM blocks take which code segments
> [Code, Initialised data, Sizes for error checking  ]
> but that is easily provided in an ASCII config file the tools read.
>
> OR, the first build could place careful signatures (not code) into the 
> used segments, and the insert-tool could scan for those, and replace them 
> with the users code. Makes to tool usage simpler still, and the
> tool might even be able to auto-sense Xilinx/Altera/Lattice streams.
>
> -jg
>

Hi Jim,

you are a quick thinker: I hadn't even spelled out the "signature" approach 
and yet you are suggesting it. Yes, that is definitely one possibility, and 
the more I think about it, possibly the preferred choice for me: to have a 
tool (maybe even an open-source thing) that uses BIT or SOF files that have 
BRAMs filled with special signature patterns and derives the BRAM and 
bitline mapping from the signatures. This should work for most FPGAs. Sure, 
the BRAM bit location info has to be derived also; for Xilinx it is 
available in LL files, for Altera it may be harder to obtain but should 
still be doable with some smart scripts :)
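
One way such a script could derive the bit locations, sketched on a toy
model only. The assumptions are loud ones: every logical RAM bit lands
at one fixed bit position in the image, nothing is compressed or
encrypted, and the first-pass tools can be run a handful of times with
different RAM fill patterns. The names derive_mapping and
make_toy_build are hypothetical, and build() merely stands in for
"compile with this RAM content and read back the image".

```python
import random


def derive_mapping(build, n_bits):
    """Return mapping[i] = image bit position of logical RAM bit i."""
    zeros = build([0] * n_bits)
    ones = build([1] * n_bits)
    # Positions that change between all-0 and all-1 fills belong to the RAM.
    ram_positions = [p for p, (z, o) in enumerate(zip(zeros, ones)) if z != o]
    if len(ram_positions) != n_bits:
        raise ValueError("image does not hold exactly one bit per RAM bit")
    index_at = dict.fromkeys(ram_positions, 0)
    # Probe j stores bit j of each RAM bit's own index; reading all the
    # probes at one image position spells out that index in binary.
    for j in range(max(1, (n_bits - 1).bit_length())):
        probe = build([(i >> j) & 1 for i in range(n_bits)])
        for p in ram_positions:
            # Compare with the all-1 image so inverted storage also works.
            index_at[p] |= int(probe[p] == ones[p]) << j
    mapping = [None] * n_bits
    for p, i in index_at.items():
        mapping[i] = p
    return mapping


def make_toy_build(n_bits, image_len, seed=0):
    """Toy stand-in for the vendor flow: RAM bits scattered in an image."""
    rng = random.Random(seed)
    positions = rng.sample(range(image_len), n_bits)
    background = [rng.randint(0, 1) for _ in range(image_len)]

    def build(ram_bits):
        image = list(background)
        for i, p in enumerate(positions):
            image[p] = ram_bits[i]
        return image

    return build, positions


build, truth = make_toy_build(n_bits=64, image_len=500)
recovered = derive_mapping(build, 64)
```

The number of probe builds grows only with the logarithm of the RAM
size, which is why binary-coded patterns are used here instead of one
build per bit.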

Antti

Article: 112273
Subject: Re: combinatorical divide by 2 in FPGA
From: "Antti Lukats" <antti@openchip.org>
Date: Sun, 19 Nov 2006 08:23:27 +0100


"JustJohn" <john.l.smith@l-3com.com> schrieb im Newsbeitrag 
news:1163815159.844199.113180@h48g2000cwc.googlegroups.com...
>
>
> On Nov 17, 6:52 am, "Antti" <Antti.Luk...@xilant.com> wrote:
>> JustJohn schrieb:
>>
>>
>>
>>
>>
>> > John_H wrote:
>> > > Antti wrote:
>> > > > Hi
>>
>> > > > this may sound strange but I need flip flop or divide by 2 element that
>> > > > is made purely out of combinatorical logic and passes fpga
>> > > > implementation without being optimized away, and work in FPGA fabric. I
>> > > > tried some D-flop code but it gets really optimized away and doesnt
>> > > > work.
>>
>> > > > I wonder if someone has a solution for this. It really must be pure
>> > > > logic, eg using DCM or BUFR is not ok, the solution must not use any
>> > > > fabric flip flops or fabric clocked primitives at all.
>>
>> > > > Antti
>>
>> > > Sounds like you need a master/slave pair of latches (a DFF made out of
>> > > gates).  The only trouble would be convincing the tool to not optimize
>> > > out or complain too harshly about the combinatorial loops.
>>
>> > > Two SR Latches chained together.  The S and R inputs are the normal and
>> > > inverted signal from the other stage (inverted between one pair) and the
>> > > clock acts as a 3rd input - normal on one SR Latch, inverted on the
>> > > other - to each NAND gate to make the latch transparent or to allow the
>> > > latch.
>>
>> > > I'm thinking back to the old TI data book that mapped out all the simple
>> > > TTL logic.  Since you don't need asynchronous set/reset logic, you
>> > > should need just the 4 LUTs.
>>
>> > Two LUTS/FF. Only one LUT required per latch, one for master latch, one
>> > for slave, and wiring, err routing, delays should be verified.
>> > I'd list the init values, but haven't had my coffee yet this morning.
>> > Just John
>>
>> hm, get your coffee and write it down :)
>> and you get extra e-coffee from me.
>>
>> the non-LUT version of the master slave thing did not yield any useful
>> result.
>> sounds interesting to have a FF out of 2 LUTs (if it works also..)
>>
>> Antti
>
> (Not recommended for critical applications!)
> Two 4-LUT divider, does not optimise away, sims ok too,
> max clock is route dependent, you probably want to add RLOCs to force
> the same slice for the LUTs, give Clock a chance to have close to
> identical delays to both LUTs, if Clock delays are different, make sure
> the Slave LUT is later.
> You can toss the Reset term and instantiate LUT3s instead of LUT4s.
>
> Let us know if it works for you...
>
> library IEEE;
> use IEEE.STD_LOGIC_1164.ALL;
> library UNISIM;
> use UNISIM.VComponents.all;
> entity Antti_FF is
>    Port ( Clock : in   STD_LOGIC;
>           Reset : in   STD_LOGIC;
>           Div_2 : out  STD_LOGIC);
> end Antti_FF;
> architecture Behavioral of Antti_FF is
>  signal  Master  : std_logic;
>  signal  Slave   : std_logic;
> begin
> Master_LUT: LUT4
> -- Master LUT (First in the Master/Slave FF)
> -- Holds Feedback when clock is high,
> -- Passes Inverted Data when Clock is Low.
> -- Reset available using 4-LUT, can be tied to 0
> -- R=Reset D=Data C=Clock F=Feedback O=Output
> -- R:  1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
> -- C:  1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0
> -- D:  1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
> -- F:  1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
> -- O:  0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 1
>  generic map ( INIT => x"00A3" )
>  port map (
>    O  => Master,
>    I0 => Master,
>    I1 => Slave,
>    I2 => Clock,
>    I3 => Reset );
> Slave_LUT: LUT4
> -- Slave LUT inverts Clock sense,
> -- Passes Data when Clock is High,
> -- Passes Feedback when clock is low
> -- R:  1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
> -- C:  1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0
> -- D:  1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
> -- F:  1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
> -- O:  0 0 0 0 0 0 0 0 1 1 0 0 1 0 1 0
>  generic map ( INIT => x"00CA" )
>  port map (
>    O  => Slave,
>    I0 => Slave,
>    I1 => Master,
>    I2 => Clock,
>    I3 => Reset );
> Div_2 <= Slave;
> end Behavioral;
>
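
A quick way to cross-check the two INIT strings and the divide-by-2 behaviour quoted above is a small host-side model. The following is only a sketch: it assumes the usual LUT4 convention that INIT bit number I3*8 + I2*4 + I1*2 + I0 is the output for that input combination, and it uses an idealised zero-delay model, so it says nothing about the routing and clock-skew concerns John mentions. All function names are made up for illustration.

```python
def lut_init(func):
    """Pack a 4-input function into a 16-bit INIT value.

    Assumed convention: INIT bit (I3*8 + I2*4 + I1*2 + I0) holds the output.
    """
    init = 0
    for idx in range(16):
        i0, i1, i2, i3 = idx & 1, (idx >> 1) & 1, (idx >> 2) & 1, (idx >> 3) & 1
        if func(i0, i1, i2, i3):
            init |= 1 << idx
    return init


def master_fn(feedback, data, clock, reset):
    # Holds feedback while Clock is high, passes inverted data while low.
    if reset:
        return 0
    return feedback if clock else 1 - data


def slave_fn(feedback, data, clock, reset):
    # Passes data while Clock is high, holds feedback while low.
    if reset:
        return 0
    return data if clock else feedback


MASTER_INIT = lut_init(master_fn)
SLAVE_INIT = lut_init(slave_fn)


def lut4(init, i0, i1, i2, i3):
    """Look up one output bit of a LUT4 with the given INIT."""
    return (init >> (i3 * 8 + i2 * 4 + i1 * 2 + i0)) & 1


def settle(master, slave, clock, reset):
    """Iterate the two cross-coupled LUTs until the outputs stop changing."""
    for _ in range(8):
        new_master = lut4(MASTER_INIT, master, slave, clock, reset)
        new_slave = lut4(SLAVE_INIT, slave, new_master, clock, reset)
        if (new_master, new_slave) == (master, slave):
            return master, slave
        master, slave = new_master, new_slave
    raise RuntimeError("LUT pair did not settle")


def simulate(half_cycles):
    """Reset with Clock low, then toggle Clock; return Div_2 after each toggle."""
    master, slave = settle(0, 0, 0, 1)
    clock, out = 0, []
    for _ in range(half_cycles):
        clock ^= 1
        master, slave = settle(master, slave, clock, 0)
        out.append(slave)
    return out


if __name__ == "__main__":
    print(f"Master INIT = {MASTER_INIT:04X}, Slave INIT = {SLAVE_INIT:04X}")
    print(simulate(8))
```

In this idealised model the output changes only when Clock goes high and repeats every two clock periods; whether real silicon behaves the same depends on the placement and delays discussed in the post.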

Wow, now I have a flip-flop with my name on it!
Thanks a lot, John, I will definitely try it out!

Antti

Article: 112274
Subject: Re: memory init in Altera bitfiles, (like data2mem) is it possible?
From: Jim Granville <no.spam@designtools.maps.co.nz>
Date: Sun, 19 Nov 2006 21:15:30 +1300
Links: << >> << T >> << A >>

Antti Lukats wrote:
> "Jim Granville" <no.spam@designtools.maps.co.nz> wrote in message
>>
>>There are two parts of a design flow.
>>The first I'll call first-pass, which is a full NIOS+Peripherals+Code 
>>build and that must use Altera tools.
>>
>>The _subsequent_ passes, only need to insert the code, they do NOT 
>>require, and should not need any Altera tools, especially any licensed 
>>ones. This second area is Antti's focus.
>>
>>Yes, the _subsequent_ pass insert tool may need some simple help to tell 
>>it which RAM blocks take which code segments
>>[Code, Initialised data, Sizes for error checking  ]
>>but that is easily provided in an ASCII config file the tools read.
>>
>>OR, the first build could place careful signatures (not code) into the 
>>used segments, and the insert-tool could scan for those, and replace them 
>>with the user's code. Makes the tool usage simpler still, and the
>>tool might even be able to auto-sense Xilinx/Altera/Lattice streams.
>>
>>-jg
>>
> 
> 
> Hi Jim,
> 
> you are a quick thinker; I haven't even spelled out the "signature" approach and 
> yet you are suggesting it. Yes, that is definitely one possibility; the more I 
> think about it, the more it looks like the preferred choice for me: a tool (maybe 
> even an open-source one) that uses BIT or SOF files whose BRAMs are filled 
> with special signature patterns and derives the BRAM and bitline mapping 
> from the signatures. This should work for most FPGAs. Sure, the BRAM 
> bit-location info has to be derived as well; for Xilinx it is available in LL 
> files, for Altera it may be harder to obtain but should still be doable with 
> some smart scripts :)

  I figured you had probably already thought of this :)

I'm not sure you need bitlocation info, as the signature can be quite 
large (== unlikely to alias), and you could also report how many
BRAMs were mapped, so the user could spot any false positives and
report them (so you could fix the signatures). I'd rate the
chance of you creating an alias-prone signature quite low anyway :)
- A mapping count gives everyone confidence.
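
To make the signature-scan and mapping-count idea concrete, here is a minimal sketch. It assumes the simplest possible case, where each block's contents appear as one contiguous run of bytes in the file; real BIT/SOF streams may interleave or reorder BRAM bits, which is exactly the bit-location question raised above. The signature layout (`SIG!` tag plus index plus pattern) and all function names are hypothetical.

```python
def make_signature(block_index, length=64):
    """Build a hypothetical per-block signature: tag, index, then a pattern."""
    tag = b"SIG!" + block_index.to_bytes(2, "big")
    body = bytes((block_index * 31 + i * 7) & 0xFF for i in range(length - len(tag)))
    return tag + body


def map_signatures(image, n_blocks, length=64):
    """Find each signature in the image.

    Returns (mapping, problems): mapping holds block -> offset for signatures
    found exactly once; problems lists blocks found zero or several times.
    """
    mapping, problems = {}, []
    for idx in range(n_blocks):
        sig = make_signature(idx, length)
        hits = []
        pos = image.find(sig)
        while pos != -1:
            hits.append(pos)
            pos = image.find(sig, pos + 1)
        if len(hits) == 1:
            mapping[idx] = hits[0]
        else:
            problems.append((idx, hits))
    return mapping, problems


def insert_code(image, mapping, idx, code, length=64):
    """Overwrite the signature of block idx with code, zero-padded to length."""
    if len(code) > length:
        raise ValueError("code does not fit in the block")
    offset = mapping[idx]
    patched = bytearray(image)
    patched[offset:offset + length] = code.ljust(length, b"\x00")
    return bytes(patched)


if __name__ == "__main__":
    image = bytes(16) + make_signature(0) + bytes(8) + make_signature(1) + bytes(4)
    mapping, problems = map_signatures(image, n_blocks=3)
    print(f"mapped {len(mapping)} of 3 blocks:", mapping)
    print("not uniquely found:", problems)
```

The count of mapped blocks and the list of missing or duplicated signatures are the report that lets a user spot false positives.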


You would need to know the checksum calculation.
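
As a rough illustration of why the checksum matters: once any byte of the image is patched, a stored CRC no longer matches and has to be recomputed. The thread does not say which CRC variant or file layout the SOF/BIT formats use, so the sketch below substitutes the well-known CRC-16/ARC (reflected polynomial 0xA001, initial value 0) and assumes a 2-byte little-endian CRC at the very end of the image. Both choices are placeholders, not the vendors' actual scheme.

```python
def crc16_arc(data, crc=0x0000):
    """Bitwise CRC-16/ARC, used here only as a stand-in algorithm."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def crc_ok(image):
    """Check the assumed trailing little-endian CRC against the payload."""
    return crc16_arc(image[:-2]) == int.from_bytes(image[-2:], "little")


def refresh_trailing_crc(image):
    """Recompute the assumed trailing CRC after the payload was patched."""
    payload = image[:-2]
    return payload + crc16_arc(payload).to_bytes(2, "little")


if __name__ == "__main__":
    image = refresh_trailing_crc(b"original payload\x00\x00")
    patched = b"O" + image[1:]            # change one payload byte
    print(crc_ok(image), crc_ok(patched))  # stored CRC is stale after the patch
    print(crc_ok(refresh_trailing_crc(patched)))
```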

IIRC, some of the streaming loaders you've described make use of that 
'signature locking' feature in FPGAs, where they can discard run-in
bits, until a preamble is seen.

-jg
