Messages from 148250

Article: 148250
Subject: Re: DMA operation to 64-bits PC platform
From: Charles Gardiner <charles.gardiner@invalid.invalid>
Date: Fri, 02 Jul 2010 09:41:45 +0200

Hi Frank,

> The way it works is as follows:
> - the application allocates the memory (malloc).
> - a pointer to this memory is passed to the driver (custom made
> driver).
> - the driver creates a scatter-gather list by using the
> GetScatterGatherList method from the DMA_ADAPTER object.

You are aware of the following text from the Microsoft WDK documentation? Particularly the
first line.

>>
GetScatterGatherList is not a system routine that can be called directly by name.
This routine is callable only by pointer from the address returned in a
DMA_OPERATIONS structure. Drivers obtain the address of this routine by calling
IoGetDmaAdapter.

As soon as the appropriate DMA channel and any necessary map registers are
available, GetScatterGatherList creates a scatter/gather list, initializes the map
registers, and then calls the driver-supplied AdapterListControl routine to carry
out the I/O operation.

GetScatterGatherList combines the actions of the AllocateAdapterChannel and
MapTransfer routines for drivers that perform scatter/gather DMA.
GetScatterGatherList determines how many map registers are required for the
transfer, allocates the map registers, maps the buffers for DMA, and fills in the
scatter/gather list. It then calls the supplied AdapterListControl routine,
passing a pointer to the scatter/gather list in ScatterGather. The driver should
retain this pointer for use when calling PutScatterGatherList. Note that
GetScatterGatherList does not have the queuing restrictions that apply to
AllocateAdapterChannel.

In its AdapterListControl routine, the driver should perform the I/O. On return
from the driver-supplied routine, GetScatterGatherList keeps the map registers but
frees the DMA adapter structure. The driver must call PutScatterGatherList (which
flushes the buffers) before it can access the data in the buffer.
>>

> - the driver writes each entry of the scatter-gather list (which
> contains a physical address and length) to the FPGA.
> - the FPGA receives data (through another interface) and writes this
> data to the memory of the pc by use of DMA (just generates write
> requests).
> - after writing the data the FPGA generates an interrupt of PCIe (not
> working yet, but we know when the FPGA finished a transaction).
> 
> I now understand I have to verify runtime if the physical address is
> below or above 4 GB and use a 3 DW respectively 4 DW TLP header. I
> will change that in the FPGA and give it a try.
> 
> About the addresses, these are correct. We did the following test:
> write the virtual memory from the application and read the memory by
> using the physical addresses in the driver. In the driver we read what
> the application has written.
> 

> Any other suggestions?

If you are convinced the addresses are correct, I would look at two other things.

1) Is your driver completing the request properly with IoCompleteRequest()?

2) Are the data being cached somewhere? Here, I would try a zero length read (from
the driver: a PCIe TLP with length 1 and all BEs zero) on the last address
transferred to memory. Just discard the resulting completion. The PCIe spec says
the system must interpret this as a flush.
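
As a rough sketch of the two header rules that come up in this thread (3 DW header below 4 GB, 4 DW header at or above it, and the zero-length read as a memory read of length 1 with all byte enables zero), here is a small Python model of the header DWords. It assumes the 2-bit Fmt encoding at bits 30:29 of DW0 and leaves TC, TD, EP and Attr at zero. The requester ID and tag defaults are placeholders, not values from the design discussed here.

```python
FOUR_GB = 1 << 32


def mem_request_header(addr, length_dw, first_be, last_be, write,
                       requester_id=0x0100, tag=0):
    """Return the header DWords of a PCIe memory request TLP (model only).

    requester_id and tag are hypothetical defaults chosen for illustration.
    """
    if addr % 4:
        raise ValueError("address must be DWord aligned")
    if not 1 <= length_dw <= 1024:
        raise ValueError("length must be 1..1024 DWords")

    # Addresses at or above 4 GB need the 4 DW header, all others the 3 DW one.
    four_dw = addr >= FOUR_GB

    # Fmt[1] = request carries data (write), Fmt[0] = 4 DW header.
    fmt = (0b10 if write else 0b00) | (0b01 if four_dw else 0b00)
    mem_type = 0b00000                       # Type field for memory requests

    # A length of 1024 DWords is encoded as 0 in the 10-bit Length field.
    dw0 = (fmt << 29) | (mem_type << 24) | (length_dw & 0x3FF)
    dw1 = (requester_id << 16) | (tag << 8) | (last_be << 4) | first_be

    if four_dw:
        return [dw0, dw1, addr >> 32, addr & 0xFFFFFFFC]
    return [dw0, dw1, addr & 0xFFFFFFFC]


def zero_length_read(addr):
    """Memory read of one DWord with all byte enables zero (the 'flush' read)."""
    return mem_request_header(addr, 1, 0b0000, 0b0000, write=False)
```

Calling `zero_length_read()` on the last address written gives the three or four header DWords to send. The completion that comes back carries no meaningful data and can be dropped.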

By the way, which buffering method is your driver using for the DMA transfer
(Buffered, Direct, Neither)?

> 
> Frank

Article: 148251
Subject: Re: Xilinx xapp175, empty + full flag really synchronous?
From: "maxascent" <maxascent@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.co.uk>
Date: Fri, 02 Jul 2010 03:19:30 -0500

I have also designed async FIFOs with success. There is quite a lot of good
info out there about them; you just need to dig and put it together.
Personally I hate using Coregen unless I really have to, as it is so
limiting when designing memories.

Jon	   
					
---------------------------------------		
Posted through http://www.FPGARelated.com

Article: 148252
Subject: PN crashing (64 bit)
From: "Roger" <rogerwilson@hotmail.com>
Date: Fri, 2 Jul 2010 09:45:15 +0100

I have version 12.1 WebPack running on Vista 64. I've created a new project 
(the Stopwatch one in the Tutorial) and am trying to add source files. When 
the Add Source window appears all is OK but when I try to move to a 
different directory at the same level or above the one that was highlighted 
originally, the program crashes completely.

Has anyone else seen this?

Thanks.

Article: 148253
Subject: SPI Flash configuration and data access rate
From: Gladys <yuhui.b@gmail.com>
Date: Fri, 2 Jul 2010 02:30:35 -0700 (PDT)

Hi, I'm implementing an FPGA prototype to do some image processing
such as dead pixel correction. The Xilinx FPGA will be configured from
an SPI flash memory, which stores the .bit configuration file as well
as the dead pixel locations.

I want to correct dead pixels while receiving the pixel data:
compare each location with the pre-stored dead pixel locations and, if it
turns out to be a dead pixel, correct it directly.

My questions are:
1. How can I access the location information? Should I first read
it out and store it in a dual-port SDRAM?
2. How can I know whether the current pixel location equals one of the
dead pixel locations? What I can do is compare the current location to
all the locations one by one. However, if there are 100 bad pixel
locations and my input pixel data rate is 25 MHz, I need a pixel
location data rate of 25 MHz * 100 to make sure I can compare them all
while receiving one pixel. This is impossible.

Thanks for your help!

Article: 148254
Subject: Re: Xilinx xapp175, empty + full flag really synchronous?
From: firefox3107 <firefox3107@gmail.com>
Date: Fri, 2 Jul 2010 02:32:59 -0700 (PDT)

It seems that everyone has their own point of view where reliability
is concerned. :(

I wanna add my design to OpenCores, so it should be technology
independent.

Article: 148255
Subject: Re: SPI Flash configuration and data access rate
From: Gladys <yuhui.b@gmail.com>
Date: Fri, 2 Jul 2010 02:39:49 -0700 (PDT)

Sorry, for the 2nd question, I just forgot that I can use a Look Up
Table; a CASE statement is enough to implement this.
So the only problem is how to access the location data from the SPI
flash, since this data must always be available for bad pixel
correction while the FPGA is running.

Article: 148256
Subject: Re: PN crashing (64 bit)
From: "maxascent" <maxascent@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.co.uk>
Date: Fri, 02 Jul 2010 07:10:25 -0500

You may want to check that ISE supports 64-bit Vista; otherwise you could
try running in compatibility mode.

Jon	   
					

Article: 148257
Subject: Re: SPI Flash configuration and data access rate
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Fri, 2 Jul 2010 12:35:20 +0000 (UTC)

Gladys <yuhui.b@gmail.com> wrote:

> Hi, I'm implementing an FPGA prototype to do some image processing
> such as dead pixel correction. The Xilinx FPGA will be configured from
> an SPI flash memory, which stores the .bit configuration file as well
> as the dead pixel locations.

> I want to correct dead pixels while receiving the pixel data:
> compare each location with the pre-stored dead pixel locations and, if it
> turns out to be a dead pixel, correct it directly.

> My questions are:
> 1. How can I access the location information? Should I first read
> it out and store it in a dual-port SDRAM?

I would probably store inside the FPGA, either BRAM or LUT RAM.
There aren't so many dead pixels that it won't fit, right?

> 2. How can I know whether the current pixel location equals one of the
> dead pixel locations? What I can do is compare the current location to
> all the locations one by one. However, if there are 100 bad pixel
> locations and my input pixel data rate is 25 MHz, I need a pixel
> location data rate of 25 MHz * 100 to make sure I can compare them all
> while receiving one pixel. This is impossible.

If they are in the right order, you always know which one is 
coming next.  Just compare one!

Though many FPGAs are big enough to do the compares in parallel,
you would need to program in the maximum number such that enough
comparators were generated.
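
The "just compare one" idea can be sketched as a behavioural model in Python: the dead pixel list is sorted into raster order once, and a single index tracks the next expected entry, so only one comparison is made per pixel. The replacement policy (copy the previous pixel in the row, or 0 at the start of a row) is an assumption for illustration only.

```python
def correct_dead_pixels(pixels, width, dead_list):
    """Correct one frame given in raster order.

    pixels    : flat list of pixel values, row by row
    dead_list : iterable of (x, y) dead pixel coordinates, in any order
    """
    # Sorting into raster order would be done once, e.g. when the list is
    # loaded from flash; duplicates are dropped so the index never stalls.
    dead = sorted({y * width + x for (x, y) in dead_list})

    out = []
    next_dead = 0                      # index of the next expected dead pixel
    for addr, value in enumerate(pixels):
        # Single comparator: only the next list entry is ever checked.
        if next_dead < len(dead) and addr == dead[next_dead]:
            # Assumed policy: reuse the left neighbour, or 0 in column 0.
            value = out[-1] if addr % width else 0
            next_dead += 1
        out.append(value)
    return out
```

In hardware the list index would be reset at the start of every frame, which the model does implicitly by being called once per frame.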

-- glen

Article: 148258
Subject: Re: Xilinx xapp175, empty + full flag really synchronous?
From: Gabor <gabor@alacron.com>
Date: Fri, 2 Jul 2010 06:37:07 -0700 (PDT)

On Jul 2, 5:32 am, firefox3107 <firefox3...@gmail.com> wrote:
> It seems that everyone has their own point of view where reliability
> is concerned. :(
>
> I wanna add my design to OpenCores, so it should be technology
> independent.

One clock latency for an empty flag is not easy to achieve.  The
standard method uses binary address counters followed by binary
to Gray code conversion.  This then needs to be registered in
the original clock domain to avoid glitches at the comparator
interface.  To get lower latency you would need to build synchronous
Gray code counters, which takes a bit more logic.  The additional
combinatorial latency for generating large Gray counters can
then reduce the maximum operating frequency of the FIFO.
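
To make the "binary counter followed by binary to Gray conversion, then registered" arrangement concrete, here is a small Python model. It is only a behavioural sketch: the function names are made up for illustration, and the Gray register is assumed to load from the current binary register, so it trails the binary count by one clock.

```python
def bin_to_gray(value):
    """Binary to Gray: each bit is XORed with its more significant neighbour."""
    return value ^ (value >> 1)


def gray_to_bin(gray):
    """Gray to binary: XOR together all right shifts of the Gray word."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def gray_pointer_trace(bits, clocks):
    """Binary counter register followed by a registered Gray conversion.

    Returns the (binary_reg, gray_reg) pair after each clock edge.
    """
    mask = (1 << bits) - 1
    binary_reg, gray_reg = 0, 0
    trace = []
    for _ in range(clocks):
        # Both registers load on the same edge from the *old* binary value,
        # which is why the Gray output is one clock behind.
        binary_reg, gray_reg = (binary_reg + 1) & mask, bin_to_gray(binary_reg)
        trace.append((binary_reg, gray_reg))
    return trace
```

The point of the Gray register is that its output changes in exactly one bit per increment, including at wrap-around, so a pointer sampled in the other clock domain is either the old or the new value and never a mixture.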

Regards,
Gabor

Article: 148259
Subject: Re: DMA operation to 64-bits PC platform
From: Frank van Eijkelenburg <fei.technolution@gmail.com>
Date: Fri, 2 Jul 2010 07:41:37 -0700 (PDT)

On Jul 2, 9:41 am, Charles Gardiner <charles.gardi...@invalid.invalid>
wrote:
> Hi Frank,
>
> > The way it works is as follows:
> > - the application allocates the memory (malloc).
> > - a pointer to this memory is passed to the driver (custom made
> > driver).
> > - the driver creates a scatter-gather list by using the
> > GetScatterGatherList method from the DMA_ADAPTER object.
>
> You are aware of the following text from the Microsoft WDK documentation? Particularly the
> first line.
>
>
>
> GetScatterGatherList is not a system routine that can be called directly by name.
> This routine is callable only by pointer from the address returned in a
> DMA_OPERATIONS structure. Drivers obtain the address of this routine by calling
> IoGetDmaAdapter.
>
> As soon as the appropriate DMA channel and any necessary map registers are
> available, GetScatterGatherList creates a scatter/gather list, initializes the map
> registers, and then calls the driver-supplied AdapterListControl routine to carry
> out the I/O operation.
>
> GetScatterGatherList combines the actions of the AllocateAdapterChannel and
> MapTransfer routines for drivers that perform scatter/gather DMA.
> GetScatterGatherList determines how many map registers are required for the
> transfer, allocates the map registers, maps the buffers for DMA, and fills in the
> scatter/gather list. It then calls the supplied AdapterListControl routine,
> passing a pointer to the scatter/gather list in ScatterGather. The driver should
> retain this pointer for use when calling PutScatterGatherList. Note that
> GetScatterGatherList does not have the queuing restrictions that apply to
> AllocateAdapterChannel.
>
> In its AdapterListControl routine, the driver should perform the I/O. On return
> from the driver-supplied routine, GetScatterGatherList keeps the map registers but
> frees the DMA adapter structure. The driver must call PutScatterGatherList (which
> flushes the buffers) before it can access the data in the buffer.
>
>
>
>
>
> > - the driver writes each entry of the scatter-gather list (which
> > contains a physical address and length) to the FPGA.
> > - the FPGA receives data (through another interface) and writes this
> > data to the memory of the pc by use of DMA (just generates write
> > requests).
> > - after writing the data the FPGA generates an interrupt of PCIe (not
> > working yet, but we know when the FPGA finished a transaction).
>
> > I now understand I have to verify runtime if the physical address is
> > below or above 4 GB and use a 3 DW respectively 4 DW TLP header. I
> > will change that in the FPGA and give it a try.
>
> > About the addresses, these are correct. We did the following test:
> > write the virtual memory from the application and read the memory by
> > using the physical addresses in the driver. In the driver we read what
> > the application has written.
>
> > Any other suggestions?
>
> If you are convinced the addresses are correct, I would look at two other things.
>
> 1) Is your driver completing the request properly with IoCompleteRequest()?
>
> 2) Are the data being cached somewhere? Here, I would try a zero length read (from
> the driver: a PCIe TLP with length 1 and all BEs zero) on the last address
> transferred to memory. Just discard the resulting completion. The PCIe spec says
> the system must interpret this as a flush.
>
> By the way, which buffering method is your driver using for the DMA transfer
> (Buffered, Direct, Neither)?
>
>
>
> > Frank

Hi Charles,

I am not sure if we understand each other. What do you mean by
completing the request with IoCompleteRequest? There is no request
from software point of view. The FPGA will do a DMA write (data from
FPGA to PC memory) at its own initiative. The allocated memory is used
as long as the software is running. I do not allocate new memory for
each new DMA transfer, but at startup a large piece of memory is
allocated and the physical addresses are written to the FPGA by the
driver software.

And yes, we use a DMA adapter in combination with the
GetScatterGatherList method. We already used this in another project
but that was PCI and DMA read (data from PC memory to FPGA).

By the way, where can I set the type of DMA?

best regards,

Frank

Article: 148260
Subject: Re: Xilinx xapp175, empty + full flag really synchronous?
From: Peter Alfke <alfke@sbcglobal.net>
Date: Fri, 2 Jul 2010 11:34:10 -0700 (PDT)

On Jul 2, 6:37 am, Gabor <ga...@alacron.com> wrote:
> On Jul 2, 5:32 am, firefox3107 <firefox3...@gmail.com> wrote:
>
> > It seems that everyone has their own point of view where reliability
> > is concerned. :(
>
> > I wanna add my design to OpenCores, so it should be technology
> > independent.
>
> One clock latency for an empty flag is not easy to achieve.  The
> standard method uses binary address counters followed by binary
> to Gray code conversion.  This then needs to be registered in
> the original clock domain to avoid glitches at the comparator
> interface.  To get lower latency you would need to build synchronous
> Gray code counters, which takes a bit more logic.  The additional
> combinatorial latency for generating large Gray counters can
> then reduce the maximum operating frequency of the FIFO.
>
> Regards,
> Gabor

Please allow me to make some comments.
I have been involved in FIFO design since 1970, starting with the
Fairchild 3341, and ending with the industry-first hard-coded FPGA
FIFO designs at Xilinx.

Given true-dual-ported BlockRAMs with independent addressing,
clocking, and control, most of a FIFO design seems to be trivial and
obvious. The devil is in the generation of the control flags Full and
Empty. These flags must be fast and also reliable, i.e. glitch-free.
(The designs are symmetrical, so I will only cover Empty.)
The flags are driven by an identity comparison of the two addresses,
therefore it is wise to use Gray codes. It may also be desirable to
have binary addresses available for arithmetic calculations (almost
empty, Dipstick etc). The simplest solution is to count binary, but
synchronously convert to Gray, so that the two results appear
simultaneously at their respective flip-flop outputs.
That means there is no latency or speed penalty whatsoever for the
conversion. It just costs one extra flip-flop+gate per bit, negligible
in a custom design.
The onset of Empty is caused by the read clock, and is thus naturally
synchronous to the Read side of the design.
The trailing edge of Empty is, however, caused by the write clock.
Without careful synchronization, Empty would have a non-synchronous
trailing edge, which would inevitably lead to malfunction, sooner or
later.
Most of the sometimes acrimonious debates about safe design center
around the synchronization of the trailing edge, and how to avoid
metastability problems. Nobody should bemoan the fact that this
synchronizer consumes time, that Empty thus has a tendency to linger
on for, say, a clock period or even two. That delay does not really
sacrifice performance. (Any delay of the leading edge of Empty would
of course directly affect performance).
We tested the Xilinx hard-coded design by writing at 200 MHz and
asynchronously reading at approx. 500 MHz, so that the Empty flag was
exercised 200 million times per second, and we monitored the address
counters. There was not a single error in a week of testing (10 exp 14
operations) with a forever changing sub-femtosecond timing granularity
between the two clocks. So much about reliability and metastability.
(Yes, the Virtex 4 version has a separate unrelated subtle problem,
solved with a well-documented work-around)
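
For readers who want to see the Empty timing described above, here is a minimal behavioural sketch in Python. It is not the Xilinx design: the class names, the 4-bit pointer width and the two-stage synchronizer are assumptions, and metastability, Full and overflow are not modelled. The binary count and its Gray twin are registered on the same edge, and Empty compares the read Gray pointer with the synchronized write Gray pointer.

```python
def bin_to_gray(value):
    return value ^ (value >> 1)


class Pointer:
    """Binary counter whose Gray-coded twin is registered on the same edge."""

    def __init__(self, bits):
        self.mask = (1 << bits) - 1
        self.binary = 0
        self.gray = 0

    def advance(self):
        nxt = (self.binary + 1) & self.mask
        # Gray is computed from the *next* binary value, so both registers
        # show the same count after the edge (no extra latency).
        self.binary, self.gray = nxt, bin_to_gray(nxt)


class EmptyFlag:
    """Read-side Empty flag of a two-clock FIFO (flag logic only)."""

    def __init__(self, bits=4):
        self.wr = Pointer(bits)
        self.rd = Pointer(bits)
        self.sync = [0, 0]        # two flip-flops clocked by the read clock

    @property
    def empty(self):
        return self.rd.gray == self.sync[1]

    def write_clock(self):
        self.wr.advance()         # one word written per call

    def read_clock(self, read_enable):
        # All read-domain registers update from their pre-edge values.
        if read_enable and not self.empty:
            self.rd.advance()     # onset of Empty is immediate
        self.sync = [self.wr.gray, self.sync[0]]
```

After one `write_clock()`, Empty stays asserted for two read clocks while the new write pointer passes through the synchronizer, then drops. The read that drains the word raises Empty again on the same read edge. That is the asymmetry described above: a lingering trailing edge and an undelayed leading edge.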

Hope this helps somebody.
Peter Alfke, formerly Xilinx Applications.

Article: 148261
Subject: Re: DMA operation to 64-bits PC platform
From: Charles Gardiner <charles.gardiner@invalid.invalid>
Date: Fri, 02 Jul 2010 21:04:03 +0200

Hi Frank,

> 
> I am not sure if we understand each other. 

Yes, it certainly sounds like that.

> What do you mean by
> completing the request with IoCompleteRequest? There is no request
> from software point of view. 

I think this might clear up the reason why your data is missing. (See also below
about the type of DMA). I don't think the S/G list you are getting is describing
your application buffer. This is best done by specifying DO_DIRECT_IO as the DMA
method for your device. If you specify DO_BUFFERED_IO you will get an S/G List
describing an intermediate buffer in kernel space and this probably never gets
copied over to your application space buffer unless you terminate the request.
I've never done the 'neither' method myself and from what I hear, it's a
complicated beast.

> The FPGA will do a DMA write (data from
> FPGA to PC memory) at its own initiative. The allocated memory is used
> as long as the software is running. I do not allocate new memory for
> each new DMA transfer, but at startup a large piece of memory is
> allocated and the physical addresses are written to the FPGA by the
> driver software.

Sounds like you are doing something like a circular buffer in memory which stays
alive as long as your device does?

> 
> And yes, we use a DMA adapter in combination with the
> GetScatterGatherList method. We already used this in another project
> but that was PCI and DMA read (data from PC memory to FPGA).
> 
> By the way, where can I set the type of DMA?

Typically, you set the DMA buffering method in your AddDevice function after you
create your device object. Quoting from Oney's book,

NTSTATUS AddDevice(..) {
   PDEVICE_OBJECT    fdo;

   IoCreateDevice(....., &fdo);
   fdo->Flags |= DO_BUFFERED_IO;   // I/O manager copies via a kernel buffer
            <or>
   fdo->Flags |= DO_DIRECT_IO;     // I/O manager locks the user buffer (MDL)
            <or>
   fdo->Flags |= 0;                // i.e. neither Direct nor Buffered
   ....
}

And, you can't change your mind afterwards.


By the way, if my assumption about the circular buffer in your design is correct,
there is a slightly more standard solution (standard in the sense that everybody
on the Microsoft drivers newsgroup seems to do it). It however requires two threads
in your application. The first one requests a buffer (using new or malloc) and
sets up an I/O request (ReadFile, WriteFile or DeviceIoControl) referencing this
buffer. This is performed as an asynchronous request.

The driver recognises this request and pends it indefinitely (typically you
terminate it when your driver is shutting down, otherwise Windows will probably hang).
Pending the request has the nice side effect that the buffer now becomes locked
down permanently.

Assuming you have set up your driver to use DO_DIRECT_IO DMA, you can get the S/G
list describing the application space buffer as you are currently doing and feed
this to your FPGA.

Using the second thread in your application you can constantly read data from the
locked down pages (your app. space buffer) that are being written by your FPGA.


Assuming that DO_DIRECT_IO solves your problem (I think there is a good chance), I
would however still consider migrating to a KMDF based driver, particularly if
you are writing a new one. It's much easier to maintain and is probably more
portable for future MS versions.

> 
> best regards,
> 
> Frank

best regards,
Charles

Article: 148262
Subject: Re: carrier tracking over zero frequency point
From: "kadhiem_ayob" <kadhiem_ayob@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.co.uk>
Date: Fri, 02 Jul 2010 16:04:24 -0500

To further explain my problem:

I use an NCO to generate stimulus of cos/sin pair, multiply with the QAM
i/q symbols...then track the carrier as I change the NCO frequency in
steps.

When crossing zero frequency, tracking manages first few crosses then may
fail but relocks. Looking at the waveforms at zero crossing point, I get 
into some doubts about my understanding in principle. I know we can get
negative frequency from positive frequency by inverting the sine wave or by
reading the LUT in reverse time. These two ways are not identical at the
crossing point. My tracking is based on reversing LUT for -f and so is my
NCO.
With reverse LUT reading, I notice that the phase can suffer variation at the
crossing; for example, if I hit the mid-point of a symmetrical section of
the waves and then read back to get -f, the continuity looks best. At the other
extreme I could be just before the peak of the cosine, only to reverse back, leading
to anything between 0 and 360 degrees of phase variation of the negative pair with
respect to the positive pair.
The question is how can I expect tracking not to fail with such phase
variation. Or in other words; how does a real RF upconverter/downconverter
link final frequency cross the zero and how to model it best?

Regards

kadhiem
	   
					

Article: 148263
Subject: Re: DMA operation to 64-bits PC platform
From: nico@puntnl.niks (Nico Coesel)
Date: Fri, 02 Jul 2010 21:35:12 GMT

Frank van Eijkelenburg <fei.technolution@gmail.com> wrote:

>On Jul 2, 2:19 am, Charles Gardiner <charles.gardi...@invalid.invalid>
>wrote:
>> Frank van Eijkelenburg schrieb:
>>
>> > Hi,
>>
>> > I have a custom made PCIe board with a Virtex 5 FPGA on which I
>> > implemented a DMA unit which uses the PCIe endpoint block plus v1.14.
>> > I also implemented simple read/write operations from the PC to the
>> > board (the board responds with completion TLPs). The read/write
>> > operations are working, DMA is not working
>>
>> > The board is inserted in a pc with Windows 7 64 bits platform. An
>> > application allocates virtual memory and passes the memory block to
>> > the driver. The driver locks the memory and converts the virtual
>> > addresses into physical addresses. These physical addresses are
>> > written to the FPGA.
>>
>> How are you doing this? Normally, an application requests a buffer using malloc()
>> or new() and gets a handle to the driver using CreateFile(). You then use
>> WriteFile(hDevice, Buffer,...), ReadFile(hDevice, Buffer,....) or
>> DeviceIoControl() to initiate a transfer to/from the device. That's the
>> application side.
>>
>> On the driver (kernel) side, I would strongly recommend that you write a KMDF based
>> driver. Download the Windows WDK, all it costs is your email. (You have to log in
>> over Microsoft Connect, last time I looked). There are lots of examples there,
>> including for PCI(e) based DMA. To (very quickly) summarise, your driver requests
>> the scatter/gather list describing the buffers (see
>> WdfDmaTransactionInitializeUsingRequest() in the WDK API docs as a starting point)
>> above and passes these to your hardware one-by-one which then does DMA in or out.
>> With a call to WdfRequestComplete the buffers are released by the kernel and your
>> application can reuse them or free them up as required. (This is of course all
>> considerably more than a day's work, by the way.)
>>
>> You do not have to explicitly lock down the buffer yourself. Windows does this for
>> you while the I/O request is active. (Read/WriteFile from your app up to
>> WdfRequestComplete from the driver)
>>
>>
>>
>> > When I start an DMA operation, I can see in chipscope the correct
>> > physical addresses in the TLP header. However, I do not see the
>> > correct values in the allocated memory. What can I do to check where
>> > it is going wrong?
>>
>> In this case, I would first doubt whether the addresses are correct.
>>
>> > Another question is about the memory request TLPs. What should I use,
>> > 32 or 64 bit write requests? Or do I have to check runtime if the
>> > physical memory address is below or above the 4 GB (and use
>> > respectively 32 and 64 bit requests)?
>>
>> The PCIe spec says: a transfer below 4 GB must use a 3 DWord header, a transfer
>> above 4 GB must use a 4 DWord header. i.e. a four dword header with address[63:32]
>> set to zero is invalid.
>>
>>
>>
>> > Thanks in advance,
>>
>> > Frank
>
>The way it works is as follows:
>- the application allocates the memory (malloc).
>- a pointer to this memory is passed to the driver (custom made
>driver).

I strongly doubt you can pass a malloc pointer to a driver and use it as is.
Actually I'm quite sure this doesn't work. While the driver is active, the
application memory may be swapped out to the hard drive. And the pointer
must be translated to a physical address.

I'd go the other way around: have the driver allocate the memory and
pass a pointer to this memory to the application (this will require
some messing around with translation and access rights).

-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------

Article: 148264
Subject: Re: SPI Flash configuration and data access rate
From: John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com>
Date: Fri, 02 Jul 2010 14:58:07 -0700

On Fri, 2 Jul 2010 12:35:20 +0000 (UTC), glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>Gladys <yuhui.b@gmail.com> wrote:
>
>> Hi, I'm implementing an FPGA prototype to do some image processing
>> such as dead pixel correction. The Xilinx FPGA will be configured from
>> an SPI flash memory, which stores the .bit configuration file as well
>> as the dead pixel locations.
>
>> I want to correct dead pixels while receiving the pixel data:
>> compare each location with the pre-stored dead pixel locations and, if it
>> turns out to be a dead pixel, correct it directly.
>
>> My questions are:
>> 1. How can I access the location information? Should I first read
>> it out and store it in a dual-port SDRAM?
>
>I would probably store inside the FPGA, either BRAM or LUT RAM.
>There aren't so many dead pixels that it won't fit, right?
>
>> 2. How can I know whether the current pixel location equals one of the
>> dead pixel locations? What I can do is compare the current location to
>> all the locations one by one. However, if there are 100 bad pixel
>> locations and my input pixel data rate is 25 MHz, I need a pixel
>> location data rate of 25 MHz * 100 to make sure I can compare them all
>> while receiving one pixel. This is impossible.
>
>If they are in the right order, you always know which one is 
>coming next.  Just compare one!
>
>Though many FPGA are big enough to do the compares in parallel,
>you would need to program in the maximum number such that enough
>comparators were generated.
>
>-- glen

Or use a block ram as a direct bad-pixel map, assuming you have as
many bits of ram available as pixels in the image.

At load time, a short (x,y) list of bad pixels could be read from
serial flash and used to set the tag bits in the ram.
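
A rough Python model of that idea, with one tag bit per pixel packed into RAM words. The 32-bit word width and the function names are assumptions for illustration, not taken from any particular block RAM configuration.

```python
def build_bad_pixel_map(width, height, bad_list, word_bits=32):
    """Fill a word-organised bitmap from a short (x, y) list at load time."""
    n_words = (width * height + word_bits - 1) // word_bits
    ram = [0] * n_words
    for x, y in bad_list:
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError("bad pixel coordinate outside the image")
        addr = y * width + x                     # raster-order pixel address
        ram[addr // word_bits] |= 1 << (addr % word_bits)
    return ram


def is_bad(ram, width, x, y, word_bits=32):
    """Look up the tag bit for one pixel: one RAM read plus a bit select."""
    addr = y * width + x
    return (ram[addr // word_bits] >> (addr % word_bits)) & 1 == 1
```

For a 640 x 480 image this needs 307200 tag bits (9600 words of 32 bits), so whether it fits depends on how much block RAM the device has to spare.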

John

Article: 148265
Subject: Re: SPI Flash configuration and data access rate
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Fri, 2 Jul 2010 22:07:30 +0000 (UTC)

John Larkin <jjlarkin@highnotlandthistechnologypart.com> wrote:
(snip, I wrote)

>>I would probably store inside the FPGA, either BRAM or LUT RAM.
>>There aren't so many dead pixels that it won't fit, right?
(snip)

> Or use a block ram as a direct bad-pixel map, assuming you have as
> many bits of ram available as pixels in the image.

Yes you could do that.  As the number of pixels per display is
ever increasing, though, it might not be so convenient.

> At load time, a short (x,y) list of bad pixels could be read from
> serial flash and used to set the tag bits in the ram.

OK, so parallel load a word from BRAM, and shift it at the
pixel clock rate.  It might be that is faster than a comparator, 
though the comparator could be pipelined if needed, for speed.
The BRAM load would have to be carefully pipelined to arrive
at the appropriate time.

As far as I know, most set a low maximum for the number of
bad pixels.  I believe my monitor has none, though I might not
notice a black pixel.

-- glen

Article: 148266
Subject: Re: carrier tracking over zero frequency point
From: "kadhiem_ayob" <kadhiem_ayob@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.co.uk>
Date: Sat, 03 Jul 2010 08:51:59 -0500

I believe I finally sorted out the issue thanks to Matlab.

A LUT based approach for frequency synthesis is ok for positive or negative
frequency synthesis. However it is very glitchy at zero cross over. The
right model is to generate 2 frequencies(one drifting down, the other up so
that they criss-cross). The two frequency sources are then multiplied
together to produce proper vector for zero crossing.
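
A small Python sketch of that model, under the assumption that "multiplied together" means a complex mix (one phasor times the conjugate of the other), so the result rotates at the difference frequency. The centre frequency, span and function names are made up for illustration.

```python
import cmath
import math


def nco(freqs):
    """Phase-accumulator NCO.

    freqs holds the frequency for each sample in cycles per sample;
    the output is exp(j*2*pi*phase) before each phase update.
    """
    phase = 0.0
    out = []
    for f in freqs:
        out.append(cmath.exp(2j * math.pi * phase))
        phase = (phase + f) % 1.0
    return out


def crossing_carrier(n, f_centre=0.1, f_span=0.02):
    """Mix a downward and an upward sweep so the difference crosses zero.

    Returns (samples, net_freq), where net_freq[k] is the difference
    frequency in cycles per sample applied between sample k and k + 1.
    """
    down = [f_centre + f_span * (0.5 - k / (n - 1)) for k in range(n)]
    up = [f_centre - f_span * (0.5 - k / (n - 1)) for k in range(n)]
    a = nco(down)
    b = nco(up)
    samples = [x * y.conjugate() for x, y in zip(a, b)]
    net_freq = [d - u for d, u in zip(down, up)]
    return samples, net_freq
```

Because each source keeps its own continuously running phase accumulator, the phase of the product is simply the difference of the two phases. It passes through zero frequency without a jump, which is the behaviour a reversed LUT read does not guarantee.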

Regards

kadhiem	   
					

Article: 148267
Subject: Re: Xilinx BULLSHITIX-8, when?
From: John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com>
Date: Sat, 03 Jul 2010 08:10:00 -0700

On Wed, 30 Jun 2010 18:50:42 -0700 (PDT), Bryan
<bryan.fletcher@avnet.com> wrote:

>I work for Avnet, which seems not to be too popular with this crowd,
>but I will share my experience anyway.  I have a project with
>XC6SLX16-2CSG324 and LPDDR that seems to work well with MIG 3.3 in ISE
>11.4.  Granted, we are only running at 200 MHz.  We do not provide a
>200 MHz input to the chip.  We have a 66 MHz oscillator input to the
>FPGA.  It is true that by default, MIG generates a design that assumes
>the native system clock is the same as the memory clock.  The only
>clocking customization that MIG allows is the choice between single-
>ended or differential clock.  However, since MIG provides all the HDL
>sources for the clock infrastructure, it is possible to modify the
>clocking structure to generate the correct memory clock given any
>system clock that meets the specifications of the PLL.
>
>I have some instructions that explains step-by-step how to do this for
>the Avnet board (www.em.avnet.com/spartan6lx-evl).  If you are
>interested, please contact Avnet Technical Support (www.em.avnet.com/
>techsupport).  In addition to this LPDDR example, Xilinx provides
>working hardware examples for DDR2 on the SP601 and DDR3 on the
>SP605.  Avnet has another board with DDR3 that has been proven out at
>800 Mbps in hardware (www.em.avnet.com/spartan6lx150t-dev).
>
>The other critical thing to do with these DDR designs is proper PCB
>layout and termination, without which the design will fail.  Xilinx
>provides some very specific layout guidelines in UG388 that need to be
>followed if you want the full memory interface performance.
>
>Xilinx recently published revised specifications for the MCB.  See
>http://www.xilinx.com/support/answers/35818.htm
>The Spartan-6 Memory Controller Block (MCB) has new data rate
>specifications and performance modes for DDR2 and DDR3 interfaces as
>specified in version 1.5 of the Spartan-6 FPGA Data Sheet (DS162):
>http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf
>
>You should also be aware of the MIG Design Advisory Answer Record.
>http://www.xilinx.com/support/answers/33566.htm
>
>Bryan
>Avnet

Bryan,

We (I work with Rob) have contacted our tech support guy at Avnet, and
asked him for your doc. After about six interchanges, we still can't
get our hands on it.

John

Article: 148268
Subject: Re: fooling the compiler
From: John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com>
Date: Sat, 03 Jul 2010 08:13:53 -0700

On Fri, 25 Jun 2010 17:20:21 GMT, nico@puntnl.niks (Nico Coesel)
wrote:

>John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:
>
>>
>>
>>We have a Spartan6/45 that's talking to 16 separate SPI A/D
>>converters. The data we get back is different, but the clock and chip
>>select timings are the same. To get the timing right, avoiding routing
>>delays, we need our outgoing stuff to be reclocked by i/o cell
>>flipflops.
>>
>>So what happens is that we have one state machine running all 16 SPI
>>interfaces. We tell the software that we want the adc chip select
>>flops in i/o cells. The compiler decides that all are seeing the same
>>input signal, so reduces them to one flipflop. Then it concludes that
>>that flipflop can't be in an i/o block, and builds it that way. The
>>resulting routing delays are deadly.
>>
>>We couldn't find a way to force these 16 flops into IOBs. Really.
>
>Constraints usually help. In that case it should duplicate logic (if
>this option is on) to meet timing specifications.

Turns out, according to Xilinx, that IOB=TRUE (which is a suggestion
to the compiler) works, but IOB=FORCE (which is supposed to be
mandatory) doesn't. We just left the shift register in there.

John

Article: 148269
Subject: Re: fooling the compiler
From: "krw@att.bizzzzzzzzzzzz" <krw@att.bizzzzzzzzzzzz>
Date: Sat, 03 Jul 2010 10:28:32 -0500

On Sat, 03 Jul 2010 08:13:53 -0700, John Larkin
<jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:

>On Fri, 25 Jun 2010 17:20:21 GMT, nico@puntnl.niks (Nico Coesel)
>wrote:
>
>>John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:
>>
>>>
>>>
>>>We have a Spartan6/45 that's talking to 16 separate SPI A/D
>>>converters. The data we get back is different, but the clock and chip
>>>select timings are the same. To get the timing right, avoiding routing
>>>delays, we need our outgoing stuff to be reclocked by i/o cell
>>>flipflops.
>>>
>>>So what happens is that we have one state machine running all 16 SPI
>>>interfaces. We tell the software that we want the adc chip select
>>>flops in i/o cells. The compiler decides that all are seeing the same
>>>input signal, so reduces them to one flipflop. Then it concludes that
>>>that flipflop can't be in an i/o block, and builds it that way. The
>>>resulting routing delays are deadly.
>>>
>>>We couldn't find a way to force these 16 flops into IOBs. Really.
>>
>>Constraints usually help. In that case it should duplicate logic (if
>>this option is on) to meet timing specifications.
>
>Turns out, according to Xilinx, that IOB=TRUE (which is a suggestion
>to the compiler) works, but IOB=FORCE (which is supposed to be
>mandatory) doesn't. We just left the shift register in there.

I'm surprised that works.  It didn't a couple of years ago, when I last used
Xilinx stuff.  The other thing to watch is tristate forcing.  I found they had
to be in the top level of the hierarchy to work right.  Maybe that was just a
problem with Virtex4 and the PPC stuff, though.

Article: 148270
Subject: Re: carrier tracking over zero frequency point
From: glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date: Sat, 3 Jul 2010 17:03:55 +0000 (UTC)

kadhiem_ayob <kadhiem_ayob@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.co.uk> wrote:
> I believe I finally sorted out the issue thanks to Matlab.

> A LUT-based approach to frequency synthesis is OK for positive or negative
> frequencies. However, it is very glitchy at the zero crossover. The
> right model is to generate two frequencies (one drifting down, the other up,
> so that they criss-cross). The two frequency sources are then multiplied
> together to produce the proper vector at the zero crossing.

Reminds me of an HP signal generator, I believe 3325B, which uses
a combination of frequency generators and mixers to generate the
range of frequencies from 1uHz to 20.999999999 MHz.

I used to know more of the details about how it works, but from part
remembering and part looking on the web: it generates a frequency 30 MHz
above the desired one, and then mixes it down to produce the output.

-- glen

Article: 148271
Subject: Re: fooling the compiler
From: nico@puntnl.niks (Nico Coesel)
Date: Sat, 03 Jul 2010 17:32:28 GMT

John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:

>On Fri, 25 Jun 2010 17:20:21 GMT, nico@puntnl.niks (Nico Coesel)
>wrote:
>
>>John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:
>>
>>>
>>>
>>>We have a Spartan6/45 that's talking to 16 separate SPI A/D
>>>converters. The data we get back is different, but the clock and chip
>>>select timings are the same. To get the timing right, avoiding routing
>>>delays, we need our outgoing stuff to be reclocked by i/o cell
>>>flipflops.
>>>
>>>So what happens is that we have one state machine running all 16 SPI
>>>interfaces. We tell the software that we want the adc chip select
>>>flops in i/o cells. The compiler decides that all are seeing the same
>>>input signal, so reduces them to one flipflop. Then it concludes that
>>>that flipflop can't be in an i/o block, and builds it that way. The
>>>resulting routing delays are deadly.
>>>
>>>We couldn't find a way to force these 16 flops into IOBs. Really.
>>
>>Constraints usually help. In that case it should duplicate logic (if
>>this option is on) to meet timing specifications.
>
>Turns out, according to Xilinx, that IOB=TRUE (which is a suggestion
>to the compiler) works, but IOB=FORCE (which is supposed to be
>mandatory) doesn't. We just left the shift register in there.

Another way to force flipflops in an IOB is to specify a short delay
for the output flip-flop to pad path.

-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------

Article: 148272
Subject: Re: fooling the compiler
From: "krw@att.bizzzzzzzzzzzz" <krw@att.bizzzzzzzzzzzz>
Date: Sat, 03 Jul 2010 12:46:42 -0500

On Sat, 03 Jul 2010 17:32:28 GMT, nico@puntnl.niks (Nico Coesel) wrote:

>John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:
>
>>On Fri, 25 Jun 2010 17:20:21 GMT, nico@puntnl.niks (Nico Coesel)
>>wrote:
>>
>>>John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:
>>>
>>>>
>>>>
>>>>We have a Spartan6/45 that's talking to 16 separate SPI A/D
>>>>converters. The data we get back is different, but the clock and chip
>>>>select timings are the same. To get the timing right, avoiding routing
>>>>delays, we need our outgoing stuff to be reclocked by i/o cell
>>>>flipflops.
>>>>
>>>>So what happens is that we have one state machine running all 16 SPI
>>>>interfaces. We tell the software that we want the adc chip select
>>>>flops in i/o cells. The compiler decides that all are seeing the same
>>>>input signal, so reduces them to one flipflop. Then it concludes that
>>>>that flipflop can't be in an i/o block, and builds it that way. The
>>>>resulting routing delays are deadly.
>>>>
>>>>We couldn't find a way to force these 16 flops into IOBs. Really.
>>>
>>>Constraints usually help. In that case it should duplicate logic (if
>>>this option is on) to meet timing specifications.
>>
>>Turns out, according to Xilinx, that IOB=TRUE (which is a suggestion
>>to the compiler) works, but IOB=FORCE (which is supposed to be
>>mandatory) doesn't. We just left the shift register in there.
>
>Another way to force flipflops in an IOB is to specify a short delay
>for the output flip-flop to pad path.

The problem I see with that is that it can't be verified (what can, I suppose)
until after PAR is run. By using synthesis attributes the results can be seen
in the technology view (or whatever they call it today).

Article: 148273
Subject: Re: DMA operation to 64-bits PC platform
From: Michael S <already5chosen@yahoo.com>
Date: Sat, 3 Jul 2010 11:20:16 -0700 (PDT)

On Jul 2, 11:35 pm, n...@puntnl.niks (Nico Coesel) wrote:
> Frank van Eijkelenburg <fei.technolut...@gmail.com> wrote:
>
>
>
> >On Jul 2, 2:19 am, Charles Gardiner <charles.gardi...@invalid.invalid>
> >wrote:
> >> Frank van Eijkelenburg schrieb:
>
> >> > Hi,
>
> >> > I have a custom made PCIe board with a Virtex 5 FPGA on which I
> >> > implemented a DMA unit which uses the PCIe endpoint block plus v1.14.
> >> > I also implemented simple read/write operations from the PC to the
> >> > board (the board responds with completion TLPs). The read/write
> >> > operations are working, DMA is not working
>
> >> > The board is inserted in a pc with Windows 7 64 bits platform. An
> >> > application allocates virtual memory and passes the memory block to
> >> > the driver. The driver locks the memory and converts the virtual
> >> > addresses into physical addresses. These physical addresses are
> >> > written to the FPGA.
>
> >> How are you doing this? Normally, an application requests a buffer using malloc()
> >> or new() and gets a handle to the driver using CreateFile(). You then use
> >> WriteFile(hDevice, Buffer,...), ReadFile(hDevice, Buffer,....) or
> >> DeviceIoControl() to initiate a transfer to/from the device. That's the
> >> application side.
>
> >> On the driver (kernel) side, I would strongly recommend that you write a KMDF based
> >> driver. Download the Windows WDK, all it costs is your email. (You have to log in
> >> over Microsoft Connect, last time I looked). There are lots of examples there,
> >> including for PCI(e) based DMA. To (very quickly) summarise, your driver requests
> >> the scatter/gather list describing the buffers (see
> >> WdfDmaTransactionInitializeUsingRequest() in the WDK API docs as a starting point)
> >> above and passes these to your hardware one-by-one which then does DMA in or out.
> >> With a call to WdfRequestComplete the buffers are released by the kernel and your
> >> application can reuse them or free them up as required. (This is of course all
> >> considerably more than a day's work, by the way.)
>
> >> You do not have to explicitly lock down the buffer yourself. Windows does this for
> >> you while the I/O request is active. (Read/WriteFile from your app up to
> >> WdfRequestComplete from the driver)
>
> >> > When I start an DMA operation, I can see in chipscope the correct
> >> > physical addresses in the TLP header. However, I do not see the
> >> > correct values in the allocated memory. What can I do to check where
> >> > it is going wrong?
>
> >> In this case, I would first doubt whether the addresses are correct.
>
> >> > Another question is about the memory request TLPs. What should I use,
> >> > 32 or 64 bit write requests? Or do I have to check runtime if the
> >> > physical memory address is below or above the 4 GB (and use
> >> > respectively 32 and 64 bit requests)?
>
> >> The PCIe spec says: a transfer below 4 GB must use a 3 DWord header, a transfer
> >> above 4 GB must use a 4 DWord header. i.e. a four dword header with address[63:32]
> >> set to zero is invalid.
>
> >> > Thanks in advance,
>
> >> > Frank
>
> >The way it works is as follows:
> >- the application allocates the memory (malloc).
> >- a pointer to this memory is passed to the driver (custom made
> >driver).
>
> I strongly doubt you can use a malloc pointer to a driver. Actually
> I'm quite sure this doesn't work. When the driver is active, the
> application memory may be swapped to the hard-drive. And the pointer
> must be translated to a physical address.
>


Nah, malloc() at application level is o.k.
If the I/O operation is specified as DIRECT_IO then the I/O manager takes
care of locking the pages.
If the operation is specified as NEITHER then the driver itself should call
MmProbeAndLockPages in user context (in this case you should never
install filter drivers in between your driver and app).
In both cases it is very important not to complete the IRP associated
with the user buffer until all DMA activity has finished.

If the I/O operation is specified as BUFFERED_IO then the I/O manager
allocates a kernel buffer, passes it to the driver, and copies the
results from the kernel buffer to the user buffer after the driver has
completed the IRP. Obviously, BUFFERED_IO is not suitable for the OP's case,
since he wants the results back without completing the original I/O request.
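
On the 32- versus 64-bit write request question quoted above: the rule
Charles cites (3 DWord header below 4 GB, 4 DWord header otherwise) is a
per-address decision the DMA engine has to make at run time. A minimal
sketch in C of that decision, assuming the usual Fmt encodings for a Memory
Write TLP; the helper names are hypothetical and the FPGA logic would do
the same comparison in HDL:

```c
#include <stdint.h>

/* Fmt field values for a TLP that carries data (Memory Write uses
 * Type = 0b00000 with one of these). */
enum {
    TLP_FMT_3DW_DATA = 0x2,  /* 3 DWord header, 32-bit address */
    TLP_FMT_4DW_DATA = 0x3   /* 4 DWord header, 64-bit address */
};

/* Hypothetical helper: pick the header format from the physical
 * address. Address bits [63:32] all zero means the target is below
 * 4 GB, so the 3 DWord format must be used. */
static unsigned mwr_fmt_for_address(uint64_t phys_addr)
{
    return (phys_addr >> 32) == 0 ? TLP_FMT_3DW_DATA : TLP_FMT_4DW_DATA;
}

/* Hypothetical helper: header length in DWords for the same address. */
static unsigned mwr_header_dwords(uint64_t phys_addr)
{
    return mwr_fmt_for_address(phys_addr) == TLP_FMT_3DW_DATA ? 3u : 4u;
}
```

Since the pages of a user buffer on a 64-bit system can land on either side
of the 4 GB boundary, the check has to be made for every scatter/gather
element rather than once per buffer.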

> I'd go the other way around: have the driver allocate the memory and
> pass a pointer to this memory to the application (this will require
> some messing around with translation and access rights).
>
> --
> Failure does not prove something is impossible, failure simply
> indicates you are not using the right tools...
> nico@nctdevpuntnl (punt=.)
> --------------------------------------------------------------

On a general-purpose system, having the driver allocate big buffers is
rarely a good idea. On a system dedicated to just one task it could
be pragmatically o.k., but I still don't like it from a purely theoretical
point of view.

Anyway, the discussion doesn't belong here. I recommend
http://groups.google.com/group/microsoft.public.development.device.drivers

Article: 148274
Subject: Re: DMA operation to 64-bits PC platform
From: Michael S <already5chosen@yahoo.com>
Date: Sat, 3 Jul 2010 11:34:54 -0700 (PDT)

On Jul 2, 9:04 pm, Charles Gardiner <charles.gardi...@invalid.invalid>
wrote:
> Hi Frank,
>
>
>
> > I am not sure if we understand each other.
>
> Yes, it certainly sounds like that.
>
> > What do you mean by
> > completing the request with IoCompleteRequest? There is no request
> > from software point of view.
>
> I think this might clear up the reason why your data is missing. (See also below
> about the type of DMA). I don't think the S/G list you are getting is describing
> your application buffer. This is best done by specifying DO_DIRECT_IO as the DMA
> method for your device. If you specify DO_BUFFERED_IO you will get an S/G List
> describing an intermediate buffer in kernel space and this probably never gets
> copied over to your application space buffer unless you terminate the request.
> I've never done the 'neither' method myself and from what I hear, it's a
> complicated beast.
>
> > The FPGA will do a DMA write (data from
> > FPGA to PC memory) at its own initiative. The allocated memory is used
> > as long as the software is running. I do not allocate new memory for
> > each new DMA transfer, but at startup a large piece of memory is
> > allocated and the physical addresses are written to the FPGA by the
> > driver software.
>
> Sounds like you are doing something like a circular buffer in memory which stays
> alive as long as your device does?
>
>
>
> > And yes, we use a DMA adapter in combination with the
> > GetScatterGatherList method. We already used this in another project
> > but that was PCI and DMA read (data from PC memory to FPGA).
>
> > By the way, where can I set the type of DMA?
>
> Typically, you set the DMA buffering method in your AddDevice function after you
> create your device object. Quoting from Oney's book,
>
> NTSTATUS AddDevice(..) {
>    PDEVICE_OBJECT    fdo;
>
>    IoCreateDevice(....., &fdo);
>    fdo->Flags |= DO_BUFFERED_IO;
>             <or>
>    fdo->Flags |= DO_DIRECT_IO;
>             <or>
>    fdo->Flags |= 0;  // i.e. neither Direct nor Buffered
>
> And, you can't change your mind afterwards.
>
> By the way if my assumption about the circular buffer in your design is correct,
> there is a slightly more standard solution (standard in the sense that everybody
> on the microsoft drivers newsgroup seems to do it). It however requires two threads
> in your application. The first one requests a buffer (using new or malloc) and
> sets up an I/O Request ReadFile, WriteFile or DeviceIoControl referencing this
> buffer. This is performed as an asynchronous request.
>

There is absolutely no need for a second thread. Just issue the "submit
buffer" overlapped I/O request in the first thread and don't wait for
completion.

> The driver recognizes this request and pends it indefinitely, (typically terminate
> it when your driver is shutting down, otherwise windows will probably hang).

That's pretty bad advice. As a minimum, you should install a cancel
handler and complete the request on cancellation. Besides, normally
you should stop all DMA activity and complete the request in the CLEANUP
routine.
Also, it is always a good idea to have a custom "stop DMA and withdraw
circular buffer" IOCTL for normal user-initiated termination.

> Pending the request has the nice side effect that the buffer now becomes locked
> down permanently.
>
> Assuming you have set up your driver to use DO_DIRECT_IO DMA, you can get the S/G
> list describing the application space buffer as you are currently doing and feed
> this to your FPGA.
>
> Using the second thread in your application you can constantly read data from the
> locked down pages (your app. space buffer) that are being written by your FPGA.
>
> Assuming the DO_DIRECT_IO solves your problem (I think there is a good chance), I
> would however still consider migrating to a KMDF based driver, particularly if
> you are writing a new one. It's much easier to maintain and is probably more
> portable for future MS versions.
>
>
>
> > best regards,
>
> > Frank
>
> best regards,
> Charles

Yes, KMDF is MUCH easier than plain WDM.
