Hi Frank,

> The way it works is as follows:
> - the application allocates the memory (malloc).
> - a pointer to this memory is passed to the driver (custom made driver).
> - the driver creates a scatter-gather list by using the GetScatterGatherList method from the DMA_ADAPTER object.

Are you aware of the following text from the Microsoft WDK documentation? Particularly the first line.

>>
GetScatterGatherList is not a system routine that can be called directly by name. This routine is callable only by pointer from the address returned in a DMA_OPERATIONS structure. Drivers obtain the address of this routine by calling IoGetDmaAdapter.

As soon as the appropriate DMA channel and any necessary map registers are available, GetScatterGatherList creates a scatter/gather list, initializes the map registers, and then calls the driver-supplied AdapterListControl routine to carry out the I/O operation.

GetScatterGatherList combines the actions of the AllocateAdapterChannel and MapTransfer routines for drivers that perform scatter/gather DMA. GetScatterGatherList determines how many map registers are required for the transfer, allocates the map registers, maps the buffers for DMA, and fills in the scatter/gather list. It then calls the supplied AdapterListControl routine, passing a pointer to the scatter/gather list in ScatterGather. The driver should retain this pointer for use when calling PutScatterGatherList. Note that GetScatterGatherList does not have the queuing restrictions that apply to AllocateAdapterChannel.

In its AdapterListControl routine, the driver should perform the I/O. On return from the driver-supplied routine, GetScatterGatherList keeps the map registers but frees the DMA adapter structure. The driver must call PutScatterGatherList (which flushes the buffers) before it can access the data in the buffer.
>>

> - the driver writes each entry of the scatter-gather list (which contains a physical address and length) to the FPGA.
> - the FPGA receives data (through another interface) and writes this data to the memory of the pc by use of DMA (just generates write requests).
> - after writing the data the FPGA generates an interrupt of PCIe (not working yet, but we know when the FPGA finished a transaction).
>
> I now understand I have to verify runtime if the physical address is below or above 4 GB and use a 3 DW respectively 4 DW TLP header. I will change that in the FPGA and give it a try.
>
> About the addresses, these are correct. We did the following test: write the virtual memory from the application and read the memory by using the physical addresses in the driver. In the driver we read what the application has written.
>
> Any other suggestions?

If you are convinced the addresses are correct, I would look at two other things.

1) Is your driver completing the request properly with IoCompleteRequest()?

2) Is the data being cached somewhere? Here, I would try a zero-length read (from the driver: a PCIe TLP with length 1 and all byte enables zero) on the last address transferred to memory. Just discard the resulting completion. The PCIe spec says the system must interpret this as a flush.

By the way, which buffering method is your driver using for the DMA transfer (Buffered, Direct, Neither)?

> Frank
Article: 148251
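The 3 DW / 4 DW header rule discussed in the post above can be sketched in C as a per-element check on each physical address from the scatter/gather list. This is only an illustration, not code from the thread: the names `tlp_fmt`, `mwr_fmt_for` and `header_dwords` are hypothetical, and the two Fmt values are the "with data" encodings (10b and 11b) as I understand them from the PCIe Base Specification.

```c
#include <assert.h>
#include <stdint.h>

/* Fmt encodings for TLPs that carry data, such as Memory Write. */
enum tlp_fmt {
    TLP_FMT_3DW_DATA = 0x2, /* 32-bit address, 3 DW header */
    TLP_FMT_4DW_DATA = 0x3  /* 64-bit address, 4 DW header */
};

/* Choose the memory-write header format for one scatter/gather element.
 * Addresses below 4 GB must use the 3 DW form; a 4 DW header with
 * address[63:32] == 0 is not allowed. */
enum tlp_fmt mwr_fmt_for(uint64_t phys_addr)
{
    return (phys_addr >> 32) == 0 ? TLP_FMT_3DW_DATA : TLP_FMT_4DW_DATA;
}

/* Header length in DWords for a given format. */
unsigned header_dwords(enum tlp_fmt fmt)
{
    assert(fmt == TLP_FMT_3DW_DATA || fmt == TLP_FMT_4DW_DATA);
    return fmt == TLP_FMT_4DW_DATA ? 4u : 3u;
}
```

In the FPGA the same decision would be made in logic on address bits [63:32] of every list entry, since one buffer can have pages on both sides of the 4 GB line.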
I have also designed async FIFOs with success. There is quite a lot of good info out there about them; you just need to dig and put it together. Personally I hate using Coregen unless I really have to, as it is so limiting when designing memories.

Jon
---------------------------------------
Posted through http://www.FPGARelated.com
Article: 148252
I have version 12.1 WebPack running on Vista 64. I've created a new project (the Stopwatch one in the Tutorial) and am trying to add source files. When the Add Source window appears all is OK, but when I try to move to a different directory at the same level or above the one that was highlighted originally, the program crashes completely. Has anyone else seen this?

Thanks.
Article: 148253
Hi, I'm implementing an FPGA prototype to do some image processing such as dead pixel correction. The Xilinx FPGA will be configured from an SPI flash memory, which stores the .bit configuration file as well as the dead pixel locations.

I want to correct dead pixels as each pixel arrives: compare each location with the pre-stored dead pixel locations and, if it turns out to be a dead pixel, correct it directly.

The questions are:
1. How can I access the location information? Should I first read it out and store it in the dual-port SDRAM?
2. How can I know whether the current pixel location equals one of the dead pixel locations? I could compare the current location to all the locations one by one; however, if there are 100 bad pixel locations and my input pixel rate is 25 MHz, I would need a location data rate of 25 MHz * 100 to compare them all while receiving one pixel. This is impossible.

Thanks for your help!
Article: 148254
It seems that everyone has their own point of view where reliability is concerned. :(

I want to add my design to OpenCores, so it should be technology independent.
Article: 148255
Sorry, for the 2nd question, I just forgot that I can use a look-up table, implemented with a CASE statement. So the only problem is how to access the location data from the SPI flash, since this data must always be available for bad pixel correction while the FPGA is running.
Article: 148256
You may want to check that ISE supports 64-bit Vista; otherwise you could try running in compatibility mode.

Jon
Article: 148257
Gladys <yuhui.b@gmail.com> wrote:

> Hi, I'm implementing an FPGA prototype to do some image processing such as dead pixel correction. The xilinx FPGA will be configured with an SPI flash memory, where the .bit configuration file inside as well as dead pixel locations are stored.

> I want to correct the dead pixel while receiving each pixel data, compare each location with the pre-stored dead pixel locations, if it turns out to be a dead pixel, I can directly correct it.

> The question are:
> 1. How could I access the locations information? should I first read them out and store in the dualport SDRAM?

I would probably store them inside the FPGA, in either BRAM or LUT RAM. There aren't so many dead pixels that they won't fit, right?

> 2. How can I know if the current pixel location equals to one of the dead pixel locations? What I can do is to compare current location to all the locations one by one, however, if there are 100 bad pixel location, and my input pixel data rate is 25MHZ, I need to have pixel location data rate of 25MHZ * 100 to make sure I can compare them all while receiving one pixel data. This is impossible.

If they are in the right order, you always know which one is coming next. Just compare one!

Though many FPGAs are big enough to do the compares in parallel, you would need to program in the maximum number so that enough comparators are generated.

-- glen
Article: 148258
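The "just compare one" idea above can be modelled in C. This is a software sketch of the behaviour, not HDL, and it assumes the dead-pixel table is sorted in raster-scan order and stored as linear pixel indices (y * width + x); the struct and function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dead-pixel table sorted in raster order, plus a pointer to the next entry. */
typedef struct {
    const uint32_t *dead; /* sorted linear pixel indices */
    size_t count;
    size_t next;          /* index of the next expected dead pixel */
} dead_pixel_scan;

void scan_init(dead_pixel_scan *s, const uint32_t *sorted, size_t count)
{
    /* The single-comparator scheme only works on a strictly sorted table. */
    for (size_t i = 1; i < count; i++)
        assert(sorted[i - 1] < sorted[i]);
    s->dead = sorted;
    s->count = count;
    s->next = 0;
}

/* Call at the start of every frame. */
void scan_start_frame(dead_pixel_scan *s)
{
    s->next = 0;
}

/* Call once per incoming pixel, in raster order. Returns 1 for a dead pixel.
 * Only one comparison is made per pixel, whatever the table length. */
int scan_is_dead(dead_pixel_scan *s, uint32_t pixel_index)
{
    if (s->next < s->count && s->dead[s->next] == pixel_index) {
        s->next++; /* move on to the following table entry */
        return 1;
    }
    return 0;
}
```

In hardware this would correspond to one comparator, one address counter into the BRAM or LUT RAM holding the table, and a reset of that counter at frame start, so the table only needs to be read at the rate dead pixels occur.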
On Jul 2, 5:32 am, firefox3107 <firefox3...@gmail.com> wrote:
> It seems that everyone has its own point of view that reliability concerns. :(
>
> I wanna add my design to opencores, so it should be technology independent.

One clock of latency for an empty flag is not easy to achieve. The standard method uses binary address counters followed by binary-to-Gray code conversion. The Gray code then needs to be registered in the original clock domain to avoid glitches at the comparator interface. To get lower latency you would need to build synchronous Gray code counters, which takes a bit more logic. The additional combinatorial latency of large Gray counters can then reduce the maximum operating frequency of the FIFO.

Regards,
Gabor
Article: 148259
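For reference, the binary-to-Gray conversion mentioned above is a single XOR with a shifted copy. The C model below is an illustration only (the function names are mine); it also checks the property that makes Gray code useful across clock domains, namely that exactly one bit changes per increment, including the wrap-around of a power-of-two counter.

```c
#include <assert.h>
#include <stdint.h>

/* Binary to Gray: each Gray bit is the XOR of two adjacent binary bits. */
uint32_t bin_to_gray(uint32_t b)
{
    return b ^ (b >> 1);
}

/* Gray to binary: prefix XOR from the MSB down, done in log2(32) steps. */
uint32_t gray_to_bin(uint32_t g)
{
    g ^= g >> 16;
    g ^= g >> 8;
    g ^= g >> 4;
    g ^= g >> 2;
    g ^= g >> 1;
    return g;
}

/* Number of bit positions in which a and b differ. */
unsigned bits_changed(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Walk a 'bits'-wide counter through all states, wrap-around included. */
void check_gray_counter(unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t mask = (1u << bits) - 1u;
    for (uint32_t i = 0; i <= mask; i++) {
        uint32_t next = (i + 1u) & mask;
        assert(gray_to_bin(bin_to_gray(i)) == i);
        assert(bits_changed(bin_to_gray(i), bin_to_gray(next)) == 1);
    }
}
```

By comparison, a plain binary count from 7 to 8 changes four bits at once, which is why a binary pointer sampled in another clock domain can be read as a wildly wrong value.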
On Jul 2, 9:41 am, Charles Gardiner <charles.gardi...@invalid.invalid> wrote:
> Hi Frank,
>
> > The way it works is as follows:
> > - the application allocates the memory (malloc).
> > - a pointer to this memory is passed to the driver (custom made driver).
> > - the driver creates a scatter-gather list by using the GetScatterGatherList method from the DMA_ADAPTER object.
>
> You are aware of the following text from the Microsoft WDK Docu? Particularly the first line.
>
> GetScatterGatherList is not a system routine that can be called directly by name. This routine is callable only by pointer from the address returned in a DMA_OPERATIONS structure. Drivers obtain the address of this routine by calling IoGetDmaAdapter.
>
> As soon as the appropriate DMA channel and any necessary map registers are available, GetScatterGatherList creates a scatter/gather list, initializes the map registers, and then calls the driver-supplied AdapterListControl routine to carry out the I/O operation.
>
> GetScatterGatherList combines the actions of the AllocateAdapterChannel and MapTransfer routines for drivers that perform scatter/gather DMA. GetScatterGatherList determines how many map registers are required for the transfer, allocates the map registers, maps the buffers for DMA, and fills in the scatter/gather list. It then calls the supplied AdapterListControl routine, passing a pointer to the scatter/gather list in ScatterGather. The driver should retain this pointer for use when calling PutScatterGatherList. Note that GetScatterGatherList does not have the queuing restrictions that apply to AllocateAdapterChannel.
>
> In its AdapterListControl routine, the driver should perform the I/O. On return from the driver-supplied routine, GetScatterGatherList keeps the map registers but frees the DMA adapter structure. The driver must call PutScatterGatherList (which flushes the buffers) before it can access the data in the buffer.
>
> > - the driver writes each entry of the scatter-gather list (which contains a physical address and length) to the FPGA.
> > - the FPGA receives data (though another interface) and writes this data to the memory of the pc by use of DMA (just generates write requests).
> > - after writing the data the FPGA generates an interrupt of PCIe (not working yet, but we know when the FPGA finished a transaction).
>
> > I now understand I have to verify runtime if the physical address is below or above 4 GB and use a 3 DW respectively 4 DW TLP header. I will change that in the FPGA and give it a try.
>
> > About the addresses, these are correct. We did the following test: write the virtual memory from the application and read the memory by using the physical addresses in the driver. In the driver we read what the application has written.
>
> > Any other suggestions?
>
> If you are convinced the addresses are correct, I would look at two other things.
>
> 1) Is you driver completing the request properly IoCompleteRequest()
>
> 2) Are the data being cached somewhere, Here, I would try a zero length read (from the driver. PCIe TLP with length 1 and all BEs zero) on the last address transferred to memory. Just discard the resulting completion. The PCIe spec says the system must interpret this as a flush.
>
> By the way which buffering method is your driver using for the DMA transfer (Buffered, Direct, Neither)
>
> > Frank

Hi Charles,

I am not sure if we understand each other. What do you mean by completing the request with IoCompleteRequest? There is no request from the software point of view. The FPGA will do a DMA write (data from FPGA to PC memory) on its own initiative. The allocated memory is used as long as the software is running. I do not allocate new memory for each new DMA transfer; instead, a large piece of memory is allocated at startup and the physical addresses are written to the FPGA by the driver software.

And yes, we use a DMA adapter in combination with the GetScatterGatherList method. We already used this in another project, but that was PCI and DMA read (data from PC memory to FPGA).

By the way, where can I set the type of DMA?

best regards,

Frank
Article: 148260
On Jul 2, 6:37 am, Gabor <ga...@alacron.com> wrote:
> On Jul 2, 5:32 am, firefox3107 <firefox3...@gmail.com> wrote:
>
> > It seems that everyone has its own point of view that reliability concerns. :(
>
> > I wanna add my design to opencores, so it should be technology independent.
>
> One clock latency for an empty flag is not easy to achieve. The standard method uses binary address counters followed by binary to Gray code conversion. This then needs to be registered in the original clock domain to avoid glitches at the comparator interface. To get lower latency you would need to build synchronous Gray code counters, which takes a bit more logic. The additional combinatorial latency for generating large Gray counters can then reduce the maximum operating frequency of the FIFO.
>
> Regards,
> Gabor

Please allow me to make some comments. I have been involved in FIFO design since 1970, starting with the Fairchild 3341, and ending with the industry-first hard-coded FPGA FIFO designs at Xilinx.

Given true dual-ported BlockRAMs with independent addressing, clocking, and control, most of a FIFO design seems to be trivial and obvious. The devil is in the generation of the control flags Full and Empty. These flags must be fast and also reliable, i.e. glitch-free. (The designs are symmetrical, so I will only cover Empty.)

The flags are driven by an identity comparison of the two addresses, therefore it is wise to use Gray codes. It may also be desirable to have binary addresses available for arithmetic calculations (almost empty, Dipstick etc). The simplest solution is to count in binary but synchronously convert to Gray, so that the two results appear simultaneously at their respective flip-flop outputs. That means there is no latency or speed penalty whatsoever for the conversion. It just costs one extra flip-flop plus gate per bit, negligible in a custom design.

The onset of Empty is caused by the read clock, and is thus naturally synchronous to the Read side of the design. The trailing edge of Empty is, however, caused by the write clock. Without careful synchronization, Empty would have a non-synchronous trailing edge, which would inevitably lead to malfunction, sooner or later. Most of the sometimes acrimonious debates about safe design center around the synchronization of the trailing edge, and how to avoid metastability problems.

Nobody should bemoan the fact that this synchronizer consumes time, so that Empty has a tendency to linger on for, say, a clock period or even two. That delay does not really sacrifice performance. (Any delay of the leading edge of Empty would of course directly affect performance.)

We tested the Xilinx hard-coded design by writing at 200 MHz and asynchronously reading at approx. 500 MHz, so that the Empty flag was exercised 200 million times per second, and we monitored the address counters. There was not a single error in a week of testing (10 exp 14 operations) with a forever changing sub-femtosecond timing granularity between the two clocks. So much about reliability and metastability. (Yes, the Virtex 4 version has a separate unrelated subtle problem, solved with a well-documented work-around.)

Hope this helps somebody.

Peter Alfke, formerly Xilinx Applications.
Article: 148261
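As a rough behavioural model of the scheme described in the post above, the C sketch below keeps a binary pointer and its Gray copy as two registers loaded on the same clock edge, and derives Empty from an identity compare against a synchronized (and therefore possibly stale) copy of the write pointer. It is not the Xilinx implementation; the 4-bit pointer width and all names are assumptions chosen for the example, and the synchronizer is reduced to a variable that the caller updates late.

```c
#include <assert.h>
#include <stdint.h>

#define PTR_BITS 4u
#define PTR_MASK ((1u << PTR_BITS) - 1u)

typedef struct {
    uint32_t bin;  /* binary address, usable for fill-level arithmetic */
    uint32_t gray; /* Gray copy, registered on the same clock edge */
} fifo_ptr;

/* One enabled clock edge. Both registers load from the same next value,
 * so the Gray output has no extra latency relative to the binary one. */
void ptr_advance(fifo_ptr *p)
{
    assert(p->bin <= PTR_MASK);
    uint32_t next = (p->bin + 1u) & PTR_MASK;
    p->bin = next;
    p->gray = next ^ (next >> 1);
}

/* Empty as seen in the read clock domain: identity compare of the read
 * pointer with the synchronized copy of the write pointer. */
int fifo_empty(const fifo_ptr *rd, uint32_t wr_gray_synced)
{
    return rd->gray == wr_gray_synced;
}
```

With this model, a write followed by a late update of the synchronized copy makes Empty linger for a while (safe, only slightly pessimistic), whereas a read that catches up with the write pointer asserts Empty on the very same read edge.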
Hi Frank,

> I am not sure if we understand each other.

Yes, it certainly sounds like that.

> What do you mean by completing the request with IoCompleteRequest? There is no request from software point of view.

I think this might clear up the reason why your data is missing. (See also below about the type of DMA.) I don't think the S/G list you are getting describes your application buffer. This is best done by specifying DO_DIRECT_IO as the DMA method for your device. If you specify DO_BUFFERED_IO you will get an S/G list describing an intermediate buffer in kernel space, and this probably never gets copied over to your application space buffer unless you terminate the request. I've never done the 'neither' method myself and from what I hear, it's a complicated beast.

> The FPGA will do a DMA write (data from FPGA to PC memory) at its own initiative. The allocated memory is used as long as the software is running. I do not allocate new memory for each new DMA transfer, but at startup a large piece of memory is allocated and the physical addresses are written to the FPGA by the driver software.

Sounds like you are doing something like a circular buffer in memory which stays alive as long as your device does?

> And yes, we use a DMA adapter in combination with the GetScatterGatherList method. We already used this in another project but that was PCI and DMA read (data from PC memory to FPGA).
>
> By the way, where can I set the type of DMA?

Typically, you set the DMA buffering method in your AddDevice function after you create your device object. Quoting from Oney's book,

NTSTATUS AddDevice(..)
{
    PDEVICE_OBJECT fdo;
    IoCreateDevice(....., &fdo);
    fdo->Flags |= DO_BUFFERED_IO;
        <or>
    fdo->Flags |= DO_DIRECT_IO;
        <or>
    fdo->Flags |= 0; // i.e. neither Direct nor Buffered
    ...
}

And you can't change your mind afterwards.

By the way, if my assumption about the circular buffer in your design is correct, there is a slightly more standard solution (standard in the sense that everybody on the Microsoft drivers newsgroup seems to do it). It does, however, require two threads in your application.

The first thread requests a buffer (using new or malloc) and sets up an I/O request (ReadFile, WriteFile or DeviceIoControl) referencing this buffer. This is performed as an asynchronous request. The driver recognises this request and pends it indefinitely (typically you terminate it when your driver is shutting down, otherwise Windows will probably hang). Pending the request has the nice side effect that the buffer now becomes locked down permanently. Assuming you have set up your driver to use DO_DIRECT_IO DMA, you can get the S/G list describing the application space buffer as you are currently doing and feed this to your FPGA.

Using the second thread in your application you can constantly read data from the locked-down pages (your app. space buffer) that are being written by your FPGA.

Assuming that DO_DIRECT_IO solves your problem (I think there is a good chance), I would still consider migrating to a KMDF based driver, particularly if you are writing a new one. It's much easier to maintain and is probably more portable to future MS versions.

> best regards,
>
> Frank

best regards,
Charles
Article: 148262
To further explain my problem:

I use an NCO to generate a cos/sin stimulus pair, multiply it with the QAM I/Q symbols, then track the carrier as I change the NCO frequency in steps. When crossing zero frequency, tracking manages the first few crossings, then may fail but relocks.

Looking at the waveforms at the zero crossing point, I have some doubts about my understanding in principle. I know we can get a negative frequency from a positive frequency by inverting the sine wave or by reading the LUT in reverse time. These two ways are not identical at the crossing point. My tracking is based on reversing the LUT for -f and so is my NCO.

With reverse LUT reading, I notice that the phase can vary at the crossing. For example, if I hit the mid-point of a symmetrical section of the waves and then read back to get -f, the continuity looks best. At the other extreme I could be just before the peak of the cosine only to reverse back, leading to anything between 0 and 360 degrees of phase variation of the negative pair with respect to the positive pair.

The question is how I can expect tracking not to fail with such phase variation. Or in other words: how does the final frequency of a real RF upconverter/downconverter link cross zero, and how is it best modelled?

Regards
kadhiem
Article: 148263
Frank van Eijkelenburg <fei.technolution@gmail.com> wrote:

>On Jul 2, 2:19 am, Charles Gardiner <charles.gardi...@invalid.invalid> wrote:
>> Frank van Eijkelenburg wrote:
>>
>> > Hi,
>>
>> > I have a custom made PCIe board with a Virtex 5 FPGA on which I implemented a DMA unit which uses the PCIe endpoint block plus v1.14. I also implemented simple read/write operations from the PC to the board (the board responds with completion TLPs). The read/write operations are working, DMA is not working
>>
>> > The board is inserted in a pc with Windows 7 64 bits platform. An application allocates virtual memory and passes the memory block to the driver. The driver locks the memory and converts the virtual addresses into physical addresses. These physical addresses are written to the FPGA.
>>
>> How are you doing this? Normally, an application requests a buffer using malloc() or new() and gets a handle to the driver using CreateFile(). You then use WriteFile(hDevice, Buffer,...), ReadFile(hDevice, Buffer,....) or DeviceIoControl() to initiate a transfer to/from the device. That's the application side.
>>
>> On the driver (kernel) side, I would strongly recommend that you write a KMDF based driver. Download the windows WDK, all it costs is your email. (You have to log in over Microsoft Connect, last time I looked). There are lots of examples there, including for PCI(e) based DMA. To (very quickly) summarise, your driver requests the scatter/gather list describing the buffers (see WdfDmaTransactionInitializeUsingRequest() in the WDK API docs as a starting point) above and passes these to your hardware one-by-one which then does DMA in or out. With a call to WdfRequestComplete the buffers are released by the kernel and your application can reuse them or free them up as required. (This is of course all considerably more than a day's work, by the way.)
>>
>> You do not have to explicitly lock down the buffer yourself. Windows does this for you while the I/O request is active. (Read/WriteFile from your app up to WdfRequestComplete from the driver)
>>
>> > When I start an DMA operation, I can see in chipscope the correct physical addresses in the TLP header. However, I do not see the correct values in the allocated memory. What can I do to check where it is going wrong?
>>
>> In this case, I would first doubt whether the addresses are correct.
>>
>> > Another question is about the memory request TLPs. What should I use, 32 or 64 bit write requests? Or do I have to check runtime if the physical memory address is below or above the 4 GB (and use respectively 32 and 64 bit requests)?
>>
>> The PCIe spec says: a transfer below 4 GB must use a 3 DWord header, a transfer above 4 GB must use a 4 DWord header. i.e. a four dword header with address[63:32] set to zero is invalid.
>>
>> > Thanks in advance,
>>
>> > Frank
>
>The way it works is as follows:
>- the application allocates the memory (malloc).
>- a pointer to this memory is passed to the driver (custom made driver).

I strongly doubt you can pass a malloc pointer to a driver. Actually I'm quite sure this doesn't work. While the driver is active, the application memory may be swapped out to the hard drive. And the pointer must be translated to a physical address. I'd go the other way around: have the driver allocate the memory and pass a pointer to this memory to the application (this will require some messing around with translation and access rights).

--
Failure does not prove something is impossible, failure simply indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Article: 148264
On Fri, 2 Jul 2010 12:35:20 +0000 (UTC), glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

>Gladys <yuhui.b@gmail.com> wrote:
>
>> Hi, I'm implementing an FPGA prototype to do some image processing such as dead pixel correction. The xilinx FPGA will be configured with an SPI flash memory, where the .bit configuration file inside as well as dead pixel locations are stored.
>
>> I want to correct the dead pixel while receiving each pixel data, compare each location with the pre-stored dead pixel locations, if it turns out to be a dead pixel, I can directly correct it.
>
>> The question are:
>> 1. How could I access the locations information? should I first read them out and store in the dualport SDRAM?
>
>I would probably store inside the FPGA, either BRAM or LUT RAM.
>There aren't so many dead pixels that it won't fit, right?
>
>> 2. How can I know if the current pixel location equals to one of the dead pixel locations? What I can do is to compare current location to all the locations one by one, however, if there are 100 bad pixel location, and my input pixel data rate is 25MHZ, I need to have pixel location data rate of 25MHZ * 100 to make sure I can compare them all while receiving one pixel data. This is impossible.
>
>If they are in the right order, you always know which one is coming next. Just compare one!
>
>Though many FPGA are big enough to do the compares in parallel, you would need to program in the maximum number such that enough comparators were generated.
>
>-- glen

Or use a block RAM as a direct bad-pixel map, assuming you have as many bits of RAM available as pixels in the image.

At load time, a short (x,y) list of bad pixels could be read from serial flash and used to set the tag bits in the RAM.

John
Article: 148265
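The direct bad-pixel map suggested above can be sketched as a one-bit-per-pixel array that is cleared and then filled from the (x,y) list at load time. The C model below is only an illustration: the 640 x 480 image size, the `pixel_xy` layout and the function names are assumptions, and a byte array stands in for the block RAM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IMG_W 640u /* hypothetical image width  */
#define IMG_H 480u /* hypothetical image height */

typedef struct {
    uint16_t x, y;
} pixel_xy;

/* One tag bit per pixel: 640 * 480 = 307200 bits = 38400 bytes. */
static uint8_t bad_map[(IMG_W * IMG_H + 7u) / 8u];

/* Load time: clear the map, then set one tag bit per listed bad pixel. */
void bad_map_load(const pixel_xy *list, unsigned count)
{
    memset(bad_map, 0, sizeof bad_map);
    for (unsigned i = 0; i < count; i++) {
        assert(list[i].x < IMG_W && list[i].y < IMG_H);
        uint32_t idx = (uint32_t)list[i].y * IMG_W + list[i].x;
        bad_map[idx >> 3] |= (uint8_t)(1u << (idx & 7u));
    }
}

/* Run time: a single memory read per pixel, no comparators at all. */
int bad_map_is_bad(unsigned x, unsigned y)
{
    assert(x < IMG_W && y < IMG_H);
    uint32_t idx = (uint32_t)y * IMG_W + x;
    return (bad_map[idx >> 3] >> (idx & 7u)) & 1u;
}
```

The trade-off against the sorted-list approach is memory: the map costs one bit per pixel regardless of how few pixels are bad, while the list costs one entry per bad pixel.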
John Larkin <jjlarkin@highnotlandthistechnologypart.com> wrote:
(snip, I wrote)

>>I would probably store inside the FPGA, either BRAM or LUT RAM.
>>There aren't so many dead pixels that it won't fit, right?
(snip)

> Or use a block ram as a direct bad-pixel map, assuming you have as many bits of ram available as pixels in the image.

Yes, you could do that. As the number of pixels per display is ever increasing, though, it might not be so convenient.

> At load time, a short (x,y) list of bad pixels could be read from serial flash and use to set the tag bits in the ram.

OK, so parallel load a word from BRAM, and shift it at the pixel clock rate. That might be faster than a comparator, though the comparator could be pipelined if needed, for speed. The BRAM load would have to be carefully pipelined to arrive at the appropriate time.

As far as I know, most set a low maximum for the number of bad pixels. I believe my monitor has none, though I might not notice a black pixel.

-- glen
Article: 148266
I believe I finally sorted out the issue, thanks to Matlab.

A LUT based approach to frequency synthesis is OK for positive or negative frequency synthesis. However, it is very glitchy at the zero crossover. The right model is to generate two frequencies (one drifting down, the other up, so that they criss-cross). The two frequency sources are then multiplied together to produce a proper vector for the zero crossing.

Regards
kadhiem
Article: 148267
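One way to read the model in the post above is in the phase domain: multiplying one unit phasor by the conjugate of another gives a phasor whose phase is the difference of the two phases, so the mixer output of two phase accumulators moves by exactly (fa - fb) per sample and passes through zero frequency without a phase jump. The C sketch below models only that phase arithmetic. It is my interpretation rather than the poster's Matlab model; the 32-bit accumulator width and all names are assumptions, and it relies on the usual two's-complement behaviour of unsigned-to-signed conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Phase accumulator: the full 32-bit range represents one turn. */
typedef struct {
    uint32_t phase;
} nco;

/* One sample: add a signed frequency word, wrapping modulo 2^32. */
void nco_step(nco *n, int32_t freq_word)
{
    assert(n != 0);
    n->phase += (uint32_t)freq_word;
}

/* Phase of a * conj(b) for unit phasors: the phase difference,
 * taken modulo one turn. */
uint32_t mixer_phase(const nco *a, const nco *b)
{
    return a->phase - b->phase;
}

/* Signed phase change between two consecutive mixer output samples. */
int32_t phase_step(uint32_t prev, uint32_t cur)
{
    return (int32_t)(cur - prev);
}
```

If source a drifts down while source b drifts up, the difference frequency sweeps smoothly from positive through zero to negative, and the per-sample phase step stays equal to the instantaneous frequency difference throughout.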
On Wed, 30 Jun 2010 18:50:42 -0700 (PDT), Bryan <bryan.fletcher@avnet.com> wrote:

>I work for Avnet, which seems not to be too popular with this crowd, but I will share my experience anyway. I have a project with XC6SLX16-2CSG324 and LPDDR that seems to work well with MIG 3.3 in ISE 11.4. Granted, we are only running at 200 MHz. We do not provide a 200 MHz input to the chip. We have a 66 MHz oscillator input to the FPGA. It is true that by default, MIG generates a design that assumes the native system clock is the same as the memory clock. The only clocking customization that MIG allows is the choice between single-ended or differential clock. However, since MIG provides all the HDL sources for the clock infrastructure, it is possible to modify the clocking structure to generate the correct memory clock given any system clock that meets the specifications of the PLL.
>
>I have some instructions that explain step-by-step how to do this for the Avnet board (www.em.avnet.com/spartan6lx-evl). If you are interested, please contact Avnet Technical Support (www.em.avnet.com/techsupport). In addition to this LPDDR example, Xilinx provides working hardware examples for DDR2 on the SP601 and DDR3 on the SP605. Avnet has another board with DDR3 that has been proven out at 800 Mbps in hardware (www.em.avnet.com/spartan6lx150t-dev).
>
>The other critical thing to do with these DDR designs is proper PCB layout and termination, without which the design will fail. Xilinx provides some very specific layout guidelines in UG388 that need to be followed if you want the full memory interface performance.
>
>Xilinx recently published revised specifications for the MCB. See
>http://www.xilinx.com/support/answers/35818.htm
>The Spartan-6 Memory Controller Block (MCB) has new data rate specifications and performance modes for DDR2 and DDR3 interfaces as specified in version 1.5 of the Spartan-6 FPGA Data Sheet (DS162):
>http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf
>
>You should also be aware of the MIG Design Advisory Answer Record.
>http://www.xilinx.com/support/answers/33566.htm
>
>Bryan
>Avnet

Bryan,

We (I work with Rob) have contacted our tech support guy at Avnet, and asked him for your doc. After about six interchanges, we still can't get our hands on it.

John
Article: 148268
On Fri, 25 Jun 2010 17:20:21 GMT, nico@puntnl.niks (Nico Coesel) wrote:

>John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:
>
>>We have a Spartan6/45 that's talking to 16 separate SPI A/D converters. The data we get back is different, but the clock and chip select timings are the same. To get the timing right, avoiding routing delays, we need our outgoing stuff to be reclocked by i/o cell flipflops.
>>
>>So what happens is that we have one state machine running all 16 SPI interfaces. We tell the software that we want the adc chip select flops in i/o cells. The compiler decides that all are seeing the same input signal, so reduces them to one flipflop. Then it concludes that that flipflop can't be in an i/o block, and builds it that way. The resulting routing delays are deadly.
>>
>>We couldn't find a way to force these 16 flops into IOBs. Really.
>
>Constraints usually help. In that case it should duplicate logic (if this option is on) to meet timing specifications.

Turns out, according to Xilinx, that IOB=TRUE (which is a suggestion to the compiler) works, but IOB=FORCE (which is supposed to be mandatory) doesn't.

We just left the shift register in there.

John
Article: 148269
On Sat, 03 Jul 2010 08:13:53 -0700, John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:

>On Fri, 25 Jun 2010 17:20:21 GMT, nico@puntnl.niks (Nico Coesel) wrote:
>
>>John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:
>>
>>>We have a Spartan6/45 that's talking to 16 separate SPI A/D converters. The data we get back is different, but the clock and chip select timings are the same. To get the timing right, avoiding routing delays, we need our outgoing stuff to be reclocked by i/o cell flipflops.
>>>
>>>So what happens is that we have one state machine running all 16 SPI interfaces. We tell the software that we want the adc chip select flops in i/o cells. The compiler decides that all are seeing the same input signal, so reduces them to one flipflop. Then it concludes that that flipflop can't be in an i/o block, and builds it that way. The resulting routing delays are deadly.
>>>
>>>We couldn't find a way to force these 16 flops into IOBs. Really.
>>
>>Constraints usually help. In that case it should duplicate logic (if this option is on) to meet timing specifications.
>
>Turns out, according to Xilinx, that IOB=TRUE (which is a suggestion to the compiler) works, but IOB=FORCE (which is supposed to be mandatory) doesn't. We just left the shift register in there.

I'm surprised that works. It didn't a couple of years ago, when I last used Xilinx stuff. The other thing to watch is tristate forcing. I found they had to be in the top level of the hierarchy to work right. Maybe that was just a problem with Virtex4 and the PPC stuff, though.
Article: 148270
kadhiem_ayob <kadhiem_ayob@n_o_s_p_a_m.n_o_s_p_a_m.yahoo.co.uk> wrote:

> I believe I finally sorted out the issue thanks to Matlab.
> A LUT based approach for frequency synthesis is ok for positive or negative
> frequency synthesis. However it is very glitchy at zero cross over. The
> right model is to generate 2 frequencies (one drifting down, the other up so
> that they criss-cross). The two frequency sources are then multiplied
> together to produce proper vector for zero crossing.

Reminds me of an HP signal generator, I believe the 3325B, which uses a combination of frequency generators and mixers to generate the range of frequencies from 1uHz to 20.999999999 MHz. I used to know more of the details about how it works, but part remembering and part looking on the web, it generates 30MHz more than the desired frequency, and then mixes down for the resulting output.

-- glen

Article: 148271
John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:

>On Fri, 25 Jun 2010 17:20:21 GMT, nico@puntnl.niks (Nico Coesel)
>wrote:
>
>>John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:
>>
>>>We have a Spartan6/45 that's talking to 16 separate SPI A/D
>>>converters. The data we get back is different, but the clock and chip
>>>select timings are the same. To get the timing right, avoiding routing
>>>delays, we need our outgoing stuff to be reclocked by i/o cell
>>>flipflops.
>>>
>>>So what happens is that we have one state machine running all 16 SPI
>>>interfaces. We tell the software that we want the adc chip select
>>>flops in i/o cells. The compiler decides that all are seeing the same
>>>input signal, so reduces them to one flipflop. Then it concludes that
>>>that flipflop can't be in an i/o block, and builds it that way. The
>>>resulting routing delays are deadly.
>>>
>>>We couldn't find a way to force these 16 flops into IOBs. Really.
>>
>>Constraints usually help. In that case it should duplicate logic (if
>>this option is on) to meet timing specifications.
>
>Turns out, according to Xilinx, that IOB=TRUE (which is a suggestion
>to the compiler) works, but IOB=FORCE (which is supposed to be
>mandatory) doesn't. We just left the shift register in there.

Another way to force flipflops in an IOB is to specify a short delay for the output flip-flop to pad path.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------

Article: 148272
On Sat, 03 Jul 2010 17:32:28 GMT, nico@puntnl.niks (Nico Coesel) wrote:

>John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:
>
>>On Fri, 25 Jun 2010 17:20:21 GMT, nico@puntnl.niks (Nico Coesel)
>>wrote:
>>
>>>John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:
>>>
>>>>We have a Spartan6/45 that's talking to 16 separate SPI A/D
>>>>converters. The data we get back is different, but the clock and chip
>>>>select timings are the same. To get the timing right, avoiding routing
>>>>delays, we need our outgoing stuff to be reclocked by i/o cell
>>>>flipflops.
>>>>
>>>>So what happens is that we have one state machine running all 16 SPI
>>>>interfaces. We tell the software that we want the adc chip select
>>>>flops in i/o cells. The compiler decides that all are seeing the same
>>>>input signal, so reduces them to one flipflop. Then it concludes that
>>>>that flipflop can't be in an i/o block, and builds it that way. The
>>>>resulting routing delays are deadly.
>>>>
>>>>We couldn't find a way to force these 16 flops into IOBs. Really.
>>>
>>>Constraints usually help. In that case it should duplicate logic (if
>>>this option is on) to meet timing specifications.
>>
>>Turns out, according to Xilinx, that IOB=TRUE (which is a suggestion
>>to the compiler) works, but IOB=FORCE (which is supposed to be
>>mandatory) doesn't. We just left the shift register in there.
>
>Another way to force flipflops in an IOB is to specify a short delay
>for the output flip-flop to pad path.

The problem I see with that is that it can't be verified (what can, I suppose) until after PAR is run. By using synthesis attributes the results can be seen in the technology view (or whatever they call it today).

Article: 148273
On Jul 2, 11:35 pm, n...@puntnl.niks (Nico Coesel) wrote:
> Frank van Eijkelenburg <fei.technolut...@gmail.com> wrote:
>
> >On Jul 2, 2:19 am, Charles Gardiner <charles.gardi...@invalid.invalid>
> >wrote:
> >> Frank van Eijkelenburg wrote:
>
> >> > Hi,
>
> >> > I have a custom made PCIe board with a Virtex 5 FPGA on which I
> >> > implemented a DMA unit which uses the PCIe endpoint block plus v1.14.
> >> > I also implemented simple read/write operations from the PC to the
> >> > board (the board responds with completion TLPs). The read/write
> >> > operations are working, DMA is not working
>
> >> > The board is inserted in a pc with Windows 7 64 bits platform. An
> >> > application allocates virtual memory and passes the memory block to
> >> > the driver. The driver locks the memory and converts the virtual
> >> > addresses into physical addresses. These physical addresses are
> >> > written to the FPGA.
>
> >> How are you doing this? Normally, an application requests a buffer using malloc()
> >> or new() and gets a handle to the driver using CreateFile(). You then use
> >> WriteFile(hDevice, Buffer,...), ReadFile(hDevice, Buffer,....) or
> >> DeviceIoControl() to initiate a transfer to/from the device. That's the
> >> application side.
>
> >> On the driver (kernel) side, I would strongly recommend that you write a KMDF based
> >> driver. Download the windows WDK, all it costs is your email. (You have to log in
> >> over Microsoft Connect, last time I looked). There are lots of examples there,
> >> including for PCI(e) based DMA. To (very quickly) summarise, your driver requests
> >> the scatter/gather list describing the buffers (see
> >> WdfDmaTransactionInitializeUsingRequest() in the WDK API docs as a starting point)
> >> above and passes these to your hardware one-by-one which then does DMA in or out.
> >> With a call to WdfRequestComplete the buffers are released by the kernel and your
> >> application can reuse them or free them up as required. (This is of course all
> >> considerably more than a day's work, by the way.)
>
> >> You do not have to explicitly lock down the buffer yourself. Windows does this for
> >> you while the I/O request is active. (Read/WriteFile from your app up to
> >> WdfRequestComplete from the driver)
>
> >> > When I start a DMA operation, I can see in chipscope the correct
> >> > physical addresses in the TLP header. However, I do not see the
> >> > correct values in the allocated memory. What can I do to check where
> >> > it is going wrong?
>
> >> In this case, I would first doubt whether the addresses are correct.
>
> >> > Another question is about the memory request TLPs. What should I use,
> >> > 32 or 64 bit write requests? Or do I have to check runtime if the
> >> > physical memory address is below or above the 4 GB (and use
> >> > respectively 32 and 64 bit requests)?
>
> >> The PCIe spec says: a transfer below 4 GB must use a 3 DWord header, a transfer
> >> above 4 GB must use a 4 DWord header. i.e. a four dword header with address[63:32]
> >> set to zero is invalid.
>
> >> > Thanks in advance,
>
> >> > Frank
>
> >The way it works is as follows:
> >- the application allocates the memory (malloc).
> >- a pointer to this memory is passed to the driver (custom made
> >driver).
>
> I strongly doubt you can pass a malloc pointer to a driver. Actually
> I'm quite sure this doesn't work. When the driver is active, the
> application memory may be swapped to the hard-drive. And the pointer
> must be translated to a physical address.
>

Nah, malloc() at application level is o.k.

If the I/O operation is specified as DIRECT_IO then the I/O manager takes care of locking the pages.

If the operation is specified as NEITHER then the driver itself should call MmProbeAndLockPages in user context (in this case you should never install filter drivers in between your driver and app).

In both cases it is very important not to complete the IRP associated with the user buffer until all DMA activities have finished.

If the I/O operation is specified as BUFFERED_IO then the I/O manager allocates a kernel buffer, passes it to the driver, and copies the results from the kernel buffer to the user buffer after the driver has completed the IRP. Obviously, BUFFERED_IO is not suitable for the OP's case, since he wants the result back without completing the original I/O request.

> I'd go the other way around: have the driver allocate the memory and
> pass a pointer to this memory to the application (this will require
> some messing around with translation and access rights).
>
> --
> Failure does not prove something is impossible, failure simply
> indicates you are not using the right tools...
> nico@nctdevpuntnl (punt=.)
> --------------------------------------------------------------

On a general-purpose system, allocation of big buffers by the driver is rarely a good idea. On a system dedicated to just one task it could be pragmatically o.k., but I still don't like it from a purely theoretical point of view.

Anyway, the discussion doesn't belong here. I recommend http://groups.google.com/group/microsoft.public.development.device.drivers

Article: 148274
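A side note on the 3 DWord versus 4 DWord header rule quoted in the post above: the FPGA (or the driver feeding it) has to make that choice per scatter/gather entry, since each entry can land on either side of the 4 GB boundary. A minimal sketch of the check in C, assuming the physical address is available as a 64-bit value; the helper name tlp_header_dwords is made up for illustration and is not part of any Xilinx or Microsoft API:

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread or any vendor API):
 * returns the number of header DWords a memory request TLP needs
 * for the given physical address.
 *
 * Rule as quoted in the thread: addresses below 4 GB use the
 * 3 DWord (32-bit address) header; addresses at or above 4 GB use
 * the 4 DWord (64-bit address) header. A 4 DWord header with
 * address[63:32] == 0 is not allowed. */
static unsigned tlp_header_dwords(uint64_t phys_addr)
{
    /* Upper 32 address bits all zero: the address fits in 32 bits. */
    return (phys_addr >> 32) == 0 ? 3u : 4u;
}
```

In the hardware the same test is just a compare of the upper address word against zero, done once for every address the driver writes to the FPGA.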
On Jul 2, 9:04 pm, Charles Gardiner <charles.gardi...@invalid.invalid> wrote:
> Hi Frank,
>
> > I am not sure if we understand each other.
>
> Yes, it certainly sounds like that.
>
> > What do you mean by
> > completing the request with IoCompleteRequest? There is no request
> > from software point of view.
>
> I think this might clear up the reason why your data is missing. (See also below
> about the type of DMA). I don't think the S/G list you are getting is describing
> your application buffer. This is best done by specifying DO_DIRECT_IO as the DMA
> method for your device. If you specify DO_BUFFERED_IO you will get an S/G List
> describing an intermediate buffer in kernel space and this probably never gets
> copied over to your application space buffer unless you terminate the request.
> I've never done the 'neither' method myself and from what I hear, it's a
> complicated beast.
>
> > The FPGA will do a DMA write (data from
> > FPGA to PC memory) at its own initiative. The allocated memory is used
> > as long as the software is running. I do not allocate new memory for
> > each new DMA transfer, but at startup a large piece of memory is
> > allocated and the physical addresses are written to the FPGA by the
> > driver software.
>
> Sounds like you are doing something like a circular buffer in memory which stays
> alive as long as your device does?
>
> > And yes, we use a DMA adapter in combination with the
> > GetScatterGatherList method. We already used this in another project
> > but that was PCI and DMA read (data from PC memory to FPGA).
>
> > By the way, where can I set the type of DMA?
>
> Typically, you set the DMA buffering method in your AddDevice function after you
> create your device object. Quoting from Oney's book,
>
> NTSTATUS AddDevice(..)
> {
>    PDEVICE_OBJECT    fdo;
>
>    IoCreateDevice(....., &fdo);
>    fdo->Flags |= DO_BUFFERED_IO;
>             <or>
>    fdo->Flags |= DO_DIRECT_IO;
>             <or>
>    fdo->Flags |= 0;  // i.e. neither Direct nor Buffered
>
> And, you can't change your mind afterwards.
>
> By the way if my assumption about the circular buffer in your design is correct,
> there is a slightly more standard solution (standard in the sense that everybody
> on the microsoft drivers newsgroup seems to do it). It however requires two threads
> in your application. The first one requests a buffer (using new or malloc) and
> sets up an I/O Request ReadFile, WriteFile or DeviceIoControl referencing this
> buffer. This is performed as an asynchronous request.
>

There is absolutely no need for a second thread. Just issue a "submit buffer" overlapped I/O request in the first thread and don't wait for completion.

> The driver recognizes this request and pends it indefinitely, (typically terminate
> it when your driver is shutting down, otherwise windows will probably hang).

That's pretty bad advice. As a minimum, you should install a cancel handler and complete the request on cancellation. Besides, normally you should stop all DMA activities and complete the request in the CLEANUP routine. Also it is always a good idea to have a custom "stop DMA and withdraw circular buffer" IOCTL for a normal user-initiated termination.

> Pending the request has the nice side effect that the buffer now becomes locked
> down permanently.
>
> Assuming you have set up your driver to use DO_DIRECT_IO DMA, you can get the S/G
> list describing the application space buffer as you are currently doing and feed
> this to your FPGA.
>
> Using the second thread in your application you can constantly read data from the
> locked down pages (your app. space buffer) that are being written by your FPGA.
>
> Assuming the DO_DIRECT_IO solves your problem (I think there is a good chance), I
> would however still consider migrating to a KMDF based driver, particularly if
> you are writing a new one. It's much easier to maintain and is probably more
> portable for future MS versions.
>
> > best regards,
>
> > Frank
>
> best regards,
> Charles

Yes, KMDF is MUCH easier than plain WDM.