On Oct 9, 7:37 am, LucienZ <lucien.zh...@gmail.com> wrote:
> On Oct 9, 6:21 am, rickman <gnu...@gmail.com> wrote:
> > On Oct 7, 8:30 am, LucienZ <lucien.zh...@gmail.com> wrote:
> > > Many thanks Darron, Glen and many former repliers! I appreciate your
> > > ideas.
> > >
> > > The mentioned algorithm is associated with stereo vision, i.e. placing
> > > two cameras like the two human eyes, estimating the depth information
> > > from the image pair, and then interpolating a plausible view as if taken
> > > from an intermediate position... The algorithm has already been
> > > implemented on a PC (C version) but can currently process only one such
> > > pair per second (Pentium IV 3.0GHz). GPU/CUDA is much better (10+ fps on
> > > an Nvidia GeForce 7900 GTX). The bosses have plans to go embedded and
> > > ASIC, and still hope to see results competitive with the GPU.
> > >
> > > To achieve the best real-time performance, implementing the algorithm
> > > directly in FPGA fabric seems to be the only promising solution. What
> > > leads me to think about multicores is their popularity nowadays, and
> > > some of my professors are pioneers in the field of MPSoC architecture
> > > and Networks-on-Chip. I also see many similar projects being carried
> > > out at leading universities, e.g. RAMP @ UC Berkeley:
> > > http://ramp.eecs.berkeley.edu/index.php?about
> > >
> > > So regardless of my project, what do you guys think about the multi/
> > > many-core research everywhere? As FPGA techniques evolve, will this
> > > approach have a bright future compared with customized logic design?
> > > No need to be very serious if you like to say something :).
> >
> > As others have indicated, it may be easy to implement CPUs in FPGAs, but
> > they do not run nearly as fast as high end CPUs in fixed silicon.
> > Martin indicated that if your algorithm is amenable to breaking it into
> > many processes running in parallel, many processors can be used, each
> > doing a part of the calculation with the load fairly balanced.
> >
> > If you look at the DSP chips that include general purpose CPUs and/or
> > specialized processing elements, you might learn something about how to
> > partition your design. A general purpose CPU is good at the control
> > portion of the algorithm, which is making decisions and choices. A
> > DSP-like processor is good at calculations that involve vectors, where
> > the same operation is performed on arrays of data, like your video data.
> > But a DSP processor still has the overhead of fetching instructions and
> > accessing random access memory. Specialized processing elements are
> > designed for a particular task, so that they can function without a
> > program (possibly) and may not use random access memory, but instead
> > FIFOs and other structures. All of these can be implemented in an FPGA,
> > with the specialized processor taking the most advantage of what FPGAs
> > have to offer in terms of flexibility.
> >
> > If you are just going to use an FPGA as a way to provide a number of
> > standard processors, you will likely find application specific chips
> > available that will do a better job. There are any number of multicore
> > chips out there if you dig a bit. They also may not need to be anything
> > like your PC. Actually, a GPU is a highly parallel design and it will be
> > hard to outperform that in an FPGA unless you use it to tailor the
> > hardware in very specific ways to your algorithm.
> >
> > You might start by expressing your algorithm in a language that uses
> > parallelism. If you can do that, it will likely teach you a lot about how
> > your algorithm can be optimally implemented in an FPGA. This will be very
> > different from a standard sequential program and will be key to any
> > specialized hardware design.
> > Rick
>
> Thanks a lot Martin and rickman!
>
> Martin, an introduction to stereo matching can be found in these slides:
> http://www.vision.deis.unibo.it/smatt/Seminars/StereoVision.pdf
> and related people are gathering here: http://vision.middlebury.edu/
> I bet you already have a background in stereo matching topics. Each of the
> stereo matching algorithms has its own features; however, many of them make
> intensive use of Sum-of-Absolute-Differences (SAD) to find the best matches
> between the left and right images. Simply speaking, if this is a pixel
> block in the left image (pixel value 5 is the anchor):
> L[][] =
> 7 8 9
> 4 5 6
> 1 2 3
> and this is the search space in the right image:
> R[][] =
> 9 9 9 9 9 9 9
> 0 0 7 8 9 0 0
> 0 0 4 5 6 0 0
> 0 0 1 2 3 0 0
> 9 9 9 9 9 9 9
> The aim is to find the best match, and in this case it is obvious. But in
> practice there are many other problems, and in our case the pixel block
> and search space are NOT rectangular.

Yes, this is the "aim", but not an algorithm. You say the algorithm is "SAD",
but I don't know what computations that involves. Can you describe the
algorithm in a bit more detail?

> Any recommended structures for this? I think the systolic array is a
> promising solution (thanks to Glen), although by now it is not clear to me
> how to do the mapping +_+. Thanks for mentioning the 'census transform'; I
> also bumped into the concept when looking into a reference design (DeepSea
> Stereo Vision System) :). I will pay more attention to it.

We would have to understand the algorithm in a *lot* more detail before
suggesting a hardware structure. In general, the sort of task specific
processing an FPGA is suited to could be described by a data flow diagram. If
you understand the computations to be done, you would draw a circle for each
task and connect them with arrows, which are the data flowing between the
tasks.
Each task has to be broken down to a level where it can be implemented
efficiently in hardware with a dedicated processing element. FPGAs have lots
of elements, so you can have many, many tasks in your diagram; just keep them
simple and the data will be able to flow through continuously. A systolic
array actually has all elements operating in lockstep, so that each task is
one simple operation such as a multiply or add. I suppose you could use this
structure for more complex functions and move the data on an enable signal,
but the power of a systolic array is usually that it moves data at a regular,
high rate so that no buffer memory is needed. In your case, I think you will
need each task to access some portion of the image, not just one element, so
an asynchronous data flow design might be better. In this construct, chunks
of data are moved between tasks through FIFOs. I once worked on a sonar
system designed this way. They even invented a language to support
programming it, called ECOS. It was a DOD program and died when DSP chips
dropped in price and increased in functionality. Government programs often
move too slowly to get out of the way of the bus of commercial chip design.
But the multi-processing and data flow aspects of it were very interesting.

> I did some related literature study these days (not complete yet...) and
> the most impressive solution did use a direct hardware implementation of
> the SADs. One claims to achieve 600 FPS at 450*375 resolution (of course
> more parameters are involved). As a comparison, the same algorithm with
> software optimizations (trading memory for fast calculations) can only
> achieve 1.48 FPS (3GHz Pentium IV).

Did they give any details of the algorithm?

> A practical implementation should be a streamed, pipelined approach which
> interfaces to cameras as well as a stereoscopic display. At the beginning
> I think some external memories should be used to provide ground truth test
> data.
I don't know what "ground truth test data" is. But if you want your design to
run slowly, use external memory. If you want it to run fast, use the internal
block RAM of the FPGA.

> I was just thinking too 'software' and 'sequential', even when I talked
> about parallelism (simply dividing the work load by the number of
> processors). And I overestimated the computation power of embedded
> processors (I took many related courses, paper work...).

Yeah, dividing the work load sounds simple, but that is the crux of what
makes parallel processing difficult. I have worked on array processors (what
we used before they had DSP chips) and the most difficult part was keeping
all of the compute elements running. Today they call that "load balancing".
I wanted to try to port ECOS to one of those machines. It could have been
done fairly easily, but there was no interest in DOD work at that company and
no interest in that company from the DOD program.

There is an interesting article in the September issue of IEEE Spectrum
called "Data Monster" about using PC graphics chips (GPUs) for other
applications such as database search. In essence the GPU is a SIMD processor,
useful when you wish to do the same processing on a lot of data. This sounds
like your application to me. A computer graphics card is pretty much
everything you need: a lot of processing elements connected to some *very*
fast memory and fast I/O... well, fast O anyway. Do they make video cards
with camera inputs? That might need to go through the PCI-like bus from a
video capture card. You could develop an algorithm that will run on one of
these devices and watch your speeds increase as the chip technology improves!
Or you could start with one of these and, when you understand the parallel
aspects of your algorithm, move up from having 128 processing elements in the
GPU to having 4096 or more processing elements in an FPGA. There is also the
complexity of using an FPGA to consider.
In addition to learning about your algorithm, you will need to learn about
the HDL you choose to use as well as the details of the FPGA you pick. It
might be better to divide and conquer. It mostly depends on what you are
comfortable with, although I have no idea what tools are available for using
GPUs for user applications.

Rick

Article: 143401
> LilacSkin wrote:
>> Hello,
>>
>> I have just the MCS file and I want to configure my FPGA directly with a
>> bitstream, so I had to convert the MCS to BIT.
>>
>> What I have done:
>> 1) MCS->HEX with promgen: promgen -r <promfile> -p hex
>> 2) HEX->BIT with Hex2bits: hex2bits -k <hexfile> <bitfile>
>>
>> But iMPACT doesn't work with the generated bitstream.
>> Do you have any idea?
>>
>> Thanks!
>
> Xilinx bit files are LSB first, IIRC. You may need to flip the bit order
> within each byte. Then, you need to send some extra config clocks to the
> FPGA to complete the configuration sequence. The bit file may have
> something at the end that causes iMPACT to do that, and that something may
> have been trimmed when converting to PROM format in the MCS file. Also,
> bit files have a header at the beginning that specifies the FPGA part #,
> and the date and time of creation of the file. Everything up to the 0xFF
> is header info, and can be read with an ASCII editor or hex file viewer.
> If the header info is not right, it may cause iMPACT to not work right.
>
> The MCS file has a checksum at the end of every line. Hopefully promgen
> gets rid of that.
>
> Jon

So I read this thread and followed the steps given, but my striphex cannot
convert line 4098 (supposedly the 4096 prom cell?) and so I am stuck. Any
ideas of what I should do? I have to make a bit file out of it; it is very,
very important. Every bit of help is welcome...

Article: 143402
Hi to all,

I heard that Xilinx no longer holds the license for the old Foundation
software, so I assume that it's free to share (please correct me if I'm
wrong).

Is there anyone who could share their old Foundation 2.1 or 3.1 software
(something that would work with my XC5215 device)?

I'm starting to learn FPGAs, and someone gave me the XC5215 device; instead
of throwing it away, I decided to start learning from this.

I'd appreciate it if anyone could share it or point me to where to download
it.

Thank you, John.

Article: 143403
On Oct 9, 11:40 am, "nanotech" <garyche...@yahoo.com> wrote:
> Hi to all,
>
> I heard that Xilinx no longer holds the license for the old Foundation
> software, so I assume that it's free to share (please correct me if I'm
> wrong).
>
> Is there anyone who could share their old Foundation 2.1 or 3.1 software
> (something that would work with my XC5215 device)?
>
> I'm starting to learn FPGAs, and someone gave me the XC5215 device;
> instead of throwing it away, I decided to start learning from this.

By "device" do you mean that someone gave you a chip? Or do you have a
development board of some sort?

> I'd appreciate it if anyone could share it or point me to where to
> download it.

Save yourself a whole bunch of aggravation: download the latest free WebPack
software and get one of the inexpensive FPGA starter kits.

-a

Article: 143404
On Oct 9, 2:40 pm, "nanotech" <garyche...@yahoo.com> wrote:
> Hi to all,
>
> I heard that Xilinx no longer holds the license for the old Foundation
> software, so I assume that it's free to share (please correct me if I'm
> wrong).

No. It means Xilinx is no longer authorized by the third-party tool provider
(Aldec) to deliver any software containing their code. It is most certainly
NOT free to share.

> Is there anyone who could share their old Foundation 2.1 or 3.1 software
> (something that would work with my XC5215 device)?
>
> I'm starting to learn FPGAs, and someone gave me the XC5215 device;
> instead of throwing it away, I decided to start learning from this.
>
> I'd appreciate it if anyone could share it or point me to where to
> download it.
>
> Thank you, John.

Go to the Xilinx website and download ISE 4.2i from the "classics" section.
That should support your device without violating any licenses.

regards,
Gabor

Article: 143405
I am trying to understand whether there is a well defined procedure for ASIC
prototyping using FPGAs. I was thinking that I could divide the ASIC design
into multiple functional blocks, such that each block maps to an FPGA. The
design is in Verilog HDL, but it contains a lot of ASIC-library-specific
instantiations, which are not understood by the FPGA software. Here are the
things that I am trying to understand:

(1) How to convert the ASIC-library-specific primitives so that the FPGA
software can understand them?

(2) The ASIC design uses LATCHES every now and then for time borrowing; I am
not sure if it is OK to use latches in an FPGA design.

(3) What is the exact procedure for analyzing large designs containing
thousands of Verilog modules? I ran experiments on a few selected modules
and discovered the issues described in (1) and (2). I am sure a lot of
people use FPGAs for ASIC prototyping; do they go through the Verilog
modules one at a time to make them work with the FPGA?

(4) Designs of what complexity can be targeted for ASIC prototyping? The
design I have may require multiple big FPGAs. I wish I could fit everything
into one big FPGA.

(5) Partitioning the design is also challenging, which is why I was thinking
that the design should be partitioned based on functional blocks, so there
is a clear division that matches the design.

(6) Is there a company that provides synthesis/partitioning software and
support for code synthesis that addresses the above mentioned problems?

Article: 143406
On Oct 9, 11:40 am, "nanotech" <garyche...@yahoo.com> wrote:
> I heard that Xilinx no longer holds the license for the old Foundation
> software, so I assume that it's free to share (please correct me if I'm
> wrong).

The copyright on this software is still owned by Xilinx and will be for the
next 60 or so years, so it is not "free to share". Xilinx does provide a
free download of these older versions of software for devices that are no
longer supported in the current software versions. You can download these
"Classic" releases here:

http://www.xilinx.com/webpack/classics/wpclassic/index.htm

Ed McGettigan
--
Xilinx Inc.

Article: 143407
Test01 <cpandya@yahoo.com> wrote:

< I am trying to understand if there is a well defined procedure to do
< ASIC prototyping using FPGAs. I was thinking that I could divide the
< ASIC design into multiple functional blocks, such that each block maps
< to an FPGA. The design is in Verilog HDL but it contains a lot of
< ASIC-library-specific instantiations. This is not understood by the
< FPGA software. Here are the things that I am trying to understand:

< (1) How to convert the ASIC-library-specific primitives such that
< the FPGA software can understand them?

If they look like module references, you just need to write a Verilog module
that does the same function. Don't include it when using the ASIC library.

< (2) The ASIC design uses LATCHES every now and then for time
< borrowing, I am not sure if it is OK to use the LATCH in FPGA design.

The usual FPGAs have an edge-triggered FF along with each LUT (look-up
table). If that is what you mean by latch, then it should work fine. If you
mean something more like a transparent latch, then it will be implemented
with LUTs, take up more cells and probably be slower. It should still work,
though.

< (3) What is the exact procedure of analyzing large designs containing
< thousands of Verilog modules? I ran experiments on a few selected
< modules and discovered some issues described in (1) and (2). I am sure
< a lot of people use FPGAs for ASIC prototyping; do they go through the
< Verilog modules one at a time to make them work with the FPGA?

As I understand it, with current mask costs many projects previously
targeting ASICs are now targeting FPGAs. You might check the total cost
(NRE, masks, etc.). Otherwise, FPGA arrays are known to have been used for
debugging microprocessors. At that point, they can afford special software.

< (4) Designs of what complexity can be targeted for ASIC prototyping?
< The design I have may require multiple big FPGAs. I wish I could fit
< everything into one big FPGA.
I believe there are stories about some of the Intel Pentium series being
prototyped in FPGA arrays. I haven't followed them recently, though. Is that
big enough?

< (5) Partitioning the design is also challenging, that is why I was
< thinking that the design should be partitioned based on functional
< blocks so there is a clear division that matches the design.

Well, partitioning is usually part of the design process. FPGAs are getting
big enough that you won't have to partition all that much, though.

< (6) Is there a company that provides synthesis/partitioning software
< and support for code synthesis that can handle the above mentioned
< problems?

I believe so, but it might be pretty expensive.

-- glen

Article: 143408
>> Yes, this is the "aim", but not an algorithm. You say...

The SAD computations look more or less like this (with L and R stored
[row][col]):

//ry = 1 is known, geometry stuff...
ry = 1;
for(rx = 0; rx <= 4; rx++)
{
  acc = 0;
  for(x = 0; x < 3; x++)
  {
    for(y = 0; y < 3; y++)
    {
      acc += abs( L[x][y] - R[ry + x][rx + y] );
    }
  }
  SAD[rx] = acc;
}
//Then find the minimum value in the SAD array...

There are also algorithms (or aims) called SSD (Sum-of-Squared-Differences),
differing only in the "acc +=" statement.

Anyway, I feel the real thing is too complex to be discussed right now. If
my promoters would like to use dedicated hardware logic design, then my
focus is going to be finding out which architecture fits the bill. I'd
better ask again when I encounter more specific problems :).

>> Did they give any details of the algorithm?

The only thing they mentioned about the algorithm is SAD :), with a math
expression. The remaining parts are all logic blocks. I think it is easier
for those computer vision people to understand...

>> I don't know what "ground truth test data" is. But if you want ...

The term "ground truth" is often used by algorithm developers. If you place
two cameras like two human eyes, then the images taken by the two cameras
are indeed different. Closer objects have a larger position shift between
them, while remote objects have a smaller one. There is a 'true' difference
(measured in pixels) for each pixel in the left image compared with the
right one, and these 'true' differences (the so-called ground truth) are
available for some test image pairs. None of the current algorithms is able
to recover the 'true' differences exactly, and one metric for evaluating an
algorithm is to see how much of the truth is found.

I did not intend to complicate things with these terms... I think people
here have many different backgrounds and not everyone is interested in this.

The purpose of an external memory is to provide the ground truth data and
simulate a camera's streaming transmissions.
They are useful to check the implementation of a logic block. Once that is
successful, the external memory should be replaced by cameras to do the real
stuff. The on-chip RAM is faster, but I don't think its capacity is large
enough for storing a bunch of test data, nor can it represent a camera
module from the outside.

Regarding the GPU, there is already a group of people here working on it.
That is another way of computing. But consumers will be less likely to buy a
Gphone equipped with a GPU, and I think that's why the promoters would like
to go embedded. I did some searching for solutions on ASICs, media SoCs,
DSPs, whatever... I don't see any of them (as a single chip) having
sufficient horsepower for our computation. Though customized logic design is
quite hard, I do see its advantage for such low-level image processing.
Anyway, more evaluations will be done to find out the pros and cons of each
approach.

Article: 143409
> I am trying to understand if there is a well defined procedure to do
> ASIC prototyping using FPGA.

A study by Dataquest in 2005 showed that nearly 40% of all ASIC companies
prototype on FPGAs. I suspect this figure is a lot higher these days, and I
wouldn't be surprised if all digital ASICs are prototyped on FPGAs.

> I was thinking that I can divide down the ASIC design into multiple
> functional blocks. Such that each block represents an ASIC. The design is
> in Verilog HDL but it contains a lot of ASIC lib specific instantiations.
> This is not understood by the FPGA software.

That depends: if the ASIC design contains DesignWare components, then both
Mentor's Precision and Synopsys Synplify can handle (some of) them. These
tools can also handle gated clocks and SDC constraint files.

> Here are the things that I am trying to understand
>
> (1) How to convert the ASIC library specific primitives such that the
> FPGA software can understand?

As mentioned above, look into DesignWare.

> (2) The ASIC design uses LATCHES every now and then for time borrowing, I
> am not sure if it is OK to use the LATCH in FPGA design.

See Glen's answer.

> (3) What is the exact procedure of analyzing large designs containing
> thousands of Verilog modules? I ran experiments on a few selected modules
> and discovered some issues described in (1) and (2). I am sure a lot of
> people use FPGAs for ASIC prototyping; do they go through the Verilog
> modules one at a time to make them work with the FPGA?

If the design is very large, then I would recommend a static/dynamic linting
tool such as SpyGlass/DesignChecker. These tools can very quickly check for
all sorts of issues such as latches, missing blocks, unused logic,
un-synthesisable constructs, both clock edges used, combinatorial feedbacks,
etc. I would also recommend a clock domain checker such as 0-In's CDC if you
have multiple clock domains.

> (4) Designs of what complexity can be targeted for ASIC prototyping?
In addition to Glen's answer, I suspect it is highly unlikely that your
design won't fit on, say, a Hardi/Enterpoint/ProDesign/etc. board. Have a
look at this board:

http://www.enterpoint.co.uk/merrick/merrick1.html

> The design I have may require multiple big FPGAs. I wish I could fit
> everything into one big FPGA.
>
> (5) Partitioning the design is also challenging, that is why I was
> thinking that the design should be partitioned based on functional blocks
> so there is a clear division that matches the design.

If speed is not an issue, then I would recommend that as well. There are
partitioners that can partition based on delays, but this is normally a lot
more difficult to do.

> (6) Is there a company that provides synthesis/partitioning software and
> support for code synthesis that can handle the above mentioned problems?

I believe that Synopsys's Certify is the current market leader in
partitioning software, but I have heard that Auspy ACE is more powerful yet
more difficult to use (they also have a dreadful website, see
http://www.auspy.com/).

Good luck,
Hans
www.ht-lab.com

Article: 143410
Joseph Yiu <joseph.yiu@somewhereinarm.com> wrote:
> rickman wrote:
>> As others have indicated, it may be easy to implement CPUs in FPGAs, but
>> they do not run nearly as fast as high end CPUs in fixed silicon. Martin
>> indicated that if your algorithm is amenable to breaking it into many
>> processes running in parallel, many processors can be used, each doing a
>> part of the calculation with the load fairly balanced.
>
> Regarding speed, you could actually get better performance by running a
> CPU in an FPGA compared to standard microcontrollers. Flash memory usually
> runs at 25MHz to 50MHz, while block RAM in an FPGA can be much faster. So
> you can run a processor at a high clock speed on an FPGA (e.g. 100MHz)
> with zero wait states, compared to microcontroller products running at
> 100MHz with 1 or 2 wait states in silicon.

That's not true. Most such microcontrollers have wider flash and use
prefetch and branch prediction buffers (a small smart cache) to undo the
effects of the slower flash. NXP has been doing this for years with the
LPC2000 series and has included similar schemes in their new LPC1300 and
LPC1700 series.

--
Failure does not prove something is impossible, failure simply indicates
you are not using the right tools...
"If it doesn't fit, use a bigger hammer!"
--------------------------------------------------------------

Article: 143411
On Oct 9, 11:32 am, Uwe Bonnes <b...@elektron.ikp.physik.tu-darmstadt.de>
wrote:
> fab. <fabrizio.tapp...@gmail.com> wrote:
> > Bus 002 Device 007: ID 04b4:7200 Cypress Semiconductor Corp.
>
> Is this the VID/PID that appears/disappears with plugging/unplugging the
> board? What board is this? It seems not to be a Xilinx one, as Xilinx
> boards have Vendor ID 03fd.
>
> Did the board come with a Linux installation? It seems not..
>
> So first follow the instructions on
> http://www.xilinx.com/support/answers/32657.htm
>
> Then you need the hex file for your board. The hex file is buried deep
> inside the windows installer. Then you need to add a udev rule for the
> above VID/PID to upload the hex file. Look at
> /etc/udev/rules.d/xusbdfwu.rule, which should have been installed by the
> Xilinx installer, and create a similar rule.
>
> After un/replugging, the hex file should be uploaded automatically and
> VID/PID 03fd/0008 should appear. This VID/PID is used by iMPACT.
>
> You can also try uploading the hex file with fxload. Be sure to upload a
> fitting hex file; perhaps bug the vendor of your board.
>
> Enough work for you for the next hours ;-)
>
> And if you get it working, write a summary and post it for later
> reference...
>
> --
> Uwe Bonnes  b...@elektron.ikp.physik.tu-darmstadt.de
> Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
> --------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

Uwe,

Thanks a lot for your great help. The issue has been solved. The problem was
that the Spartan-3A DSP 3400A Edition does not have an on-board JTAG
programmer; I tried with the external Xilinx USB JTAG programmer and had no
problem at all. The command line tool lsusb is a good way to see what is
currently connected to your USB ports. I am not sure whether for Linux Mint
and ISE 11.1 you need the USB driver solution described in:
http://www.george-smart.co.uk/wiki/Xilinx_JTAG_Linux
but for me it worked.
Regards,
fabrizio

Article: 143412
fab. <fabrizio.tappero@gmail.com> wrote:
> Thanks a lot for your great help. The issue has been solved. The problem
> was that the Spartan-3A DSP 3400A Edition does not have an on-board JTAG
> programmer; I tried with the external Xilinx USB JTAG programmer and had
> no problem at all. The command line tool lsusb is a good way to see what
> is currently connected to your USB ports. I am not sure whether for Linux
> Mint and ISE 11.1 you need the USB driver solution described in:
> http://www.george-smart.co.uk/wiki/Xilinx_JTAG_Linux
> but for me it worked.

Just for reference: recent iMPACT uses libusb by default. The rmdir library
preload is only needed for the parallel port driver, to remove the need to
load the WinDrv kernel module.

--
Uwe Bonnes  bon@elektron.ikp.physik.tu-darmstadt.de
Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

Article: 143413
gkonstan <paraharaktis@gmail.com> wrote:
...
> So I read this thread and followed the steps given, but my striphex
> cannot convert line 4098 (supposedly the 4096 prom cell?) and so I am
> stuck. Any ideas of what I should do? I have to make a bit file out of
> it; it is very, very important. Every bit of help is welcome...

Try the bitparse utility from the sourceforge xc3sprog project, like:

bitparse -i mcs -Oxxx.bit xxx.mcs

The bit file has some information entries prepended before the bitstream.

--
Uwe Bonnes  bon@elektron.ikp.physik.tu-darmstadt.de
Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

Article: 143414
On Oct 10, 2:34=A0am, LucienZ <lucien.zh...@gmail.com> wrote: > >> Yes, this is the "aim", but not an algorithm. =A0You say... > > The SAD computations should look more or less like this... > > //ry =3D 1 is known, geometry stuff... > ry =3D 1; > for(rx =3D 0; rx <=3D 4; rx++) > { > =A0 acc =3D 0; > =A0 for(x =3D 0; x < 3; x++) > =A0 { > =A0 =A0 for(y =3D 0; y < 3; y++) > =A0 =A0 { > =A0 =A0 =A0 acc +=3D abs( L[x][y] - R[rx + x][ry + y] ); > =A0 =A0 } > =A0 } > > =A0 SAD[rx] =3D acc;} > > //Then find out the min value in the SAD array... > There are also algorithms (or aims) called SSD (Sum-of-Squared- > Differences), only with the "acc +=3D" statement a bit different. > > Anyway I feel the real thing is too complex to be discussed right now. > If my promoters would like to use dedicated hardware logic design, > then my focus is going to find out which architecture fits the bill. > I'd better ask again when encounter more specific problems :). > > >> Did they give any details of the algorithm? > > The only thing they mentioned about the algorithm is SAD :), with a > math expression. The rest parts are all logic blocks. I think it is > easier for those computer vision people to understand... > > >> I don't know what "ground truth test data" is. =A0But if you want ... > > The "ground truth" is also often used by the algorithm developers. If > you place two cameras like two human eyes, then the images taken by > the two cameras are indeed different. Closer objects have a larger > position shift in between while remote objects smaller. There is a > 'true' difference (measured in pixels) for each pixel in the left > image, compared with the right one; and the 'true' differences (so- > called ground truth) are available from some test image pairs. None of > the current algorithms is able to find out the 'true' differences, and > one metric for evaluating an algorithm is to see how much truth being > found. 
> > I did not intend to complicate things with these terms...I think > people here have many different backgrounds and not everyone is > interested in this. > > The usage of an external memory is to provide the ground truth data, > and simulate a camera's streaming transmissions. They are useful to > check the implementation of a logic block. Once successful, the > external memory should be replaced by cameras to do the real stuff. > The on-chip RAM is faster, but I don't think its capacity is large > enough for storing a bunch of test data, nor can it represent a camera module > from the outside. > > Regarding the GPU, there is already a group of people here working on > it. That is another way of computation. But consumers will be less likely > to buy a Gphone equipped with a GPU, and I think that's why the > promoters would like to go embedded. I did some search for solutions > on ASICs, media SoCs, DSPs, whatever... I don't see any of them (as a > single chip) has sufficient horsepower for our computation. Though > customized logic design is quite hard, I do see its advantage for such > low-level image processing. Anyway, more evaluations will be done to > find out more pros and cons of each approach. I noticed that I've made something wrong in the coding...though not very important for this discussion. If someone is interested in the details, we can discuss through Emails :).Article: 143415
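For readers following the thread, the SAD loop quoted above can be turned into a self-contained C sketch. The 3x3 window and the rx search range 0..4 follow the posted snippet; the function name `best_disparity`, the array shapes, and the test data are illustrative assumptions, not part of the original algorithm description.

```c
#include <stdlib.h>

#define WIN 3      /* 3x3 comparison window, as in the posted loop */
#define MAX_RX 4   /* horizontal disparity search range 0..MAX_RX  */

/* Sum-of-Absolute-Differences matching: compare the left window L
 * against windows of R shifted by rx (ry is fixed at 1, known from
 * the camera geometry per the post), and return the rx with the
 * smallest SAD. */
int best_disparity(int L[WIN][WIN], int R[WIN + MAX_RX][WIN + 1])
{
    int best_rx = 0;
    int best_sad = -1;
    int ry = 1;

    for (int rx = 0; rx <= MAX_RX; rx++) {
        int acc = 0;
        for (int x = 0; x < WIN; x++)
            for (int y = 0; y < WIN; y++)
                acc += abs(L[x][y] - R[rx + x][ry + y]);
        if (best_sad < 0 || acc < best_sad) {
            best_sad = acc;   /* track the minimum, as the post describes */
            best_rx = rx;
        }
    }
    return best_rx;
}
```

An SSD variant would replace the `acc += abs(...)` line with the squared difference, as the poster notes.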
I have a hobby project which consists of developing a complete computer system from the ground up. With complete I mean that it should have character display capabilities, keyboard input capabilities and mass storage capabilities. Graphics and networking might come in the future, but I feel that adding these is probably more a team effort than a single person effort, and I need time first to implement what should be a reasonable system stack. This is what I mean with from the ground up : a system stack consisting of an ISA, a simulator for the ISA and processor architecture, basic display and keyboard capabilities for the simulator, and system software which is based on Lisp. I have already the simulator running, together with the IO capabilities and an assembler. A nice link to give you a background, and what motivated me, is the Homebuilt CPU's Webring, starting at http://www.homebrewcpu.com/. However, I am not really motivated to build anything in TTL (although I did design in 2006 and 2007 a 12-bit CPU, which I simulated at the ALU/register level on a wire/bus basis) and doing wirewrap (expensive) or etching boards (no laboratory space available). What I am looking at for the future is a board where I of course can build a CPU on, with VGA output and USB interface for both keyboard and mass storage. Currently I have some target boards, which I find interesting both in price and possibilities : - Digilent Nexsys2 board - Digilent Nexsys board - Spartan-3 starter kit board - Spartan-3E starter kit I am fascinated by the possibilities these boards might give. However, are there other boards that you know of so that I might compare them, price less than 200 EUR ? I have also been following the discussion below about the ARM cores, so I know that there might be drawbacks on implementing a CPU (or a more or less complete computer system) by means of an FPGA. 
However, for me it seems the most logical way, because these boards offer some basic possibilities which are difficult to achieve in any other way. Also, since my interest goes to the total system (ISA/implementation/SW/applications), this gives me the possibility to create something in a reasonable time, say about 2 years. That is basically it, I think. Since such a project has many details, I have possibly already forgotten some things, so do not hesitate to ask. Regards, JurgenArticle: 143416
Have a look at our new Spartan-6 board Drigmorn3 http://www.enterpoint.co.uk/component_replacements/drigmorn3.html. It's on the edge of your budget but there is also another board Drigmorn2 (Spartan-3A) that is about to launch and is a little simpler and cheaper. John Adair Enterpoint Ltd. On 10 Oct, 17:50, Jurgen Defurne <jurgen.defu...@telenet.be> wrote: > I have a hobby project which consists of developing a complete > computer system from the ground up. With complete I mean that it > should have character display capabilities, keyboard input > capabilities and mass storage capabilities. Graphics and networking > might come in the future, but I feel that adding these is probably > more a team effort than a single person effort, and I need time first > to implement what should be a reasonable system stack. > > This is what I mean with from the ground up : a system stack > consisting of an ISA, a simulator for the ISA and processor > architecture, basic display and keyboard capabilities for the > simulator, and system software which is based on Lisp. I have already > the simulator running, together with the IO capabilities and an > assembler. > > A nice link to give you a background, and what motivated me, is the > Homebuilt CPU's Webring, starting at http://www.homebrewcpu.com/. However, I am not really motivated to > build anything in TTL (although I did design in 2006 and 2007 a 12-bit > CPU, which I simulated at the ALU/register level on a wire/bus basis) > and doing wirewrap (expensive) or etching boards (no laboratory space > available). > > What I am looking at for the future is a board where I of course can > build a CPU on, with VGA output and USB interface for both keyboard > and mass storage. 
> > Currently I have some target boards, which I find interesting both in > price and possibilities : > > - Digilent Nexsys2 board > - Digilent Nexsys board > - Spartan-3 starter kit board > - Spartan-3E starter kit > > I am fascinated by the possibilities these boards might give. However, > are there other boards that you know of so that I might compare them, > price less then 200 EUR ? > > I have also been following the discussion below about the ARM cores, > so I know that there might be drawbacks on implementing a CPU (or a > more or less complete computer system) by means of an FPGA. However, > for me it seems the most logical way, because these boards offer some > basic possibilities which are difficult to achieve in any other > way. Also, since my interest goes to the total system > (ISA/implementation/SW/applications), this gives me the possibility to > create something in a reasonable time, say about 2 years. > > That is basically it, I think. Since such a project has many details, > I have possibly already forgotten some things, so do not hesitate to > ask. > > Regards, > > JurgenArticle: 143417
Save up and go for one of these: http://www.altera.com/products/devkits/altera/kit-cyc3-embedded.html A bit more expensive, but it has all the toys. Cheers, JonArticle: 143418
On Oct 11, 5:50 am, Jurgen Defurne <jurgen.defu...@telenet.be> wrote: > I have a hobby project which consists of developing a complete > computer system from the ground up. With complete I mean that it > should have character display capabilities, keyboard input > capabilities and mass storage capabilities. Graphics and networking > might come in the future Depends if your interest is more in the SW, HW, or final usable system. Look also at the Propeller chip > I have also been following the discussion below about the ARM cores, > so I know that there might be drawbacks on implementing a CPU (or a > more or less complete computer system) by means of an FPGA. An FPGA solution is going to win on flexibility, but will always cost more to get even a medium speed CPU in a FPGA, than as real silicon, and you will not get a very high performance single CPU in an FPGA. So, I'd suggest you also look at the Low cost Eval Board, for Std CPU's. eg Atmel's ATNGW100 is just $89, and NXP have just released a useful development platform for $60 : see http://mbed.org/ or check out the bottom end Atom's ... or for silicon working harder, perhaps this ? http://www.belogic.com/uzebox/index.htm - could be respun into xMega ... -jgArticle: 143419
Hi, I managed to figure this thing out. The problem with the above was that I was not shifting the carry in each iteration. Also there were some problems with the proper sign extensions. Bewick's 1994 thesis on "Fast Multiplication : Algorithms and Implementation" and "General Data-path organization of a MAC unit for VLSI Implementation of DSP processors" by Farooqui/Oklobdzija came in very handy in understanding the algorithm better, in particular the sign extension part. Here is my code: http://www.pastey.net/126355 Unsigned/Signed Mult + 32-bit accumulate all work. I'm still trying to get 64-bit accumulate working but should be OK. The result gets computed in 5 cycles. I would have liked to have done it in 4 but the unsigned case requires an extra multiplicand to be added at the end if the MSB of the multiplicand is 1. Hence, the extra cycle. I suppose this could be pushed into the CSA array but it would get slightly messy. I could try to implement early termination once I get 64-bit accumulate working. Thanks for your help. Kind regardsArticle: 143420
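The fixes described above (shifting the carry each iteration, proper sign extension) belong to the poster's CSA-array hardware design, but the sign-handling idea can be illustrated with a minimal radix-2 shift-add signed multiplier in C. This is a software sketch under my own assumptions, not the posted design: it sign-extends the multiplicand to 64 bits and gives the multiplier's top bit its negative weight, which is the same sign-extension issue Bewick's thesis addresses.

```c
#include <stdint.h>

/* Radix-2 shift-add signed multiply: accumulate the sign-extended
 * multiplicand once per set multiplier bit. Bits 0..30 of a two's
 * complement multiplier have positive weight; bit 31 has weight
 * -2^31, which is where naive unsigned handling goes wrong. */
int64_t shift_add_mul(int32_t a, int32_t b)
{
    int64_t addend = (int64_t)a;   /* sign-extended multiplicand */
    uint32_t m = (uint32_t)b;      /* scan multiplier bits as unsigned */
    int64_t acc = 0;

    for (int i = 0; i < 31; i++) {
        if ((m >> i) & 1u)
            acc += addend << i;    /* shift-and-add partial product */
    }
    if (b < 0)
        acc -= addend << 31;       /* MSB of a signed multiplier is negative */
    return acc;
}
```

A hardware version would keep the partial products in carry-save form and shift the carry word along with the sum each cycle, which is the bug the poster describes fixing.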
On Oct 10, 5:51 am, n...@puntnl.niks (Nico Coesel) wrote: > Joseph Yiu <joseph....@somewhereinarm.com> wrote: > >rickman wrote: > > >> As others have indicated, it may be easy to implement CPUs in FPGAs, > >> but they do not run nearly as fast as high end CPUs in fixed silicon. > >> Martin indicated that if your algorithm is amenable to breaking it > >> into many processes running in parallel, many processors can be used, > >> each doing a part of the calculation with the load fairly balanced. > > >Regarding speed, actually you could have better performance by running a > >CPU in FPGA compared to standard microcontrollers. Flash memory usually > >has a maximum speed of 25MHz to 50MHz, while block RAM in FPGA can be > >much faster. So you can run a processor at high clock speed on FPGA > >(e.g. 100MHz) with zero wait state, compared to microcontroller products > >running at 100MHz with 1 or 2 wait states on silicon. > > That's not true. Most of such microcontrollers have wider flash and > use pre-fetch and branch prediction buffers (a small smart cache) to > undo the effects of the slower flash. NXP has been doing this for > years with the LPC2000 series and has included similar schemes on > their new LPC1300 and LPC1700 series. Yes, but they only get speeds up to roughly 100 MHz. An FPGA CPU with internal memory can run 200 MHz or faster, depending on the CPU. The only ARM parts that run much faster than 100 MHz are not using Flash at that speed or have cache. However, clock speed is not the determining factor of processor speed, as we all know. Most FPGA CPUs are rather simple CPUs with simple instruction sets and get lower MIPS per MHz. The real advantage of a soft CPU is the flexibility you have to integrate it with peripherals or possibly the fact that you can eliminate cost by not having a separate CPU. In other cases you can connect multiple CPUs with high speed interfaces. It all comes down to the cost and the flexibility, whichever is important to you. 
RickArticle: 143421
I'd like to get started in writing some simple fpga applications(LED blinker, etc...). I would like to use a C++ based design. I see that SystemC is C++ based and is free but I'm unsure how all the different steps work. Do I simply write code in my preferred language(VHDL, Verilog, SystemC, etc...) and use the appropriate tool to "compile" it then bring that into an appropriate program to create a hex dump for sending to the fpga(or send directly from the program)? I have some ProASICs I bought a while back that I would like to use. The Libero IDE just crashes on my comp so I can't use it. Can someone give me a very simple and straightforward process for how one goes about getting code to machine? For example, with microchip pics: 1. Write code 2. Compile code to binary using MPLab or other supported IDE. (2.5. Simulate if necessary) 3. Program chip using a programmer(or directly from MPLab) (4. Debug) Most likely the fpga's are pretty much identical but I have not found any good simple explanation on how to do this. Right now I don't have a programmer but if it's simply done through JTAG and any appropriate JTAG interface can do it then it shouldn't be a problem. I'll just program a pic to dump the data to the fpga? (similar to programming a pic from the pc, for example.) Main thing right now is that I just need to get coding... whether it's SystemC or not... But I would like to be able to code independently of the fpga so I don't lock myself into one(at this point at least).Article: 143422
On Oct 11, 11:43 am, "Jon Slaughter" <Jon_Slaugh...@Hotmail.com> wrote: > I'd like to get started in writing some simple fpga applications(LED > blinker, etc...). > > I would like to use a C++ based design. I see that SystemC is C++ based and > is free but I'm unsure how all the different steps work. > > Do I simply write code in my prefered language(VHDL, Verilog, SystemC, > etc...) and use the appropriate tool to "compile" it then bring that into an > appropriate program to create a hex dump for sending to the fpga(or send > directly from the program)? > > I have some proASIC's I brought a while back that I would like to use. The > libero IDE just crashes on my comp so I can't use it. > > Can someone give me very simple and straight forward process for how one > goes about getting code to machine? > > For example, with microchip pics: > > 1. Write code > 2. Compile code to binary using MPLab or other supported IDE. > (2.5. Simulate if necessary) > 3. Program chip using a programmer(or directly from MPLab) > (4. Debug) > > Most likely the fpga's are pretty much identical but I have not found any > good simple explanation on how to do this. > > Right now I don't have a programmer but if it's simply done through JTAG and > any appropriate JTAG interface can do it then it shouldn't be a problem. > I'll just program a pic to dump the data to the fpga? (similar to > programming a pic from the pc, for example.) > > Main thing right now is that I just need to get coding... wether it's > SystemC or not... But I would like to be able to code independently of the > fpga so I don't lock myself into one(at this point at least). 
yes, you can implement a PIC in FPGA and use MPLAB to write applications for your own PIC-FPGA, but if you really want to do something with an FPGA please try to understand that FPGAs are not processors. It doesn't really matter what design entry method you use; without understanding what the FPGA tools generate as a result you have little chance of doing anything useful AnttiArticle: 143423
On Sun, 11 Oct 2009 03:43:14 -0500, "Jon Slaughter" <Jon_Slaughter@Hotmail.com> wrote: >I'd like to get started in writing some simple fpga applications(LED >blinker, etc...). > >I would like to use a C++ based design. I see that SystemC is C++ based and >is free but I'm unsure how all the different steps work. If you have $50k or so for a SystemC toolchain, consider it; for a reliable tool flow at price:free, stick with VHDL (or Verilog. The language wars rage, and basically reduce to: some prefer Verilog because it's more like C; I strongly prefer VHDL because it's more like Pascal.) >Do I simply write code in my prefered language(VHDL, Verilog, SystemC, >etc...) and use the appropriate tool to "compile" it then bring that into an >appropriate program to create a hex dump for sending to the fpga(or send >directly from the program)? Not quite. >I have some proASIC's I brought a while back that I would like to use. The >libero IDE just crashes on my comp so I can't use it. The tools are probably more important than the chips, at least for learning. When you need the ultimate performance, or the last cent on the BOM, then worry about the chip. Xilinx Webpack on Linux or Windows (my choice) or Altera Quartus (Windows only for the free Web edition) are the way to go. (Others may be incomplete; no simulator? or simply not work...) >Can someone give me very simple and straight forward process for how one >goes about getting code to machine? > >For example, with microchip pics: > >1. Write code Ditto but the code looks different. e.g. for a LED blinker: (1) identify a clock input to your FPGA (from the board's documentation), e.g. a 50MHz crystal oscillator, and a LED for output. 
(2) write a synchronous divide by 50M counter in VHDL (see "counter" and "clocked process" in the synthesis style guide supplied with the tools), and connect the clock and counter MSB(*) to signals named "Clock" and "LED" (3) write a constraint file (.ucf for Xilinx) which connects "Clock" and "LED" to the appropriate FPGA pins, and sets the I/O standards, e.g. LED drive strength. Also tell it your clock rate (see "timing constraints"). Stage 2 "compile" tools will meet that speed target, or tell you why they can't. (*) for equal on/off times you want a /25M counter followed by a /2 stage 1.5 Simulate. For a LED blinker you don't want to simulate 1 second... Simulation is slow. So make the counter division ratio a constant, and change it between 50M for operation, and (say) 50 for simulation. More generally; think about testability; i.e. how to test in reasonable time. Write a testbench to supply a clock. (The tools have templates for this) For a LED flasher I wouldn't bother writing a tester for the output signal; just watch it toggling in the wave window. >2. Compile code to binary using MPLab or other supported IDE. Synthesize; fix any synth errors. Back end tools; (Translate/Map/Place&Route for Xilinx); fix any errors. (e.g. rewrite for faster logic if it can't meet your speed target. Unlikely at 50MHz; probable at 200MHz) Generate bitfile. >(2.5. Simulate if necessary) Rarely if ever do you need to simulate after Stage 2. It can OCCASIONALLY be useful for identifying missing timing constraints, or chasing synth tool bugs. But it's MUCH slower than "behavioural simulation" on the source. >3. Program chip using a programmer(or directly from MPLab) Depends on the board. (1) Program ("configure") via JTAG to chip (2) Program FLASH memory, maybe via JTAG, and configure from that (3) load bitfile into CompactFlash card and transfer that to socket on board. (4) etc. Some cards are configurable across the PCI bus. >(4. 
Debug) Simulate at 1.5; there may be board level issues to debug, but not many. >Right now I don't have a programmer but if it's simply done through JTAG and >any appropriate JTAG interface can do it then it shouldn't be a problem. >I'll just program a pic to dump the data to the fpga? (similar to >programming a pic from the pc, for example.) USB to JTAG cable is common. (From Xilinx for a price. Other versions from other people for less. Antti and others will know far more on the specifics) >Main thing right now is that I just need to get coding... wether it's >SystemC or not... But I would like to be able to code independently of the >fpga so I don't lock myself into one(at this point at least). Download Webpack/Quartus and get going. The constraint file will be device-specific, but that's all. Pure VHDL is pretty much portable, barring tool bugs. (Vendor specific models, e.g. for DSP blocks, RAM blocks, PowerPCs, clock PLLs are a different matter) You can get 90% done before deciding on the hardware... Boards: look at www.enterpoint.co.uk among others. - BrianArticle: 143424
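The divide-by-25M-plus-toggle structure described in the walkthrough above can be modelled in a few lines of C before committing it to VHDL. The names and the shrunk DIVIDE constant here are illustrative assumptions, and this is only a behavioural sketch of the counter logic, not synthesizable code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Behavioural model of the LED divider: a /DIVIDE counter followed by
 * a /2 toggle stage, giving equal on and off times. On a 50 MHz clock
 * DIVIDE would be 25000000 for a 1 Hz blink; it is shrunk here exactly
 * as the post suggests doing for simulation runs. */
#define DIVIDE 25

typedef struct {
    uint32_t count;
    bool led;
} blinker_t;

/* advance the model by one clock tick and return the LED state */
bool blinker_tick(blinker_t *b)
{
    if (++b->count == DIVIDE) {
        b->count = 0;
        b->led = !b->led;   /* the /2 toggle stage */
    }
    return b->led;
}
```

In the FPGA this becomes a single clocked process; the C model is just a quick way to check the roll-over and toggle arithmetic.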
Jurgen Defurne <jurgen.defurne@telenet.be> wrote: > I have a hobby project which consists of developing a complete > computer system from the ground up. With complete I mean that it > should have character display capabilities, keyboard input > capabilities and mass storage capabilities. Graphics and networking > might come in the future, but I feel that adding these is probably > more a team effort than a single person effort, and I need time first > to implement what should be a reasonable system stack. > This is what I mean with from the ground up : a system stack > consisting of an ISA, a simulator for the ISA and processor > architecture, basic display and keyboard capabilities for the > simulator, and system software which is based on Lisp. I have already > the simulator running, together with the IO capabilities and an > assembler. > A nice link to give you a background, and what motivated me, is the > Homebuilt CPU's Webring, starting at > http://www.homebrewcpu.com/. However, I am not really motivated to > build anything in TTL (although I did design in 2006 and 2007 a 12-bit > CPU, which I simulated at the ALU/register level on a wire/bus basis) > and doing wirewrap (expensive) or etching boards (no laboratory space > available). > What I am looking at for the future is a board where I of course can > build a CPU on, with VGA output and USB interface for both keyboard > and mass storage. > Currently I have some target boards, which I find interesting both in > price and possibilities : > - Digilent Nexsys2 board > - Digilent Nexsys board > - Spartan-3 starter kit board > - Spartan-3E starter kit If you want to experiment with a free CPU, look for a board without DDR2. There is no open DDR2 core yet. Otherwise look at the project you want to run and look at the supported boards. Bye -- Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt --------- Tel. 06151 162516 -------- Fax. 
06151 164321 ----------