Ed McGettigan <ed.mcgettigan@xilinx.com> wrote: ... > These files should have been posted online shortly after the release > of 11.3. I will look into this and get these up there. Any news about your effort? Thanks -- Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt --------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------Article: 143376
Many thanks Darron, Glen and many former repliers! I appreciate your ideas. The mentioned algorithm is associated with stereo vision i.e. placing two cameras like the two human eyes, estimate the depth information from the image pair, and then interpolate a plausible view like taken from an intermediate position...The algorithm has already been implemented on PC (C version) but now can process only one such a pair in one second (Pentium IV 3.0GHz). GPU/CUDA is much better (10+ fps @ Nvidia Geforce 7900 GTX). The bosses have some plans to go embedded and ASIC, and still hope to see competitive results like that on GPU. To achieve the best real-time performance, implementing the algorithm directly using FPGA fabrics seems to be the only promising solution. What leads me to think about multicores is their popularity nowadays, and some of my professors are pioneers in the field of MPSoC architecture and Network-on-Chips. I also see many similar projects being carried out at some leading universities, e.g. the RAMP @ UC Berkeley http://ramp.eecs.berkeley.edu/index.php?about So regardless of my project, what do you guys think about the multi/ many-core research everywhere? As FPGA techniques evolve, will this approach have a bright future compared with customized logic design? No need to be very serious if you like to say something :). Thanks and Kind Regards, LucienArticle: 143377
LucienZ <lucien.zhang@gmail.com> wrote: < The mentioned algorithm is associated with stereo vision i.e. placing < two cameras like the two human eyes, estimate the depth information < from the image pair, and then interpolate a plausible view like taken < from an intermediate position...The algorithm has already been < implemented on PC (C version) but now can process only one such a pair < in one second (Pentium IV 3.0GHz). GPU/CUDA is much better (10+ fps @ < Nvidia Geforce 7900 GTX). The bosses have some plans to go embedded < and ASIC, and still hope to see competitive results like that on GPU. Yes, this looks like it should work well as a systolic array. There is not a lot of literature on systolic arrays, but there is some. See if you can find some of it. For this problem, it would seem that you need to do comparisons with the two shifted by different amounts, and with the comparison (hopefully) relatively simple. Consider a linear array of such comparators with two inputs (the two images) and one output. Propagate one image down the array with one clock cycle delay between each comparator. Propagate the other down, but with two clock cycles of delay. That should get you started, but then you have to consider the tradeoff between time and space. How fast does it need to be, and how many can you process per clock cycle. It might be that you want to run the array faster than the input data such that each comparator is used more than once. Those are the complications that are needed to build affordable arrays. -- glenArticle: 143378
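Glen's delay-line arrangement can be sanity-checked with a small software model. The sketch below is hypothetical Python (function name invented here), not FPGA code: stage d sees stream A delayed by d clocks and stream B delayed by 2d clocks, so each stage accumulates the matching cost for one candidate disparity.

```python
def systolic_costs(a, b, num_stages):
    """Software model of a linear systolic array: stage d compares
    the two pixel streams offset by a candidate disparity of d."""
    costs = {d: [] for d in range(num_stages)}
    for t in range(len(a)):              # one clock per input pixel
        for d in range(num_stages):
            ai, bi = t - d, t - 2 * d    # A delayed d, B delayed 2d clocks
            if ai >= 0 and bi >= 0:
                costs[d].append(abs(a[ai] - b[bi]))
    return costs

# If b is a shifted by 2 pixels, stage 2 sees a perfect match:
a = [5, 3, 8, 1, 9, 4, 7, 2]
b = a[2:] + [0, 0]
print(systolic_costs(a, b, 4)[2])        # prints [0, 0, 0, 0]
```

Running the array faster than the pixel clock, as suggested above, corresponds to reusing one comparator for several disparity values per pixel.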
On Aug 30, 2:15=A0pm, "maxascent" <maxasc...@yahoo.co.uk> wrote: > I would like to implement a HDMI transmitter with a Virtex 5. Does anyone > know if there is a chip that converts LVDS to TMDS? > > Thanks > > Jon Do you really want to mess with HDMI directly inside the V5? Analog and TI (and possibly others) have a bunch of chips that decode HDMI and DVI straight into parallel RGB or YCrCb. See for example here: http://www.analog.com/en/audiovideo-products/analoghdmidvi-interfaces/produ= cts/index.html# ChrisArticle: 143379
qamrul wrote: > Hi All, > > I need to provide 8 clock outs shifted by 45 degree, > > clk_0 -> 0 degree phase shift > clk_1 -> 45 degree phase shift > clk_1 -> 90 degree phase shift > clk_1 -> 135 degree phase shift > clk_1 -> 180 degree phase shift > clk_1 -> 225 degree phase shift > clk_1 -> 270 degree phase shift > clk_1 -> 315 degree phase shift > clk_1 -> 360 degree phase shift > > Is it at all possible? > > Thanks in advance for your feed back. > > Qamrul > > > As discussed you can interleave two DCMs. Now you can tune the delay of the second one. On a Spartan3 you have 256 steps for a period. Setting delay to 32 should give you 1/8 period delay. I read somewhere about a resolution in the range of 50ps IIRC. That's well above the 1064MHz discussed in this thread. On later Spartan3 families it takes more effort to calculate the delay. Maybe the software assists you. You have to RTFM on DCMs. Regards ThomasArticle: 143380
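Thomas's 256-steps-per-period arithmetic is easy to generalize; here is a hypothetical helper (the 256-steps granularity is the Spartan-3 figure stated in the post, and the function name is made up for illustration):

```python
def phase_shift_steps(degrees, steps_per_period=256):
    """Map a phase shift in degrees onto DCM fixed-delay steps,
    assuming 256 delay steps per clock period (Spartan-3 DCM)."""
    return round(degrees / 360.0 * steps_per_period)

print(phase_shift_steps(45))   # prints 32: 1/8 period, as in the post
```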
Hi, how fast is the serializer? What is the maximum bit clock speed? I did not find it in the documentation. Thanks ThomasArticle: 143381
On Oct 7, 4:45 pm, Thomas Rudloff <thomasREMOVE_rudloffREM...@gmx.net> wrote: > qamrul wrote: > > Hi All, > > > I need to provide 8 clock outs shifted by 45 degree, > > > clk_0 -> 0 degree phase shift > > clk_1 -> 45 degree phase shift > > clk_1 -> 90 degree phase shift > > clk_1 -> 135 degree phase shift > > clk_1 -> 180 degree phase shift > > clk_1 -> 225 degree phase shift > > clk_1 -> 270 degree phase shift > > clk_1 -> 315 degree phase shift > > clk_1 -> 360 degree phase shift > > > Is it at all possible? > > > Thanks in advance for your feed back. > > > Qamrul > > As discussed you can interleave two DCMs. Now you can tune the delay of > the second one. On a Spartan3 you have 256 steps for a period. > Setting delay to 32 should give you 1/8 period delay. > I read somewhere about a resolution in the range of 50ps IIRC. > That's well above the 1064MHz discussed in this thread. > > On later Spartan3 families you have more effort to calculate the delay. > Maybe the software assists you. You have RTFM on DCMs. > > Regards > Thomas Thomas, it's no issue to have 2 DCMs interleaved; they will have proper 8 phases too. That's fine, but you need to route those 8 clocks to some flip-flops, and this routing will destroy some of the timing unless you route it fully in manual mode using DIRT constraints. Getting 8 clock phases on the DCM outputs is not the issue; that's no problem. AnttiArticle: 143382
Rudolf, The Spartan 6 Fmax of the I/O serdes is not fully characterized yet, so it isn't in the datasheet. There are some timing numbers for one of the speed grades in the datasheet, and you can see from that (setup/ hold, delay) what kind of speeds you are probably looking at (compare these numbers with a fully characterized IO serdes, like that in V5 to get a ball park estimate...). For example, the S6 -3 OSERDES DDR LVDS (datawidth 2 thru 8) is rated at 1.05 Gbs. http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf page 13 Each speed grade is ~7.5 to 10%, so I would guess a -1 might be at least 20% slower, or ~800 Mbs DDR LVDS best case. The IO serdes features can't run any faster than the IO's can run (and do anything useful). So, if the IO runs at 200 Mbs in a simulation for the signal integrity, then that is as fast as the IOserdes will run. The IO serdes is made from a hardened custom block in each bank, so it will not generally be the limiter of the speed. That said, contact your local FAE to get advanced characterization information.Article: 143383
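The derating estimate in the post above is simple compounding; a sketch of that arithmetic (the 7.5-10% per speed grade figure is the poster's rule of thumb, not a datasheet number, and the function name is invented here):

```python
def derate(rate_gbps, grades_slower, frac_per_grade):
    """Compound a per-speed-grade slowdown onto a known data rate."""
    return rate_gbps * (1.0 - frac_per_grade) ** grades_slower

# -3 part rated at 1.05 Gbs; a -1 is two grade steps slower:
lo = derate(1.05, 2, 0.10)     # ~0.85 Gbs with 10% per grade
hi = derate(1.05, 2, 0.075)    # ~0.90 Gbs with 7.5% per grade
```

That 0.85-0.90 Gbs bracket is consistent with the "~800 Mbs best case" ballpark quoted above.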
Nvidia Geforce 7900 GTX? I don't see that on the list of CUDA-enabled cards. How many multiprocessors does it have? For comparison, my laptop's Quadro FX 1600M has 4 multiprocessors, for 32 single precision float ops in parallel. My desktop's GTX 280 has 30 multiprocessors, for 240 ops in parallel. That's a 7.5X speed improvement, if the algorithm is perfectly parallelized. Some Nvidia cards only have 1 multiprocessor... so it'd be 30X faster on a decent card! Beware of NVIDIA's marketing speak... my laptop's card spec page says "32 parallel processors" even though that's really 4 SIMD-like processors @ 8 ops at once. Anyway, it probably matters little if the ultimate goal is ASIC. My interpretation of that RAMP project page is that they are providing a platform for researching TOOL and LANGUAGE improvements. Using FPGAs to provide multiple cores is mentioned as adequate for that purpose. It doesn't mean it's in any way fast. I would guess that they don't care so much about making a fast multiprocessor system... they care about how developers are going to write code for those systems once they appear. FPGAs can do some incredible things to specific hardware algorithm designs. However, because of all the long routing it's going to be quite slow compared to a pure ASIC implementation of the same thing. So, an FPGA soft processor is always going to be dismally slow compared to the same processor in real silicon using the same technology level. Now, a true expert can do a very good job with things like pipelining and possibly even manual placing of LUTs and routing to get every last bit of speed out of an FPGA. That can close the gap somewhat, but there's still a good gap. I don't see FPGAs proportionally improving compared to direct silicon implementations because of these routing concerns. Better placement algorithms could help a good deal perhaps, but ultimately there's a whole lot of routing logic that wastes a lot of time. 
It's likely that tools to automatically convert algorithms to hardware for you will improve a lot... things like C-to-hardware is a start, but not all that usable right now. Easier ways to use IP blocks would help, too. (think of simply making a function call like FFT() instead of all the hassle of wizards and wiring things up) Right now, many front end algorithm tools are more like using BASIC instead of assembly on early PCs... very few serious developers are going to go that route for professional designs. As the tools and languages improve, that will change. Maybe if there's a radically new FPGA technology, things could be different. Imagine some kind of phase-change semiconductor that could be changed from P to N or back with an external multilayer writer, like DVD. THAT would remove all that crazy routing and absolutely kick butt. Of course, getting a burner to work at 32nm or whatever the current density level is could be harder than the material tech itself. DarronArticle: 143384
On Oct 7, 6:15 pm, Darron <darron.bl...@gmail.com> wrote: > Nvidia Geforce 7900 GTX? I don't see that on the list of CUDA-enabled > cards. How many multiprocessors does it have? For comparison, my > laptop's Quatro FX 1600M has 4 multiprocessors, for 32 single > precision float ops in parallel. My desktop's GTS 280 has 30 > multiprocessors, for 240 ops in parallel. That's a 7.5X speed > improvement, if the algorithm is perfectly parallelized. Some Nvidia > cards only have 1 multiprocessor... so it'd be 30X faster on a decent > card! Beware of NVIDIA's marketing speak... my laptop's card spec > page says "32 parallel processors" even though that's really 4 SIMD- > like processors @ 8 ops at once. My statement was confusing, sorry. The documented GPU implementation is based on Geforce 7900GTX, which features 8 vertex and 24 pixel processors. Although it is not a CUDA device, some tricks were used on the algorithm to accelerate its execution on the GPU. A better performance is expected on a CUDA-enabled GPU, but I am not sure which one is being used by the GPU guys. The results are not published yet.Article: 143385
Hi there, I would recommend you to watch some of the basic tutorials on the Xilinx.com website just to get the gist of it. Then Install the ise tool and try a simple verilog port implementation like this: http://www.youtube.com/watch?v=W1NZ01EEXvc good luck fabrizio On Oct 7, 1:53 pm, "mr16" <hk...@163.com> wrote: > Hi , > > really need help ... > > i have a project image scalar. but i was very new in this > and found a topic here and many expertise give suggestion ... > > so try to get help here .... > > my project need to transform RGB (640x480) to RGB(1024X960) > all i have is just a spartan 3E board and google..... > > i am trying to use verilog to write a scalar by linear interpolation > > if i finished this , how can i put this in the board ?? > > and how to read the image and output it ? > > thanks!!!Article: 143386
LucienZ <lucien.zhang@gmail.com> writes: > Many thanks Darron, Glen and many former repliers! I appreciate your > ideas. > > The mentioned algorithm is associated with stereo vision i.e. placing > two cameras like the two human eyes, estimate the depth information > from the image pair, and then interpolate a plausible view like taken > from an intermediate position...The algorithm has already been > implemented on PC (C version) but now can process only one such a pair > in one second (Pentium IV 3.0GHz). GPU/CUDA is much better (10+ fps @ > Nvidia Geforce 7900 GTX). The bosses have some plans to go embedded > and ASIC, and still hope to see competitive results like that on GPU. Can you tell a bit more about the algorithm? > > To achieve the best real-time performance, implementing the algorithm > directly using FPGA fabrics seems to be the only promising solution. If it's amenable to a streaming approach, then it'll suit FPGAs well. If you are jumping around the framebuffer all over the place, you are potentially memory bandwidth/latency limited, and sorting that out will likely help more (even in your current implementation). Apologies if you already know this! Personally, I think stereo matching is more suitable for a low-level "do it in logic" approach than multicore (especially in FPGA, where the processors are not that quick). For example - the biggest Spartan 3A DSP (XC3SD3400A) device can fit 4 microblazes each with independent DDR interfaces and some cache and small "internal-to-the-processor" BRAM memory, with FSL links to communicate. They'll clock at ~60MHz, so that only gives you a couple of hundred MIPS or so to play with (between them). I know this as I have a board with such an FPGA+RAM on it :) The architecture described is for another project which is non-video, it needs 4 processors (one running Petalinux) for task isolation. But the board also has the option of two cameras on it, which I am using a very different architecture for processing. 
Looking at the lowest level stereo-matching part, it's extremely parallelisable. If you use something like the census transform, then there's lots of single bit arithmetic, which suits the FPGA much better. Using a bunch of block rams as line buffers, I reckon you could run the pipeline at 5x pixel rate (assuming a 25MHz pixel clock), which would match a 5x5 census transform quite nicely. You'd then be getting many hundreds of Mops/sec, although they wouldn't match processor operations, so it's not an apples to apples comparison. Cheers, Martin -- martin.j.thompson@trw.com TRW Conekt - Consultancy in Engineering, Knowledge and Technology http://www.conekt.net/electronics.htmlArticle: 143387
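As an illustration of how bit-level the census approach is, here is a hypothetical software reference model (not Martin's implementation; names are made up): a 5x5 census signature per pixel, with Hamming distance as the matching cost. In fabric, each comparison is LUT-sized single-bit logic and the popcount is an adder tree.

```python
def census5(img, x, y):
    """5x5 census transform at (x, y): one bit per neighbour, set
    when that neighbour is darker than the centre pixel."""
    c = img[y][x]
    bits = 0
    for dy in range(-2, 3):
        for dx in range(-2, 3):
            if dx or dy:                   # skip the centre pixel
                bits = (bits << 1) | (img[y + dy][x + dx] < c)
    return bits                            # 24-bit signature

def cost(sig_a, sig_b):
    """Matching cost = Hamming distance between census signatures."""
    return bin(sig_a ^ sig_b).count("1")
```

Because the signature depends only on orderings, not absolute intensities, it is robust to gain/offset differences between the two cameras, which is part of why it suits stereo rigs.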
LucienZ wrote: > Hi everyone, I am a master student and this is my first post in this > group. My research group is looking for a multicore embedded platform > for deploying an in-house developed computer vision algorithm. I've > checked some available development boards and now still weigh the > ideas in my mind. > > One solution that interests me is 'synthesizable' processor cores on a > FPGA chip, where I can parallelize the data processing on different > cores. As far as I know, this solution is based on 'synthesizable' > soft-cores, e.g. MicroBlaze, Nios or ARM Cortex-M1 etc. But I've seen > one design article (carried out at NXP, the Netherlands) that claims > they have implemented two ARM926EJ-S processors on a Xilinx Virtex 4 > FPGA chip. I am wondering what technologies enable this > implementation. > > My current knowledge only reaches the level of HDL-based hardware > design on FPGAs (and some higher abstraction levels concerning > software), but I am not very familiar with the 'macrocells', 'hard > core IP' and digital ASIC design. I see some Virtex 4 products come > with embedded hard PowerPC blocks, but I have not seen ARM...So I > would like to ask you experienced scientists these questions: > > 1. How to implement one or more such ARM926EJ-S cores on a FPGA chip > (detailed information on the NXP design article is not available)? I > need some key words in this field and better with some recommended > design articles. > 2. How to interpret the word 'synthesizable' with respect to soft- > cores and macrocells, respectively? > 3. If someone has experiences on multicore parallel processing > development, I would be grateful if you can suggest some nice > development platforms (real-time performance is our top concern). > Probably I need to make a new post later describing the requirements… > > Thanks very much for your attention! > Lucien Hi Lucien, ARM has a university program that allows researchers access to ARM technologies 
(http://www.arm.com/community/university/). If you want to know the details I can ask the university program team to contact you. I know a number of universities are using our processors in their R&D projects. The Microcontroller Prototyping System (MPS) is available mainly for microcontroller prototyping, or early software development before silicon is available. It is also useful for our customers to evaluate our processor cores or other IP. It is running at 50MHz because it contains two FPGAs and there is a bus connection running between them. If you are developing your own FPGA system you can get the processor running at a higher clock speed. Currently it has Cortex-M3 and Cortex-M0 versions. A Cortex-M1 version will also be available very soon. Cortex-M1: You can access Cortex-M1 via - Altera SOPC builder - Actel Libero - Synplicity ReadyIP programme - ARM licensing. The processor is optimized for FPGA architecture. Cortex-M3, ARM926EJ-S or other ARM processors: You can access Cortex-M3 through ARM (university program). You can implement your multi-core design on FPGA, and transfer the design to ASIC. Synthesisable means the design is in the form of RTL source code; you can synthesise it to FPGA or ASIC. Please contact our university program team (you can find the email address via the "ASK ARM" icon of the following web page: http://www.arm.com/community/university/index.html). regards, JosephArticle: 143388
Also SVN usage is preferred, I updated the binary release of xc3sprog on sourceforge to r401. Some prominent changes:
- Windows parallel port should work again
- XC2C programming from jedec when map files are available (usually in ../Xilinx/xx.x/ISE/xbr/data/*.map) or from a bit file read back from a device or translated from the jedec file before
- Handle more SPI Flash types with ISF programming
- Print unique numbers (DNA, SPI Flash) if such numbers are given
- More bscan_spi bitfiles, e.g. xc6s_cs324.ucf
- Allow to choose some input and output file formats (*.bit, *.mcs, *.bin) on some devices (somewhat untested)
- Test mode now continuously reads the IDCODEs via the IDCODE command for scope debugging
- Given cable subtype automatically sets the cable type
- Olimex/Amontec subtypes
- javr tested with AT90CAN128
- many more fixes and probably some new bugs...
Use at your own risk! Please give feedback! Probably nobody tried r216 on Win32, as nobody told me that with r216 the ioparport driver was broken on windows :-( Bye -- Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt --------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------Article: 143389
Dear All, I am trying to program a Spartan-3A DSP 3400A Edition board on Linux Mint via the usb cable and I am having some problems with IMPACT. I have installed usb-driver following the instructions from here: http://www.george-smart.co.uk/wiki/Xilinx_JTAG_Linux What I'd like to do is to understand if the drivers are working ok. So the question is how? This is what I have tried: With the board connected to the PC (some intel pentium 4 with Linux Mint) via a usb cable, when I start IMPACT from the command line and then double-click on Boundary Scan - right click on the white page and "Initialize chain", I get this:
INFO:iMPACT - Open file (null) error.
ERROR:iMPACT:2778 - Fail to open log file /home/fabrizio/Desktop/_impact.log.
Welcome to iMPACT
iMPACT Version: 11.3
GUI --- Auto connect to cable...
AutoDetecting cable. Please wait.
PROGRESS_START - Starting Operation.
Reusing A001C001 key.
Reusing 2401C001 key.
OS platform = i686.
Connecting to cable (Usb Port - USB21).
Checking cable driver.
File version of /opt/Xilinx/11.1/ISE/bin/lin/xusbdfwu.hex = 1030.
File version of /usr/share/xusbdfwu.hex = 1030.
Using libusb.
Kernel release = 2.6.28-11-generic.
Cable connection failed.
Reusing 7801C001 key.
Reusing FC01C001 key.
OS platform = i686.
Connecting to cable (Parallel Port - parport0).
libusb-driver.so version: 2009-10-08 18:47:25.
parport0: baseAddress=0x0, ecpAddress=0x400
LPT base address = 0000h.
ECP base address = 0400h.
LPT port is already in use. rc = FFFFFFFFh
Cable connection failed.
Reusing 7901C001 key.
Reusing FD01C001 key.
OS platform = i686.
Connecting to cable (Parallel Port - parport1).
libusb-driver.so version: 2009-10-08 18:47:25.
Cable connection failed.
Reusing 7A01C001 key.
Reusing FE01C001 key.
OS platform = i686.
Connecting to cable (Parallel Port - parport2).
libusb-driver.so version: 2009-10-08 18:47:25.
Cable connection failed.
Reusing 7B01C001 key.
Reusing FF01C001 key.
OS platform = i686.
Connecting to cable (Parallel Port - parport3).
libusb-driver.so version: 2009-10-08 18:47:25.
Cable connection failed.
PROGRESS_END - End Operation.
Elapsed time = 1 sec.
Cable autodetection failed.
WARNING:iMPACT:923 - Can not find cable, check cable setup !
Am I missing something? Does the board (jumpers) need to be configured in any particular way? I would appreciate very much any help possible. Especially considering that I have spent about 5 hours on it.... Thank you for your time fabrizioArticle: 143390
fab. <fabrizio.tappero@gmail.com> wrote: ... > am I missing something? does the board (jumpers) need to be configured > in any particular way? > I would appreciate very much any help possible. Especially considering > that I have spent about 5 hours on it.... Try first things first. What does lsusb tell? Do you see the initial PID 0007/0009/000d/000f/0013/005 or the final PID 0008? Perhaps you didn't set up things right and udev isn't set up. Bye -- Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt --------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------Article: 143391
thanks !!! >Hi there, >I would recommend you to watch some of the basic tutorials on the >Xilinx.com website just to get the gist of it. >Then Install the ise tool and try a simple verilog port implementation >like this: >http://www.youtube.com/watch?v=W1NZ01EEXvc > >good luck >fabrizio > > > >On Oct 7, 1:53 pm, "mr16" <hk...@163.com> wrote: >> Hi , >> >> really need help ... >> >> i have a project image scalar. but i was very new in this >> and found a topic here and many expertise give suggestion ... >> >> so try to get help here .... >> >> my project need to transform RGB (640x480) to RGB(1024X960) >> all i have is just a spartan 3E board and google..... >> >> i am trying to use verilog to write a scalar by linear interpolation >> >> if i finished this , how can i put this in the board ?? >> >> and how to read the image and output it ? >> >> thanks!!! > > --------------------------------------- This message was sent using the comp.arch.fpga web interface on http://www.FPGARelated.comArticle: 143392
On Oct 7, 8:30 am, LucienZ <lucien.zh...@gmail.com> wrote: > Many thanks Darron, Glen and many former repliers! I appreciate your > ideas. > > The mentioned algorithm is associated with stereo vision i.e. placing > two cameras like the two human eyes, estimate the depth information > from the image pair, and then interpolate a plausible view like taken > from an intermediate position...The algorithm has already been > implemented on PC (C version) but now can process only one such a pair > in one second (Pentium IV 3.0GHz). GPU/CUDA is much better (10+ fps @ > Nvidia Geforce 7900 GTX). The bosses have some plans to go embedded > and ASIC, and still hope to see competitive results like that on GPU. > > To achieve the best real-time performance, implementing the algorithm > directly using FPGA fabrics seems to be the only promising solution. > What leads me to think about multicores is their popularity nowadays, > and some of my professors are pioneers in the field of MPSoC > architecture and Network-on-Chips. I also see many similar projects > being carried out at some leading universities, e.g. the RAMP @ UC > Berkeley http://ramp.eecs.berkeley.edu/index.php?about > > So regardless of my project, what do you guys think about the multi/ > many-core research everywhere? As FPGA techniques evolve, will this > approach have a bright future compared with customized logic design? > No need to be very serious if you like to say something :). As others have indicated, it may be easy to implement CPUs in FPGAs, but they do not run nearly as fast as high end CPUs in fixed silicon. Martin indicated that if your algorithm is amenable to breaking it into many processes running in parallel, many processors can be used, each doing a part of the calculation with the load fairly balanced. If you look at the DSP chips that include general purpose CPUs and/or specialized processing elements, you might learn something about how to partition your design. 
A general purpose CPU is good at the control portion of the algorithm which is making decisions and choices. A DSP like processor is good at calculations that involve vectors where the same operation is performed on arrays of data, like your video data. But a DSP processor still has the overhead of fetching instructions and accessing random access memory. Specialized processing elements are designed for a particular task so that they can function without a program (possibly) and may not use random access memory, but instead FIFOs and other structures. All of these can be implemented in an FPGA with the specialized processor taking the most advantage of what FPGAs have to offer in terms of flexibility. If you are just going to use an FPGA as a way to provide a number of standard processors, you will likely find application-specific chips available that will do a better job. There are any number of multicore chips out there if you dig a bit. They also may not need to be anything like your PC. Actually, a GPU is a highly parallel design and it will be hard to outperform that in an FPGA unless you use it to tailor the hardware in very specific ways to your algorithm. You might try by expressing your algorithm in a language that uses parallelism. If you can do that, it will likely teach you a lot about how your algorithm can be optimally implemented in an FPGA. This will be very different than a standard sequential program and will be key to any specialized hardware design. RickArticle: 143393
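Rick's distinction between program-driven random-access code and FIFO-fed processing elements can be caricatured in a few lines of Python (a made-up sketch, not anyone's design): each stage is a fixed per-sample operation on a stream, and stages compose the way hardware blocks wired FIFO-to-FIFO would.

```python
def stream_offset(pixels, offset):
    # A "specialized element": no program counter, no RAM -- just a
    # fixed operation applied to each sample as it streams through.
    for p in pixels:
        yield p + offset

def stream_clamp(pixels, lo=0, hi=255):
    # Another element in the chain, saturating to 8-bit range.
    for p in pixels:
        yield min(max(p, lo), hi)

# Stages compose like blocks wired FIFO-to-FIFO:
out = list(stream_clamp(stream_offset([250, 10, 128], 10)))
print(out)   # prints [255, 20, 138]
```

In fabric each generator would become a small pipelined block, with the Python iterator handoff standing in for a FIFO between stages.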
On Oct 8, 9:18 pm, Uwe Bonnes <b...@elektron.ikp.physik.tu-darmstadt.de> wrote: > fab. <fabrizio.tapp...@gmail.com> wrote: > > ... > > > am I missing something? does the board (jumpers) need to be configured > > in any particular way? > > I would appreciate very much any help possible. Especially considering > > that I have spent about 5 hours on it.... > > Try first things first. > What does lsusb tell? Do you see the initial PID > 0007/0009/000d/000f/0013/005 or the final PID 0008? Perhaps you didn't set > up things right and udev isn't set up. > > Bye > -- > Uwe Bonnes b...@elektron.ikp.physik.tu-darmstadt.de > > Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt > --------- Tel. 06151 162516 -------- Fax. 06151 164321 ---------- Uwe, thanks for your reply. Could you please be a little more precise about what I should do? You clearly assume I know more than what I do. ;) thanks a lot fabrizioArticle: 143394
fab. <fabrizio.tappero@gmail.com> wrote: > On Oct 8, 9:18 pm, Uwe Bonnes <b...@elektron.ikp.physik.tu- > darmstadt.de> wrote: > > fab. <fabrizio.tapp...@gmail.com> wrote: > > > > ... > > > > > am I missing something? does the board (jumpers) need to be configured > > > in any particular way? > > > I would appreciate very much any help possible. Especially considering > > > that I have spent about 5 hours on it.... > > > > Try first things first. > > What does lsusb tell? Do you see the initial PID > > 0007/0009/000d/000f/0013/005 or the final PID 0008? Perhaps you didn't set > > up things right and udev isn't set up. > > ... > Uwe, > thanks for your reply. Could you please be a little more precise about > what I should do? You clearly assume I know more than what I do. ;) > thanks a lot > fabrizio Well, at least I assume you google around for these hints I gave and ask precise questions and not "what should I do". But first, start by installing the original Xilinx drivers as root. The libusb workaround is not needed any longer for recent Impact, as you can read in the release notes (for the parallel port dongle, some workaround is still needed to not have the WinDrv requirement). -- Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt --------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------Article: 143395
On Oct 9, 10:56 am, Uwe Bonnes <b...@elektron.ikp.physik.tu-darmstadt.de> wrote: > fab. <fabrizio.tapp...@gmail.com> wrote: > > On Oct 8, 9:18 pm, Uwe Bonnes <b...@elektron.ikp.physik.tu- > > darmstadt.de> wrote: > > > fab. <fabrizio.tapp...@gmail.com> wrote: > > > > ... > > > > > am I missing something? does the board (jumpers) need to be configured > > > > in any particular way? > > > > I would appreciate very much any help possible. Especially considering > > > > that I have spent about 5 hours on it.... > > > > Try first things first. > > > What does lsusb tell? Do you see the initial PID > > > 0007/0009/000d/000f/0013/005 or the final PID 0008? Perhaps you didn't set > > > up things right and udev isn't set up. > > ... > > > Uwe, > > thanks for your reply. Could you please be a little more precise about > > what I should do? You clearly assume I know more than what I do. ;) > > thanks a lot > > fabrizio > > Well, at least I assume you google around for these hints I gave and ask > precise questions and not "what should I do". > > But first, start br installing the original Xilinx drivers as root. The > libusb workaround is not needed any longer for recent Impact, as you can > read in the release notes (for the parallel port dongle, some workaround is > still needed to not have the WinDrv requirement). > > -- > Uwe Bonnes b...@elektron.ikp.physik.tu-darmstadt.de > > Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt > --------- Tel. 06151 162516 -------- Fax. 06151 164321 ---------- Uwe, thank you for your reply. I have worked on this issue for about 5 hours before posting a question here. I believe I tried to do my best before bothering anybody on this forum. By executing lsusb this is what I get:
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 005: ID 046d:c016 Logitech, Inc. M-UV69a/HP M-UV96 Optical Wheel Mouse
Bus 004 Device 003: ID 413c:2003 Dell Computer Corp. Keyboard
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 007: ID 04b4:7200 Cypress Semiconductor Corp.
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Could you please be so kind to point me to where it is explained how to apply those workarounds you mention? regards fabrizioArticle: 143396
fab. <fabrizio.tappero@gmail.com> wrote:
> Bus 002 Device 007: ID 04b4:7200 Cypress Semiconductor Corp.

Is this the VID/PID that appears/disappears when plugging/unplugging the board? What board is this? It seems not to be a Xilinx one, as Xilinx boards have Vendor ID 03fd. Did the board come with a Linux installation? It seems not.

So first follow the instructions on http://www.xilinx.com/support/answers/32657.htm

Then you need the hex file for your board. The hex file is buried deep inside the Windows installer. Then you need to add a udev rule for the above VID/PID to upload the hex file. Look at /etc/udev/rules.d/xusbdfwu.rule, which should have been installed by the Xilinx installer, and create a similar rule. After unplugging/replugging, the hex file should be uploaded automatically and VID/PID 03fd/0008 should appear. This VID/PID is used by Impact.

You can also try uploading the hex file with fxload. Be sure to upload a fitting hex file; perhaps bug the vendor of your board.

Enough work for you for the next hours ;-) And if you get it working, write some summary and post it for later reference...

--
Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de
Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

Article: 143397
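[For later readers: the firmware-upload steps Uwe describes might look roughly like the sketch below. This is a hypothetical example, not a tested recipe: the hex file name and path are placeholders, and the VID/PID 04b4:7200 is simply the Cypress FX2 ID taken from the lsusb output above.]

```shell
# Hypothetical udev rule (e.g. /etc/udev/rules.d/99-board-fw.rules),
# modelled on Xilinx's xusbdfwu rule. VID/PID matches the Cypress FX2
# device from lsusb; the hex file path is a placeholder.
#
#   SUBSYSTEM=="usb", ACTION=="add", ATTR{idVendor}=="04b4", \
#     ATTR{idProduct}=="7200", \
#     RUN+="/sbin/fxload -v -t fx2 -I /usr/share/board_firmware.hex -D $tempnode"

# Or upload the firmware by hand with fxload, pointing -D at the
# bus/device numbers reported by lsusb ("Bus 002 Device 007" above):
sudo fxload -v -t fx2 -I /usr/share/board_firmware.hex -D /dev/bus/usb/002/007
```

After a successful upload the device should re-enumerate with the Xilinx VID/PID 03fd:0008, which is what Impact looks for.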
rickman wrote:
> As others have indicated, it may be easy to implement CPUs in FPGAs,
> but they do not run nearly as fast as high end CPUs in fixed silicon.
> Martin indicated that if your algorithm is amenable to breaking it
> into many processes running in parallel, many processors can be used,
> each doing a part of the calculation with the load fairly balanced.

Regarding speed, you could actually get better performance by running a CPU in an FPGA compared to standard microcontrollers. Microcontroller flash memory can usually only be accessed at 25MHz to 50MHz, while block RAM in an FPGA can be much faster. So you can run a processor at high clock speed on an FPGA (e.g. 100MHz) with zero wait states, compared to microcontroller products running at 100MHz with 1 or 2 wait states on silicon. But of course, if you are comparing the FPGA to really high-end microcontrollers with cache and running at over 200MHz, then the FPGA won't be able to beat that.

Joseph

Article: 143398
On Oct 9, 6:21 am, rickman <gnu...@gmail.com> wrote:
> On Oct 7, 8:30 am, LucienZ <lucien.zh...@gmail.com> wrote:
>
> > Many thanks Darron, Glen and many former repliers! I appreciate your
> > ideas.
>
> > The mentioned algorithm is associated with stereo vision i.e. placing
> > two cameras like the two human eyes, estimate the depth information
> > from the image pair, and then interpolate a plausible view like taken
> > from an intermediate position...The algorithm has already been
> > implemented on PC (C version) but now can process only one such a pair
> > in one second (Pentium IV 3.0GHz). GPU/CUDA is much better (10+ fps @
> > Nvidia Geforce 7900 GTX). The bosses have some plans to go embedded
> > and ASIC, and still hope to see competitive results like that on GPU.
>
> > To achieve the best real-time performance, implementing the algorithm
> > directly using FPGA fabrics seems to be the only promising solution.
> > What leads me to think about multicores is their popularity nowadays,
> > and some of my professors are pioneers in the field of MPSoC
> > architecture and Network-on-Chips. I also see many similar projects
> > being carried out at some leading universities, e.g. the RAMP @ UC
> > Berkeley http://ramp.eecs.berkeley.edu/index.php?about
>
> > So regardless of my project, what do you guys think about the multi/
> > many-core research everywhere? As FPGA techniques evolve, will this
> > approach have a bright future compared with customized logic design?
> > No need to be very serious if you like to say something :).
>
> As others have indicated, it may be easy to implement CPUs in FPGAs,
> but they do not run nearly as fast as high end CPUs in fixed silicon.
> Martin indicated that if your algorithm is amenable to breaking it
> into many processes running in parallel, many processors can be used,
> each doing a part of the calculation with the load fairly balanced.
>
> If you look at the DSP chips that include general purpose CPUs and/or
> specialized processing elements, you might learn something about how
> to partition your design. A general purpose CPU is good at the
> control portion of the algorithm, which is making decisions and
> choices. A DSP-like processor is good at calculations that involve
> vectors, where the same operation is performed on arrays of data, like
> your video data. But a DSP processor still has the overhead of
> fetching instructions and accessing random access memory. Specialized
> processing elements are designed for a particular task so that they can
> function without a program (possibly) and may not use random access
> memory, but instead FIFOs and other structures. All of these can be
> implemented in an FPGA, with the specialized processor taking the most
> advantage of what FPGAs have to offer in terms of flexibility.
>
> If you are just going to use an FPGA as a way to provide a number of
> standard processors, you will likely find application specific chips
> available that will do a better job. There are any number of
> multicore chips out there if you dig a bit. They also may not need to
> be anything like your PC. Actually, a GPU is a highly parallel design
> and it will be hard to outperform that in an FPGA unless you use it to
> tailor the hardware in very specific ways to your algorithm.
>
> You might try expressing your algorithm in a language that uses
> parallelism. If you can do that, it will likely teach you a lot about
> how your algorithm can be optimally implemented in an FPGA. This will
> be very different than a standard sequential program and will be key
> to any specialized hardware design.
>
> Rick

Thanks a lot Martin and rickman!
Martin, an introduction to stereo matching can be found in these slides: http://www.vision.deis.unibo.it/smatt/Seminars/StereoVision.pdf and related people are gathering here: http://vision.middlebury.edu/ I bet you already have a background in stereo matching topics.

Each stereo matching algorithm has its own features; however, many of them make intensive use of the Sum of Absolute Differences (SAD) to find the best matches between the left and right image. Simply speaking, if this is a pixel block in the left image (pixel value 5 is the anchor):

L[][] =
7 8 9
4 5 6
1 2 3

and this is the search space in the right image:

R[][] =
9 9 9 9 9 9 9
9 0 0 7 8 9 0
0 0 0 4 5 6 0
0 0 0 1 2 3 0
9 9 9 9 9 9 9

the aim is to find the best match, and in this case it is obvious. But in practice there are many other problems, and in our case the pixel block and search space are NOT rectangular. Any recommended structures for this? I think the systolic array is a promising solution (thanks to Glen), although by now it is not clear to me how to do the mapping +_+. Thanks for mentioning the 'census transform'; I also bumped into the concept when looking into a reference design (DeepSea Stereo Vision System) :). I will pay more attention to it.

I did some related literature study these days (not complete yet...) and the most impressive solution did use a direct hardware implementation of the SADs. One claims to achieve 600 FPS at 450*375 resolution (of course more parameters are involved). As a comparison, the same algorithm with software optimizations (trading memory for fast calculations) can only achieve 1.48 FPS (3GHz Pentium IV).

A practical implementation should be a streamed, pipelined approach which interfaces to cameras as well as to a stereoscopic display. At the beginning I think some external memories should be used to provide ground truth test data.
I was just thinking too much in 'software' and 'sequential' terms, even when I talked about parallelism (simply dividing the workload by the number of processors). And I overestimated the computation power of embedded processors (I took many related courses, paper work...).

Article: 143399
On Oct 8, 1:03 pm, Joseph Yiu <joseph....@somewhereinarm.com> wrote:
> LucienZ wrote:
> > Hi everyone, I am a master student and this is my first post in this
> > group. My research group is looking for a multicore embedded platform
> > for deploying an in-house developed computer vision algorithm. I've
> > checked some available development boards and now still weigh the
> > ideas in my mind.
>
> > One solution that interests me is 'synthesizable' processor cores on a
> > FPGA chip, where I can parallelize the data processing on different
> > cores. As far as I know, this solution is based on 'synthesizable'
> > soft-cores, e.g. MicroBlaze, Nios or ARM Cortex-M1 etc. But I've seen
> > one design article (carried out at NXP, the Netherlands) that claims
> > they have implemented two ARM926EJ-S processors on a Xilinx Virtex 4
> > FPGA chip. I am wondering what technologies enable this
> > implementation.
>
> > My current knowledge only reaches the level of HDL-based hardware
> > design on FPGAs (and some higher abstraction levels concerning
> > software), but I am not very familiar with the 'macrocells', 'hard
> > core IP' and digital ASIC design. I see some Virtex 4 products come
> > with embedded hard PowerPC blocks, but I have not seen ARM...So I
> > would like to ask you experienced scientists these questions:
>
> > 1. How to implement one or more such ARM926EJ-S cores on a FPGA chip
> > (detailed information on the NXP design article is not available)? I
> > need some key words in this field and better with some recommended
> > design articles.
> > 2. How to interpret the word 'synthesizable' with respect to soft-
> > cores and macrocells, respectively?
> > 3. If someone has experiences on multicore parallel processing
> > development, I would be grateful if you can suggest some nice
> > development platforms (real-time performance is our top concern).
> > Probably I need to make a new post later describing the requirements...
>
> > Thanks very much for your attention!
> > Lucien
>
> Hi Lucien,
>
> ARM has a university program that allows researchers to access ARM
> technologies. (http://www.arm.com/community/university/)
> If you want to know the details I can ask the university program team to
> contact you. I know a number of universities are using our processors
> in their R&D projects.
>
> The Microcontroller Prototyping System (MPS) is available mainly for
> microcontroller prototyping, or early software development before
> silicon is available. It is also useful for our customers to evaluate
> our processor cores or other IP. It is running at 50MHz because it
> contains two FPGAs and there is a bus connection running between them.
> If you are developing your own FPGA system you can get the processor
> running at higher clock speed. Currently it has Cortex-M3 and Cortex-M0
> versions. A Cortex-M1 version will also be available very soon.
>
> Cortex-M1: You can get access to Cortex-M1 through
> - Altera SOPC builder
> - Actel Libero
> - Synplicity ReadyIP programme
> - ARM licensing
> The processor is optimized for FPGA architectures.
>
> Cortex-M3, ARM926EJ-S or other ARM processors: You can get access to
> Cortex-M3 through ARM (university program). You can implement your
> multi-core design on FPGA, and transfer the design to ASIC.
>
> Synthesisable means the design is in the form of RTL source code. You
> can synthesise it for FPGA or ASIC.
>
> Please contact our university program team (you can find the email
> address under the "ASK ARM" icon of the following web page:
> http://www.arm.com/community/university/index.html)
>
> regards,
> Joseph

Joseph, you answered my very beginning questions, thanks very much. I joined the research team only one month ago, and am not very clear about the road-maps.

One way is to build an FPGA prototype with fast computations which outperforms PC CPUs and GPUs, and then go to an ASIC implementation. Another direction is to do some MPSoC related research, with stereo matching as a potential application on it, though with moderate performance. Both of them are active here, but I bet the algorithm developers would prefer to see a result based on the first approach. As some of you indicated, later on one or two general-purpose processors like ARM could be employed for running an OS and management tasks. I am sure to contact the ARM university program once the promoters decide to use some.