On 5/18/2017 6:10 PM, Tom Gardner wrote: > On 18/05/17 18:05, rickman wrote: >> On 5/18/2017 12:08 PM, lasselangwadtchristensen@gmail.com wrote: >>> On Thursday, May 18, 2017 at 15.48.19 UTC+2, Theo Markettos wrote: >>>> Tim Wescott <tim@seemywebsite.really> wrote: >>>>> So, you have two separate implementations of the system -- how do you >>>>> know that they aren't both identically buggy? >>>> >>>> Is that the problem with any testing framework? >>>> Quis custodiet ipsos custodes? >>>> Who tests the tests? >>> >>> the test? >>> >>> if two different implementations agree, it adds a bit more confidence >>> than an >>> implementation agreeing with itself. >> >> The point is if both designs were built with the same misunderstanding >> of the >> requirements, they could both be wrong. While not common, this is not >> unheard >> of. It could be caused by cultural biases (each company is a culture) >> or a >> poorly written specification. > > The prior question is whether the specification is correct. > > Or more realistically, to what extent it is/isn't correct, > and the best set of techniques and processes for reducing > the imperfection. > > And that leads to XP/Agile concepts, to deal with the suboptimal > aspects of Waterfall Development. > > Unfortunately the zealots can't accept that what you gain > on the swings you lose on the roundabouts. I'm sure you know exactly what you meant. :) -- Rick C
On 5/18/2017 6:06 PM, Tom Gardner wrote: > On 18/05/17 18:01, rickman wrote: >> On 5/18/2017 12:14 PM, Tom Gardner wrote: >> >>> My preference is anything that avoids deeply nested >>> if/the/else/switch statements, since they rapidly >>> become a maintenance nightmare. (I've seen nesting >>> 10 deep!). >> >> Such deep layering likely indicates a poor problem decomposition, but >> it is hard >> to say without looking at the code. > > It was a combination of technical and personnel factors. > The overriding business imperative was, at each stage, > to make the smallest and /incrementally/ cheapest modification. > > The road to hell is paved with good intentions. If we are bandying about platitudes I will say, penny wise, pound foolish. >> Normally there is a switch for the state variable and conditionals >> within each >> case to evaluate inputs. Typically this is not so complex. > > This was an inherently complex task that was ineptly > implemented. I'm not going to define how ineptly, > because you wouldn't believe it. I only believe it > because I saw it, and boggled. Good design is about simplifying the complex. Ineptitude is a separate issue and can ruin even simple designs. >>> Also, design patterns that enable logging of events >>> and states should be encouraged and left in the code >>> at runtime. I've found them /excellent/ techniques for >>> correctly deflecting blame onto the other party :) >>> >>> Should you design in a proper FSM style/language >>> and autogenerate the executable source code, or code >>> directly in the source language? Difficult, but there >>> are very useful OOP design patterns that make it easy. >> >> Designing in anything other than the HDL you are using increases the >> complexity >> of backing up your tools. In addition to source code, it can be >> important to be >> able to restore the development environment. I don't bother with FSM >> tools >> other than tools that help me think. > > Very true. I use that argument, and more, to caution > people against inventing Domain Specific Languages > when they should be inventing Domain Specific Libraries. > > Guess which happened in the case I alluded to above. An exception to that rule is programming in Forth. It is a language where programming *is* extending the language. There are many situations where the process ends up with programs written what appears to be a domain specific language, but working quite well. So don't throw the baby out with the bath when trying to save designers from themselves. >>> And w.r.t. TDD, should your tests demonstrate the >>> FSM's design is correct or that the implementation >>> artefacts are correct? >> >> I'll have to say that is a new term to me, "implementation >> artefacts[sic]". Can >> you explain? > > Nothing non-obvious. An implementation artefact is > something that is part of /a/ specific design implementation, > as opposed to something that is an inherent part of > /the/ problem. Why would I want to test design artifacts? The tests in TDD are developed from the requirements, not the design, right? >> I test behavior. Behavior is what is specified for a design, so why >> would you >> test anything else? > > Clearly you haven't practiced XP/Agile/Lean development > practices. > > You sound like a 20th century hardware engineer, rather > than a 21st century software "engineer". You must learn > to accept that all new things are, in every way, better > than the old ways. > > Excuse me while I go and wash my mouth out with soap. Lol -- Rick CArticle: 160077
On 19/05/17 01:53, rickman wrote: > On 5/18/2017 6:06 PM, Tom Gardner wrote: >> On 18/05/17 18:01, rickman wrote: >>> On 5/18/2017 12:14 PM, Tom Gardner wrote: >>>> Also, design patterns that enable logging of events >>>> and states should be encouraged and left in the code >>>> at runtime. I've found them /excellent/ techniques for >>>> correctly deflecting blame onto the other party :) >>>> >>>> Should you design in a proper FSM style/language >>>> and autogenerate the executable source code, or code >>>> directly in the source language? Difficult, but there >>>> are very useful OOP design patterns that make it easy. >>> >>> Designing in anything other than the HDL you are using increases the >>> complexity >>> of backing up your tools. In addition to source code, it can be >>> important to be >>> able to restore the development environment. I don't bother with FSM >>> tools >>> other than tools that help me think. >> >> Very true. I use that argument, and more, to caution >> people against inventing Domain Specific Languages >> when they should be inventing Domain Specific Libraries. >> >> Guess which happened in the case I alluded to above. > > An exception to that rule is programming in Forth. It is a language where > programming *is* extending the language. There are many situations where the > process ends up with programs written what appears to be a domain specific > language, but working quite well. So don't throw the baby out with the bath > when trying to save designers from themselves. I see why you are saying that, but I disagree. The Forth /language/ is pleasantly simple. The myriad Forth words (e.g. cmove, catch, canonical etc) in most Forth environments are part of the "standard library", not the language per se. Forth words are more-or-less equivalent to functions in a trad language. Defining new words is therefore like defining a new function. Just as defining new words "looks like" defining a DSL, so - at the "application level" - defining new functions also looks like defining a new DSL. Most importantly, both new functions and new words automatically have the invaluable tools support without having to do anything. With a new DSL, all the tools (from parsers to browsers) also have to be built. >>>> And w.r.t. TDD, should your tests demonstrate the >>>> FSM's design is correct or that the implementation >>>> artefacts are correct? >>> >>> I'll have to say that is a new term to me, "implementation >>> artefacts[sic]". Can >>> you explain? >> >> Nothing non-obvious. An implementation artefact is >> something that is part of /a/ specific design implementation, >> as opposed to something that is an inherent part of >> /the/ problem. > > Why would I want to test design artifacts? The tests in TDD are developed from > the requirements, not the design, right? Ideally, but only to some extent. TDD frequently used at a much lower level, where it is usually divorced from specs. TDD is also frequently used with - and implemented in the form of - unit tests, which are definitely divorced from the spec. Hence, in the real world, there is bountiful opportunity for diversion from the obvious pure sane course. And Murphy's Law definitely applies. Having said that, both TDD and Unit Testing are valuable additions to a the designer's toolchest. But they must be used intelligently[1], and are merely codifications of things most of us have been doing for decades. No change there, then. [1] be careful of external consultants proselytising the teaching courses they are selling. 
They have a hammer, and everything /does/ look like a nail.
On 5/19/2017 4:59 AM, Tom Gardner wrote: > On 19/05/17 01:53, rickman wrote: >> On 5/18/2017 6:06 PM, Tom Gardner wrote: >>> On 18/05/17 18:01, rickman wrote: >>>> On 5/18/2017 12:14 PM, Tom Gardner wrote: >>>>> Also, design patterns that enable logging of events >>>>> and states should be encouraged and left in the code >>>>> at runtime. I've found them /excellent/ techniques for >>>>> correctly deflecting blame onto the other party :) >>>>> >>>>> Should you design in a proper FSM style/language >>>>> and autogenerate the executable source code, or code >>>>> directly in the source language? Difficult, but there >>>>> are very useful OOP design patterns that make it easy. >>>> >>>> Designing in anything other than the HDL you are using increases the >>>> complexity >>>> of backing up your tools. In addition to source code, it can be >>>> important to be >>>> able to restore the development environment. I don't bother with FSM >>>> tools >>>> other than tools that help me think. >>> >>> Very true. I use that argument, and more, to caution >>> people against inventing Domain Specific Languages >>> when they should be inventing Domain Specific Libraries. >>> >>> Guess which happened in the case I alluded to above. >> >> An exception to that rule is programming in Forth. It is a language >> where >> programming *is* extending the language. There are many situations >> where the >> process ends up with programs written what appears to be a domain >> specific >> language, but working quite well. So don't throw the baby out with >> the bath >> when trying to save designers from themselves. > > I see why you are saying that, but I disagree. The > Forth /language/ is pleasantly simple. The myriad > Forth words (e.g. cmove, catch, canonical etc) in most > Forth environments are part of the "standard library", > not the language per se. > > Forth words are more-or-less equivalent to functions > in a trad language. Defining new words is therefore > like defining a new function. I can't find a definition for "trad language". > Just as defining new words "looks like" defining > a DSL, so - at the "application level" - defining > new functions also looks like defining a new DSL. > > Most importantly, both new functions and new words > automatically have the invaluable tools support without > having to do anything. With a new DSL, all the tools > (from parsers to browsers) also have to be built. I have no idea what distinction you are trying to make. Why is making new tools a necessary part of defining a domain specific language? If it walks like a duck... FRONT LED ON TURN That could be the domain specific language under Forth for turning on the front LED of some device. Sure looks like a language to me. I have considered writing a parser for a type of XML file simply by defining the syntax as Forth words. So rather than "process" the file with an application program, the Forth compiler would "compile" the file. I'd call that a domain specific language. >>>>> And w.r.t. TDD, should your tests demonstrate the >>>>> FSM's design is correct or that the implementation >>>>> artefacts are correct? >>>> >>>> I'll have to say that is a new term to me, "implementation >>>> artefacts[sic]". Can >>>> you explain? >>> >>> Nothing non-obvious. An implementation artefact is >>> something that is part of /a/ specific design implementation, >>> as opposed to something that is an inherent part of >>> /the/ problem. >> >> Why would I want to test design artifacts? 
The tests in TDD are >> developed from >> the requirements, not the design, right? > > Ideally, but only to some extent. TDD frequently used > at a much lower level, where it is usually divorced > from specs. There is a failure in the specification process. The projects I have worked on which required a formal requirements development process applied it to every level. So every piece of code that would be tested had requirements which defined the tests. > TDD is also frequently used with - and implemented in > the form of - unit tests, which are definitely divorced > from the spec. They are? How then are the tests generated? > Hence, in the real world, there is bountiful opportunity > for diversion from the obvious pure sane course. And > Murphy's Law definitely applies. > > Having said that, both TDD and Unit Testing are valuable > additions to a the designer's toolchest. But they must > be used intelligently[1], and are merely codifications of > things most of us have been doing for decades. > > No change there, then. > > [1] be careful of external consultants proselytising > the teaching courses they are selling. They have a > hammer, and everything /does/ look like a nail. -- Rick CArticle: 160079
On 05/17/2017 11:33 AM, Tim Wescott wrote: snip > > It's basically a bit of structure on top of some common-sense > methodologies (i.e., design from the top down, then code from the bottom > up, and test the hell out of each bit as you code it). > Other than occasional test fixtures, most of my FPGA work in recent years has been FPGA verification of the digital sections of mixed signal ASICs. Your description sounds exactly like the methodology used on both the product ASIC side and the verification FPGA side. After the FPGA is built and working, you test the hell out of the FPGA system and the product ASIC with completely separate tools and techniques. When problems are discovered, you often fall back to either the ASIC or FPGA simulation test benches to isolate the issue. The importance of good, detailed, self checking, top level test benches cannot be over-stressed. For mid and low level blocks that are complex or likely to see significant iterations (due to design spec changes) self checking test benches are worth the effort. My experience with manual checking test benches is that the first time you go through it, you remember to examine all the important spots, the thoroughness of the manual checking on subsequent runs falls off fast. Giving a manual check test bench to someone else, is a waste of both of your time. BobHArticle: 160080
I've solved the problem with setting up a new project for each testbench by not using any projects. Vivado has a non-project mode where you write a simple tcl script which tells vivado what sources to use and what to do with them. I have a source directory with hdl files in our repository and dozens of scripts. Each script takes sources from the same directory, creates its own temp working directory and runs its test there. I also have a script which runs all the tests at once without the GUI. I run it right before going home. When I come to work the next morning I run a script which analyses the reports looking for errors. If there is an error somewhere, I run the corresponding test script with the GUI switched on to look at waveforms. Non-project mode not only allows me to run different tests simultaneously for the same sources, but also allows me to run multiple synthesis jobs for them. I have used only this mode for more than 2 years and am absolutely happy with it. Highly recommend!
On 5/19/2017 6:31 PM, Ilya Kalistru wrote: > I've solved the problem with setting up a new project for each testbench by not using any projects. Vivado has a non project mode when you write a simple tcl script which tells vivado what sources to use and what to do with them. > > I have a source directory with hdl files in our repository and dozens of scripts.Each script takes sources from the same directory and creates its own temp working directory and runs its test there. I also have a script which runs all the tests at once without GUI. I run it right before coming home. When I come at work in the next morning I run a script which analyses reports looking for errors. If there is an error somewhere, I run the corresponding test script with GUI switched on to look at waveforms. > > Non-project mode not only allows me to run different tests simultaneously for the same sources, but also allows me to run multiple synthesis for them. > > I use only this mode for more then 2 years and absolutely happy with that. Highly recommend! Interesting. Vivado is what, Xilinx? -- Rick CArticle: 160082
On Saturday, May 20, 2017 at 00.57.24 UTC+2, rickman wrote: > On 5/19/2017 6:31 PM, Ilya Kalistru wrote: > > I've solved the problem with setting up a new project for each testbench by not using any projects. Vivado has a non project mode when you write a simple tcl script which tells vivado what sources to use and what to do with them. > > > > I have a source directory with hdl files in our repository and dozens of scripts. Each script takes sources from the same directory and creates its own temp working directory and runs its test there. I also have a script which runs all the tests at once without GUI. I run it right before coming home. When I come at work in the next morning I run a script which analyses reports looking for errors. If there is an error somewhere, I run the corresponding test script with GUI switched on to look at waveforms. > > > > Non-project mode not only allows me to run different tests simultaneously for the same sources, but also allows me to run multiple synthesis for them. > > > > I use only this mode for more then 2 years and absolutely happy with that. Highly recommend! > > Interesting. Vivado is what, Xilinx? yes
Yes. It is xilinx vivado. Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that change every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in the source control system, but when you commit changes you commit only the changes you have done, not random changes of unknown project files. In this situation work with IP cores is a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. I see that a very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where would it be appropriate to publish it?
On 5/20/2017 3:11 AM, Ilya Kalistru wrote: > Yes. It is xilinx vivado. > > Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that changes every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in source control system but when you commit changes you commit only changes you have done, not random changes of unknown project files. > > In this situation work with IP cores a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. > > I see that very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where it would be appropriate to publish it? Doesn't the tool still generate all the intermediate files? The Lattice tool (which uses Synplify for synthesis) creates a huge number of files that only the tools look at. They aren't really project files, they are various intermediate files. Living in the project main directory they really get in the way. -- Rick CArticle: 160085
On 20/05/17 08:11, Ilya Kalistru wrote: > Yes. It is xilinx vivado. > > Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that changes every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in source control system but when you commit changes you commit only changes you have done, not random changes of unknown project files. > > In this situation work with IP cores a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. > > I see that very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where it would be appropriate to publish it? That would be useful; the project mode is initially appealing, but the splattered files and SCCS give me the jitters. Publish it everywhere! Any blog and bulletin board you can find, not limited to those dedicated to Xilinx.Article: 160086
Ilya Kalistru <stebanoid@gmail.com> wrote: > I've solved the problem with setting up a new project for each testbench > by not using any projects. Vivado has a non project mode when you write a > simple tcl script which tells vivado what sources to use and what to do > with them. Something similar is possible with Intel FPGA (Altera) Quartus. You need one tcl file for settings, and building is a few commands which we run from a Makefile. All our builds run in continuous integration, which extracts logs and timing/area numbers. The bitfiles then get downloaded and booted on FPGA, then the test suite and benchmarks are run automatically to monitor performance. Numbers then come back to continuous integration for graphing. TheoArticle: 160087
On Saturday, May 20, 2017 at 11:17:19 AM UTC+3, rickman wrote: > On 5/20/2017 3:11 AM, Ilya Kalistru wrote: > > Yes. It is xilinx vivado. > > > > Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that changes every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in source control system but when you commit changes you commit only changes you have done, not random changes of unknown project files. > > > > In this situation work with IP cores a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. > > > > I see that very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where it would be appropriate to publish it? > > Doesn't the tool still generate all the intermediate files? The Lattice tool (which uses Synplify for synthesis) creates a huge number of files that only the tools look at. They aren't really project files, they are various intermediate files. Living in the project main directory they really get in the way. > > -- > > Rick C

It does. You can tell the tool where to generate these files and I do it in a special directory. It is easy to delete them, and you don't have to add them to your source control system. As all your important stuff is in src dir and all your junk is in sim_* dirs it is easy to manage them. That's what I have in my repository:

Project_name
\sim_Test1NameDir
*sim_test2NameDir
*sim_test3NameDir
|\
| *sim_report.log
| *other_junk
*synth_Module1Dir
*synth_Module2Dir
|\
| *Results
| | \
| | *Reports
| | *bitfiles
| *Some_junk
*src
|\
| *DesignChunk1SrcDir
| *DesignChunk2SrcDir
*sim_test1.tcl
*sim_test2.tcl
*sim_test3.tcl
*synth_Module1.tcl
*synth_Module2.tcl
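For readers who have not seen the non-project flow, here is a minimal sketch of what one of the synth_Module*.tcl scripts listed above might contain; the part number, top-level name, source globs and report paths are made-up examples rather than Ilya's actual script. It would be run as "vivado -mode batch -source synth_Module1.tcl":

# Non-project mode: no .xpr is ever created; the design lives in memory for the run.
read_vhdl    [glob ./src/DesignChunk1SrcDir/*.vhd]
read_verilog [glob ./src/DesignChunk2SrcDir/*.v]
read_xdc     ./src/Module1.xdc
synth_design -top Module1 -part xc7z020clg484-1
opt_design
place_design
route_design
report_timing_summary -file ./synth_Module1Dir/Reports/timing.rpt
report_utilization    -file ./synth_Module1Dir/Reports/utilization.rpt
write_bitstream -force ./synth_Module1Dir/bitfiles/Module1.bit

Everything the run produces can be pointed into the per-module working directory, so the src tree stays clean for the source control system.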
> All our builds run in continuous integration, which extracts logs and > timing/area numbers. The bitfiles then get downloaded and booted on FPGA, > then the test suite and benchmarks are run automatically to monitor > performance. Numbers then come back to continuous integration for graphing. > > Theo Nice!
Hello, here is my question: Purpose: realize face detection on zynq-7020 SoC Platform: Zedboard with OV5640 camera Completed work: capturing video from camera, writing into DDR for storage and reading from DDR for display Question: how to realize a face detection IP and its throughput can reach 30fps(pixel 320*240) Here are my jobs: Base on the Viola Jones algorithm, using HLS(high level synthesis) tool to realize hardware IP from a C++ design And this is my reference: https://github.com/cornell-zhang/facedetect-fpga I have simulate and synthesize it into hardware IP, but its throughput does not reach the goal because the interval and latency are very large. (latency is 338(min) to 576593236(max), interval is 336(min) to 142310514002(max)) Looking into the code, I find the latency is mainly caused by the following for loops, but I don't know how to optimize the latency compromising between area. So you may help me a lot with these: 1.Another way to realize face detection on zynq-7020? 2.How to test the throughput of my system and the relation between real throughput and the synthesis result? 3.Any way to optimize the following for loops? Looking forward to your reply. Please feel free to contact me at anytime. Thanks. ----loop1: imageScalerL1: for ( i = 0 ; i < IMAGE_HEIGHT ; i++ ){ imageScalerL1_1: for (j=0;j < IMAGE_WIDTH ;j++) { #pragma HLS pipeline if ( j < w2 && i < h2 ) IMG1_data[i][j] = Data[(i*y_ratio)>>16][(j*x_ratio)>>16]; } } ----loop2: Pixely: for( y = 0; y < sum_row; y++ ){ Pixelx: for ( x = 0; x < sum_col; x++ ){ /* Updates for Integral Image Window Buffer (I) */ SetIIu: for ( u = 0; u < WINDOW_SIZE; u++){ #pragma HLS unroll SetIIj: for ( v = 0; v < WINDOW_SIZE; v++ ){ #pragma HLS unroll II[u][v] = II[u][v] + ( I[u][v+1] - I[u][0] ); } } /* Updates for Square Image Window Buffer (SI) */ SII[0][0] = SII[0][0] + ( SI[0][1] - SI[0][0] ); SII[0][1] = SII[0][1] + ( SI[0][WINDOW_SIZE] - SI[0][0] ); SII[1][0] = SII[1][0] + ( SI[WINDOW_SIZE-1][1] - SI[WINDOW_SIZE-1][0] ); SII[1][1] = SII[1][1] + ( SI[WINDOW_SIZE-1][WINDOW_SIZE] - SI[WINDOW_SIZE-1][0] ); /* Updates for Image Window Buffer (I) and Square Image Window Bufer (SI) */ SetIj: for( j = 0; j < 2*WINDOW_SIZE-1; j++){ #pragma HLS unroll SetIi: for( i = 0; i < WINDOW_SIZE; i++ ){ #pragma HLS unroll if( i+j != 2*WINDOW_SIZE-1 ){ I[i][j] = I[i][j+1]; SI[i][j] = SI[i][j+1]; } else if ( i > 0 ){ I[i][j] = I[i][j+1] + I[i-1][j+1]; SI[i][j] = SI[i][j+1] + SI[i-1][j+1]; } } } // Last column of the I[][] and SI[][] matrix Ilast: for( i = 0; i < WINDOW_SIZE-1; i++ ){ #pragma HLS unroll I[i][2*WINDOW_SIZE-1] = L[i][x]; SI[i][2*WINDOW_SIZE-1] = L[i][x]*L[i][x]; } I[WINDOW_SIZE-1][2*WINDOW_SIZE-1] = IMG1_data[y][x]; SI[WINDOW_SIZE-1][2*WINDOW_SIZE-1] = IMG1_data[y][x]*IMG1_data[y][x]; /** Updates for Image Line Buffer (L) **/ LineBuf: for( k = 0; k < WINDOW_SIZE-2; k++ ){ #pragma HLS unroll L[k][x] = L[k+1][x]; } L[WINDOW_SIZE-2][x] = IMG1_data[y][x]; /* Pass the Integral Image Window buffer through Cascaded Classifier. 
Only pass * when the integral image window buffer has flushed out the initial garbage data */ if ( element_counter >= ( ( (WINDOW_SIZE-1)*sum_col + WINDOW_SIZE ) + WINDOW_SIZE -1 ) ) { /* Sliding Window should not go beyond the boundary */ if ( x_index < ( sum_col - (WINDOW_SIZE-1) ) && y_index < ( sum_row - (WINDOW_SIZE-1) ) ){ p.x = x_index; p.y = y_index; result = cascadeClassifier ( p, II, SII ); if ( result > 0 ) { MyRect r = {myRound(p.x*factor), myRound(p.y*factor), winSize.width, winSize.height}; AllCandidates_x[*AllCandidates_size]=r.x; AllCandidates_y[*AllCandidates_size]=r.y; AllCandidates_w[*AllCandidates_size]=r.width; AllCandidates_h[*AllCandidates_size]=r.height; *AllCandidates_size=*AllCandidates_size+1; } }// inner if if ( x_index < sum_col-1 ) x_index = x_index + 1; else{ x_index = 0; y_index = y_index + 1; } } // outer if element_counter +=1; } }Article: 160090
yuning he wrote: > Hello, here is my question: > Purpose: realize face detection on zynq-7020 SoC > Platform: Zedboard with OV5640 camera > Completed work: capturing video from camera, writing into DDR for storage and reading from DDR for display > Question: how to realize a face detection IP and its throughput can reach 30fps(pixel 320*240) > > Here are my jobs: > Base on the Viola Jones algorithm, using HLS(high level synthesis) tool to realize hardware IP from a C++ design > And this is my reference: https://github.com/cornell-zhang/facedetect-fpga > > I have simulate and synthesize it into hardware IP, but its throughput does not reach the goal because the interval and latency are very large. (latency is 338(min) to 576593236(max), interval is 336(min) to 142310514002(max)) > Looking into the code, I find the latency is mainly caused by the following for loops, but I don't know how to optimize the latency compromising between area. > > So you may help me a lot with these: > 1.Another way to realize face detection on zynq-7020? > 2.How to test the throughput of my system and the relation between real throughput and the synthesis result? > 3.Any way to optimize the following for loops? > > Looking forward to your reply. Please feel free to contact me at anytime. > Thanks. > > ----loop1: > imageScalerL1: for ( i = 0 ; i < IMAGE_HEIGHT ; i++ ){ > imageScalerL1_1: for (j=0;j < IMAGE_WIDTH ;j++) { > #pragma HLS pipeline > if ( j < w2 && i < h2 ) > IMG1_data[i][j] = Data[(i*y_ratio)>>16][(j*x_ratio)>>16]; > } > } > ----loop2: > Pixely: for( y = 0; y < sum_row; y++ ){ > Pixelx: for ( x = 0; x < sum_col; x++ ){ > /* Updates for Integral Image Window Buffer (I) */ > SetIIu: for ( u = 0; u < WINDOW_SIZE; u++){ > #pragma HLS unroll > SetIIj: for ( v = 0; v < WINDOW_SIZE; v++ ){ > #pragma HLS unroll > II[u][v] = II[u][v] + ( I[u][v+1] - I[u][0] ); > } > } > > /* Updates for Square Image Window Buffer (SI) */ > SII[0][0] = SII[0][0] + ( SI[0][1] - SI[0][0] ); > SII[0][1] = SII[0][1] + ( SI[0][WINDOW_SIZE] - SI[0][0] ); > SII[1][0] = SII[1][0] + ( SI[WINDOW_SIZE-1][1] - SI[WINDOW_SIZE-1][0] ); > SII[1][1] = SII[1][1] + ( SI[WINDOW_SIZE-1][WINDOW_SIZE] - SI[WINDOW_SIZE-1][0] ); > > /* Updates for Image Window Buffer (I) and Square Image Window Bufer (SI) */ > SetIj: for( j = 0; j < 2*WINDOW_SIZE-1; j++){ > #pragma HLS unroll > SetIi: for( i = 0; i < WINDOW_SIZE; i++ ){ > #pragma HLS unroll > if( i+j != 2*WINDOW_SIZE-1 ){ > I[i][j] = I[i][j+1]; > SI[i][j] = SI[i][j+1]; > } > else if ( i > 0 ){ > I[i][j] = I[i][j+1] + I[i-1][j+1]; > SI[i][j] = SI[i][j+1] + SI[i-1][j+1]; > } > } > } > // Last column of the I[][] and SI[][] matrix > Ilast: for( i = 0; i < WINDOW_SIZE-1; i++ ){ > #pragma HLS unroll > I[i][2*WINDOW_SIZE-1] = L[i][x]; > SI[i][2*WINDOW_SIZE-1] = L[i][x]*L[i][x]; > } > I[WINDOW_SIZE-1][2*WINDOW_SIZE-1] = IMG1_data[y][x]; > SI[WINDOW_SIZE-1][2*WINDOW_SIZE-1] = IMG1_data[y][x]*IMG1_data[y][x]; > > /** Updates for Image Line Buffer (L) **/ > LineBuf: for( k = 0; k < WINDOW_SIZE-2; k++ ){ > #pragma HLS unroll > L[k][x] = L[k+1][x]; > } > L[WINDOW_SIZE-2][x] = IMG1_data[y][x]; > > /* Pass the Integral Image Window buffer through Cascaded Classifier. 
Only pass > * when the integral image window buffer has flushed out the initial garbage data */ > if ( element_counter >= ( ( (WINDOW_SIZE-1)*sum_col + WINDOW_SIZE ) + WINDOW_SIZE -1 ) ) { > > /* Sliding Window should not go beyond the boundary */ > if ( x_index < ( sum_col - (WINDOW_SIZE-1) ) && y_index < ( sum_row - (WINDOW_SIZE-1) ) ){ > p.x = x_index; > p.y = y_index; > > result = cascadeClassifier ( p, II, SII ); > > if ( result > 0 ) > { > MyRect r = {myRound(p.x*factor), myRound(p.y*factor), winSize.width, winSize.height}; > AllCandidates_x[*AllCandidates_size]=r.x; > AllCandidates_y[*AllCandidates_size]=r.y; > AllCandidates_w[*AllCandidates_size]=r.width; > AllCandidates_h[*AllCandidates_size]=r.height; > *AllCandidates_size=*AllCandidates_size+1; > } > }// inner if > if ( x_index < sum_col-1 ) > x_index = x_index + 1; > else{ > x_index = 0; > y_index = y_index + 1; > } > } // outer if > element_counter +=1; > } > } Verilog is not my forte, but I think arrays are arrays. In the initial loop you are retrieving the data from Data[(i*y_ratio)>>16][(j*x_ratio)>>16]. The range of one index is 0 to IMAGE_HEIGHT-1 and the other is 0 to IMAGE_WIDTH-1. I can never recall which is the inner index and which is the outer, but the math involved in calculating the address of the data is simpler if the inner index range is a binary power. Is that the case? If not, you can achieve the simplification by declaring the inner index to have a range which is a binary power, but only use a subrange that you need. The cost is wasted memory, but it will improve performance and size because the address calculation will not require multiplication, but rather shifts which are done by mapping the index to the right address lines. -- Rick CArticle: 160091
> Looking into the code, I find the latency is mainly caused by the following for loops, but I don't know how to optimize the latency compromising between area. > You're not meeting timing, which means you probably need to go look at the schematic of the critical paths. How many levels are they? How are the multipliers being synthesized? Are they using DSP48s or fabric? As a rule, the more abstract a synthesis tool is, the worse the synthesis results will be. Also, where is "Data[]"? Is that a blockRAM? Or is it DRAM? If you're accessing DRAM directly without a cache, you might have problems.
> Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that changes every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in source control system but when you commit changes you commit only changes you have done, not random changes of unknown project files. > > In this situation work with IP cores a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. > > I see that very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where it would be appropriate to publish it? I would like to know more about this. When I used ISE I only used scripts (shell scripts) and when I transitioned to Vivado I promised I would use TCL scripts but I've never done that and I'm still just using the GUI. I need to use the GUI to look at schematics of critical paths or to look at placement, but I'd like to use scripts to do all the PAR and timing and everything else.
On Tuesday, May 23, 2017 at 9:26:26 PM UTC+3, Kevin Neilson wrote: > > Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that changes every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in source control system but when you commit changes you commit only changes you have done, not random changes of unknown project files. > > > > In this situation work with IP cores a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. > > > > I see that very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where it would be appropriate to publish it? > > I would like to know more about this. When I used ISE I only used scripts (shell scripts) and when I transitioned to Vivado I promised I would use TCL scripts but I've never done that and I'm still just using the GUI. I need to use the GUI to look at schematics of critical paths or to look at placement, but I'd like to use scripts to do all the PAR and timing and everything else. I am writing an article about that. I'll post it here. I examine timing reports in Vivado's logs, but if I have bad timings somewhere, I often use the GUI as well. It's just easier to understand what part of the code creates bad timing if you investigate it visually. I just open vivado and do open_checkpoint post_place.cpt, then I examine schematics of the paths and their placement. Non-project mode doesn't prevent using the GUI when you need it. They work fine together.
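A rough sketch of how that checkpoint hand-off can be scripted; the file names here are only examples, and Vivado's native checkpoint format normally uses the .dcp extension:

# in the batch script, save the in-memory state after the interesting steps
place_design
write_checkpoint -force post_place.dcp
route_design
write_checkpoint -force post_route.dcp

# later, only for a run that fails timing, restore that state in the GUI
open_checkpoint post_place.dcp
report_timing -max_paths 10

and then browse the schematic and placement of the worst paths interactively.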
On 05/23/2017 11:21 AM, Kevin Neilson wrote: > As a rule, the more abstract a synthesis tool is, the worse the synthesis results will be. > Sorry, I just walked back from lunch through a small horde of web developers, and was trying to envision the looks on their faces as C was referred to as being much too high level. -- Rob Gaddi, Highland Technology -- www.highlandtechnology.com Email address domain is currently out of order. See above to fix.
Thank you for your reply. Timing can be met when I slow the clock down to 100MHz. The multipliers are synthesized as DSP48s automatically. And "Data[]" is stored in block RAM directly.
On Tuesday, May 23, 2017 at 3:11:40 PM UTC+8, rickman wrote: > Verilog is not my forte, but I think arrays are arrays. In the initial loop you are retrieving the data from Data[(i*y_ratio)>>16][(j*x_ratio)>>16]. The range of one index is 0 to IMAGE_HEIGHT-1 and the other is 0 to IMAGE_WIDTH-1. I can never recall which is the inner index and which is the outer, but the math involved in calculating the address of the data is simpler if the inner index range is a binary power. Is that the case? If not, you can achieve the simplification by declaring the inner index to have a range which is a binary power, but only use a subrange that you need. The cost is wasted memory, but it will improve performance and size because the address calculation will not require multiplication, but rather shifts which are done by mapping the index to the right address lines. > -- > Rick C

Thank you for your reply. Here IMAGE_HEIGHT equals 240, and IMAGE_WIDTH equals 320. According to your advice, I can change the inner index of the array to be a binary power to accelerate the address access. Is this right?
yuning he wrote: > Thank you for your reply. > Here IMAGE_HEIGHT equals 240, and IMAGE_WIDTH equals 320. According to your advice, I can change the inner index of the array to be a binary power to accelerate the address access. Is this right?

I believe so. To minimize the waste of memory, I would make the 240 the inner index with a range of 256. Then the multiplication becomes a matter of shifting the outer index by 8 bits and adding the inner index. I don't know for sure, but the tools should figure this out automatically. Keep your loop range as 0 to 239 and everything will still work as you expect, skipping over 16 array values at each increment of the outer index. You will need to be consistent in all accesses to the memory. -- Rick C
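A minimal sketch of what that padding could look like in the HLS source; the element type, macro names and helper function are illustrative assumptions, not taken from the original code, and the index order is swapped relative to the original Data[IMAGE_HEIGHT][IMAGE_WIDTH] access so that the 240-entry axis becomes the inner one:

#define IMAGE_WIDTH       320
#define IMAGE_HEIGHT      240
#define IMAGE_HEIGHT_PAD  256   /* next power of two above 240 */

/* Inner dimension padded to a power of two, per the suggestion above.
 * The address of Data[x][y] is then (x << 8) + y -- a shift and an add --
 * instead of x*240 + y, which would need a multiplier. Only y = 0..239 is
 * used; the remaining 16 entries per row are wasted block RAM. */
static unsigned char Data[IMAGE_WIDTH][IMAGE_HEIGHT_PAD];

void clear_frame(void)          /* illustrative helper, not from the original code */
{
    for (int x = 0; x < IMAGE_WIDTH; x++)      /* loops keep the real ranges */
        for (int y = 0; y < IMAGE_HEIGHT; y++)
            Data[x][y] = 0;
}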
I have a Spartan-6 Atlys (LX45) board. Can anyone suggest how to interface a Zigbee module to this board to communicate with a PC? Thanks.
On 05/24/2017 10:16 PM, srirameee09@gmail.com wrote: > i have spartan6 atlys(LX45) board, can anyone suggest me how to interface zigbee to this board to communicate with pc.thnx > Buy a Zigbee module and implement whatever physical layer it needs. Freescale/NXP probably offers some. Texas Instruments may also. Any particular reason for using Zigbee? Bluetooth would probably be easier to find modules for and probably has better support on the PC end. BobH