On 5/18/2017 6:10 PM, Tom Gardner wrote: > On 18/05/17 18:05, rickman wrote: >> On 5/18/2017 12:08 PM, lasselangwadtchristensen@gmail.com wrote: >>> On Thursday, May 18, 2017 at 15.48.19 UTC+2, Theo Markettos wrote: >>>> Tim Wescott <tim@seemywebsite.really> wrote: >>>>> So, you have two separate implementations of the system -- how do you >>>>> know that they aren't both identically buggy? >>>> >>>> Is that the problem with any testing framework? >>>> Quis custodiet ipsos custodes? >>>> Who tests the tests? >>> >>> the test? >>> >>> if two different implementations agree, it adds a bit more confidence >>> than an >>> implementation agreeing with itself. >> >> The point is if both designs were built with the same misunderstanding >> of the >> requirements, they could both be wrong. While not common, this is not >> unheard >> of. It could be caused by cultural biases (each company is a culture) >> or a >> poorly written specification. > > The prior question is whether the specification is correct. > > Or more realistically, to what extent it is/isn't correct, > and the best set of techniques and processes for reducing > the imperfection. > > And that leads to XP/Agile concepts, to deal with the suboptimal > aspects of Waterfall Development. > > Unfortunately the zealots can't accept that what you gain > on the swings you lose on the roundabouts. I'm sure you know exactly what you meant. :) -- Rick C
On 5/18/2017 6:06 PM, Tom Gardner wrote: > On 18/05/17 18:01, rickman wrote: >> On 5/18/2017 12:14 PM, Tom Gardner wrote: >> >>> My preference is anything that avoids deeply nested >>> if/the/else/switch statements, since they rapidly >>> become a maintenance nightmare. (I've seen nesting >>> 10 deep!). >> >> Such deep layering likely indicates a poor problem decomposition, but >> it is hard >> to say without looking at the code. > > It was a combination of technical and personnel factors. > The overriding business imperative was, at each stage, > to make the smallest and /incrementally/ cheapest modification. > > The road to hell is paved with good intentions. If we are bandying about platitudes I will say, penny wise, pound foolish. >> Normally there is a switch for the state variable and conditionals >> within each >> case to evaluate inputs. Typically this is not so complex. > > This was an inherently complex task that was ineptly > implemented. I'm not going to define how ineptly, > because you wouldn't believe it. I only believe it > because I saw it, and boggled. Good design is about simplifying the complex. Ineptitude is a separate issue and can ruin even simple designs. >>> Also, design patterns that enable logging of events >>> and states should be encouraged and left in the code >>> at runtime. I've found them /excellent/ techniques for >>> correctly deflecting blame onto the other party :) >>> >>> Should you design in a proper FSM style/language >>> and autogenerate the executable source code, or code >>> directly in the source language? Difficult, but there >>> are very useful OOP design patterns that make it easy. >> >> Designing in anything other than the HDL you are using increases the >> complexity >> of backing up your tools. In addition to source code, it can be >> important to be >> able to restore the development environment. I don't bother with FSM >> tools >> other than tools that help me think. > > Very true. I use that argument, and more, to caution > people against inventing Domain Specific Languages > when they should be inventing Domain Specific Libraries. > > Guess which happened in the case I alluded to above. An exception to that rule is programming in Forth. It is a language where programming *is* extending the language. There are many situations where the process ends up with programs written what appears to be a domain specific language, but working quite well. So don't throw the baby out with the bath when trying to save designers from themselves. >>> And w.r.t. TDD, should your tests demonstrate the >>> FSM's design is correct or that the implementation >>> artefacts are correct? >> >> I'll have to say that is a new term to me, "implementation >> artefacts[sic]". Can >> you explain? > > Nothing non-obvious. An implementation artefact is > something that is part of /a/ specific design implementation, > as opposed to something that is an inherent part of > /the/ problem. Why would I want to test design artifacts? The tests in TDD are developed from the requirements, not the design, right? >> I test behavior. Behavior is what is specified for a design, so why >> would you >> test anything else? > > Clearly you haven't practiced XP/Agile/Lean development > practices. > > You sound like a 20th century hardware engineer, rather > than a 21st century software "engineer". You must learn > to accept that all new things are, in every way, better > than the old ways. > > Excuse me while I go and wash my mouth out with soap. Lol -- Rick CArticle: 160077
On 19/05/17 01:53, rickman wrote: > On 5/18/2017 6:06 PM, Tom Gardner wrote: >> On 18/05/17 18:01, rickman wrote: >>> On 5/18/2017 12:14 PM, Tom Gardner wrote: >>>> Also, design patterns that enable logging of events >>>> and states should be encouraged and left in the code >>>> at runtime. I've found them /excellent/ techniques for >>>> correctly deflecting blame onto the other party :) >>>> >>>> Should you design in a proper FSM style/language >>>> and autogenerate the executable source code, or code >>>> directly in the source language? Difficult, but there >>>> are very useful OOP design patterns that make it easy. >>> >>> Designing in anything other than the HDL you are using increases the >>> complexity >>> of backing up your tools. In addition to source code, it can be >>> important to be >>> able to restore the development environment. I don't bother with FSM >>> tools >>> other than tools that help me think. >> >> Very true. I use that argument, and more, to caution >> people against inventing Domain Specific Languages >> when they should be inventing Domain Specific Libraries. >> >> Guess which happened in the case I alluded to above. > > An exception to that rule is programming in Forth. It is a language where > programming *is* extending the language. There are many situations where the > process ends up with programs written what appears to be a domain specific > language, but working quite well. So don't throw the baby out with the bath > when trying to save designers from themselves. I see why you are saying that, but I disagree. The Forth /language/ is pleasantly simple. The myriad Forth words (e.g. cmove, catch, canonical etc) in most Forth environments are part of the "standard library", not the language per se. Forth words are more-or-less equivalent to functions in a trad language. Defining new words is therefore like defining a new function. Just as defining new words "looks like" defining a DSL, so - at the "application level" - defining new functions also looks like defining a new DSL. Most importantly, both new functions and new words automatically have the invaluable tools support without having to do anything. With a new DSL, all the tools (from parsers to browsers) also have to be built. >>>> And w.r.t. TDD, should your tests demonstrate the >>>> FSM's design is correct or that the implementation >>>> artefacts are correct? >>> >>> I'll have to say that is a new term to me, "implementation >>> artefacts[sic]". Can >>> you explain? >> >> Nothing non-obvious. An implementation artefact is >> something that is part of /a/ specific design implementation, >> as opposed to something that is an inherent part of >> /the/ problem. > > Why would I want to test design artifacts? The tests in TDD are developed from > the requirements, not the design, right? Ideally, but only to some extent. TDD frequently used at a much lower level, where it is usually divorced from specs. TDD is also frequently used with - and implemented in the form of - unit tests, which are definitely divorced from the spec. Hence, in the real world, there is bountiful opportunity for diversion from the obvious pure sane course. And Murphy's Law definitely applies. Having said that, both TDD and Unit Testing are valuable additions to a the designer's toolchest. But they must be used intelligently[1], and are merely codifications of things most of us have been doing for decades. No change there, then. [1] be careful of external consultants proselytising the teaching courses they are selling. 
They have a hammer, and everything /does/ look like a nail.
On 5/19/2017 4:59 AM, Tom Gardner wrote: > On 19/05/17 01:53, rickman wrote: >> On 5/18/2017 6:06 PM, Tom Gardner wrote: >>> On 18/05/17 18:01, rickman wrote: >>>> On 5/18/2017 12:14 PM, Tom Gardner wrote: >>>>> Also, design patterns that enable logging of events >>>>> and states should be encouraged and left in the code >>>>> at runtime. I've found them /excellent/ techniques for >>>>> correctly deflecting blame onto the other party :) >>>>> >>>>> Should you design in a proper FSM style/language >>>>> and autogenerate the executable source code, or code >>>>> directly in the source language? Difficult, but there >>>>> are very useful OOP design patterns that make it easy. >>>> >>>> Designing in anything other than the HDL you are using increases the >>>> complexity >>>> of backing up your tools. In addition to source code, it can be >>>> important to be >>>> able to restore the development environment. I don't bother with FSM >>>> tools >>>> other than tools that help me think. >>> >>> Very true. I use that argument, and more, to caution >>> people against inventing Domain Specific Languages >>> when they should be inventing Domain Specific Libraries. >>> >>> Guess which happened in the case I alluded to above. >> >> An exception to that rule is programming in Forth. It is a language >> where >> programming *is* extending the language. There are many situations >> where the >> process ends up with programs written what appears to be a domain >> specific >> language, but working quite well. So don't throw the baby out with >> the bath >> when trying to save designers from themselves. > > I see why you are saying that, but I disagree. The > Forth /language/ is pleasantly simple. The myriad > Forth words (e.g. cmove, catch, canonical etc) in most > Forth environments are part of the "standard library", > not the language per se. > > Forth words are more-or-less equivalent to functions > in a trad language. Defining new words is therefore > like defining a new function. I can't find a definition for "trad language". > Just as defining new words "looks like" defining > a DSL, so - at the "application level" - defining > new functions also looks like defining a new DSL. > > Most importantly, both new functions and new words > automatically have the invaluable tools support without > having to do anything. With a new DSL, all the tools > (from parsers to browsers) also have to be built. I have no idea what distinction you are trying to make. Why is making new tools a necessary part of defining a domain specific language? If it walks like a duck... FRONT LED ON TURN That could be the domain specific language under Forth for turning on the front LED of some device. Sure looks like a language to me. I have considered writing a parser for a type of XML file simply by defining the syntax as Forth words. So rather than "process" the file with an application program, the Forth compiler would "compile" the file. I'd call that a domain specific language. >>>>> And w.r.t. TDD, should your tests demonstrate the >>>>> FSM's design is correct or that the implementation >>>>> artefacts are correct? >>>> >>>> I'll have to say that is a new term to me, "implementation >>>> artefacts[sic]". Can >>>> you explain? >>> >>> Nothing non-obvious. An implementation artefact is >>> something that is part of /a/ specific design implementation, >>> as opposed to something that is an inherent part of >>> /the/ problem. >> >> Why would I want to test design artifacts? 
The tests in TDD are >> developed from >> the requirements, not the design, right? > > Ideally, but only to some extent. TDD frequently used > at a much lower level, where it is usually divorced > from specs. There is a failure in the specification process. The projects I have worked on which required a formal requirements development process applied it to every level. So every piece of code that would be tested had requirements which defined the tests. > TDD is also frequently used with - and implemented in > the form of - unit tests, which are definitely divorced > from the spec. They are? How then are the tests generated? > Hence, in the real world, there is bountiful opportunity > for diversion from the obvious pure sane course. And > Murphy's Law definitely applies. > > Having said that, both TDD and Unit Testing are valuable > additions to a the designer's toolchest. But they must > be used intelligently[1], and are merely codifications of > things most of us have been doing for decades. > > No change there, then. > > [1] be careful of external consultants proselytising > the teaching courses they are selling. They have a > hammer, and everything /does/ look like a nail. -- Rick CArticle: 160079
On 05/17/2017 11:33 AM, Tim Wescott wrote: snip > > It's basically a bit of structure on top of some common-sense > methodologies (i.e., design from the top down, then code from the bottom > up, and test the hell out of each bit as you code it). > Other than occasional test fixtures, most of my FPGA work in recent years has been FPGA verification of the digital sections of mixed signal ASICs. Your description sounds exactly like the methodology used on both the product ASIC side and the verification FPGA side. After the FPGA is built and working, you test the hell out of the FPGA system and the product ASIC with completely separate tools and techniques. When problems are discovered, you often fall back to either the ASIC or FPGA simulation test benches to isolate the issue. The importance of good, detailed, self checking, top level test benches cannot be over-stressed. For mid and low level blocks that are complex or likely to see significant iterations (due to design spec changes) self checking test benches are worth the effort. My experience with manual checking test benches is that the first time you go through it, you remember to examine all the important spots, the thoroughness of the manual checking on subsequent runs falls off fast. Giving a manual check test bench to someone else, is a waste of both of your time. BobHArticle: 160080
I've solved the problem with setting up a new project for each testbench by not using any projects. Vivado has a non-project mode where you write a simple tcl script which tells vivado what sources to use and what to do with them. I have a source directory with hdl files in our repository and dozens of scripts. Each script takes sources from the same directory, creates its own temp working directory and runs its test there. I also have a script which runs all the tests at once without the GUI. I run it right before going home. When I come to work the next morning I run a script which analyses the reports looking for errors. If there is an error somewhere, I run the corresponding test script with the GUI switched on to look at waveforms. Non-project mode not only allows me to run different tests simultaneously for the same sources, but also allows me to run multiple synthesis jobs for them. I have used only this mode for more than 2 years and am absolutely happy with it. Highly recommend!
On 5/19/2017 6:31 PM, Ilya Kalistru wrote: > I've solved the problem with setting up a new project for each testbench by not using any projects. Vivado has a non project mode when you write a simple tcl script which tells vivado what sources to use and what to do with them. > > I have a source directory with hdl files in our repository and dozens of scripts.Each script takes sources from the same directory and creates its own temp working directory and runs its test there. I also have a script which runs all the tests at once without GUI. I run it right before coming home. When I come at work in the next morning I run a script which analyses reports looking for errors. If there is an error somewhere, I run the corresponding test script with GUI switched on to look at waveforms. > > Non-project mode not only allows me to run different tests simultaneously for the same sources, but also allows me to run multiple synthesis for them. > > I use only this mode for more then 2 years and absolutely happy with that. Highly recommend! Interesting. Vivado is what, Xilinx? -- Rick CArticle: 160082
On Saturday, May 20, 2017 at 00.57.24 UTC+2, rickman wrote: > On 5/19/2017 6:31 PM, Ilya Kalistru wrote: > > I've solved the problem with setting up a new project for each testbench by not using any projects. Vivado has a non project mode when you write a simple tcl script which tells vivado what sources to use and what to do with them. > > > > I have a source directory with hdl files in our repository and dozens of scripts. Each script takes sources from the same directory and creates its own temp working directory and runs its test there. I also have a script which runs all the tests at once without GUI. I run it right before coming home. When I come at work in the next morning I run a script which analyses reports looking for errors. If there is an error somewhere, I run the corresponding test script with GUI switched on to look at waveforms. > > > > Non-project mode not only allows me to run different tests simultaneously for the same sources, but also allows me to run multiple synthesis for them. > > > > I use only this mode for more then 2 years and absolutely happy with that. Highly recommend! > > Interesting. Vivado is what, Xilinx? yes
Yes. It is xilinx vivado. Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that change every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in the source control system, but when you commit changes you commit only the changes you have done, not random changes of unknown project files. In this situation work with IP cores is a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. I see that a very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where would it be appropriate to publish it?
On 5/20/2017 3:11 AM, Ilya Kalistru wrote: > Yes. It is xilinx vivado. > > Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that changes every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in source control system but when you commit changes you commit only changes you have done, not random changes of unknown project files. > > In this situation work with IP cores a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. > > I see that very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where it would be appropriate to publish it? Doesn't the tool still generate all the intermediate files? The Lattice tool (which uses Synplify for synthesis) creates a huge number of files that only the tools look at. They aren't really project files, they are various intermediate files. Living in the project main directory they really get in the way. -- Rick CArticle: 160085
On 20/05/17 08:11, Ilya Kalistru wrote: > Yes. It is xilinx vivado. > > Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that changes every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in source control system but when you commit changes you commit only changes you have done, not random changes of unknown project files. > > In this situation work with IP cores a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. > > I see that very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where it would be appropriate to publish it? That would be useful; the project mode is initially appealing, but the splattered files and SCCS give me the jitters. Publish it everywhere! Any blog and bulletin board you can find, not limited to those dedicated to Xilinx.Article: 160086
Ilya Kalistru <stebanoid@gmail.com> wrote: > I've solved the problem with setting up a new project for each testbench > by not using any projects. Vivado has a non project mode when you write a > simple tcl script which tells vivado what sources to use and what to do > with them. Something similar is possible with Intel FPGA (Altera) Quartus. You need one tcl file for settings, and building is a few commands which we run from a Makefile. All our builds run in continuous integration, which extracts logs and timing/area numbers. The bitfiles then get downloaded and booted on FPGA, then the test suite and benchmarks are run automatically to monitor performance. Numbers then come back to continuous integration for graphing. TheoArticle: 160087
On Saturday, May 20, 2017 at 11:17:19 AM UTC+3, rickman wrote: > On 5/20/2017 3:11 AM, Ilya Kalistru wrote: > > Yes. It is xilinx vivado. > > > > Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that changes every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in source control system but when you commit changes you commit only changes you have done, not random changes of unknown project files. > > > > In this situation work with IP cores a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. > > > > I see that very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where it would be appropriate to publish it? > > Doesn't the tool still generate all the intermediate files? The Lattice tool (which uses Synplify for synthesis) creates a huge number of files that only the tools look at. They aren't really project files, they are various intermediate files. Living in the project main directory they really get in the way. > > -- > > Rick C

It does. You can tell the tool where to generate these files and I do it in a special directory. It is easy to delete them, and you don't have to add them to your source control system. As all your important stuff is in src dir and all your junk is in sim_* dirs it is easy to manage them. That's what I have in my repository:

Project_name
\sim_Test1NameDir
*sim_test2NameDir
*sim_test3NameDir
|\
| *sim_report.log
| *other_junk
*synth_Module1Dir
*synth_Module2Dir
|\
| *Results
| | \
| | *Reports
| | *bitfiles
| *Some_junk
*src
|\
| *DesignChunk1SrcDir
| *DesignChunk2SrcDir
*sim_test1.tcl
*sim_test2.tcl
*sim_test3.tcl
*synth_Module1.tcl
*synth_Module2.tcl
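For readers who have not seen the non-project flow, here is a minimal sketch of what one of the synth_Module*.tcl scripts listed above might contain; the part number, top-level name, source globs and report paths are made-up examples rather than Ilya's actual script. It would be run as "vivado -mode batch -source synth_Module1.tcl":

# Non-project mode: no .xpr is ever created; the design lives in memory for the run.
read_vhdl    [glob ./src/DesignChunk1SrcDir/*.vhd]
read_verilog [glob ./src/DesignChunk2SrcDir/*.v]
read_xdc     ./src/Module1.xdc
synth_design -top Module1 -part xc7z020clg484-1
opt_design
place_design
route_design
report_timing_summary -file ./synth_Module1Dir/Reports/timing.rpt
report_utilization    -file ./synth_Module1Dir/Reports/utilization.rpt
write_bitstream -force ./synth_Module1Dir/bitfiles/Module1.bit

Everything the run produces can be pointed into the per-module working directory, so the src tree stays clean for the source control system.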
> All our builds run in continuous integration, which extracts logs and > timing/area numbers. The bitfiles then get downloaded and booted on FPGA, > then the test suite and benchmarks are run automatically to monitor > performance. Numbers then come back to continuous integration for graphing. > > Theo Nice!
Hello, here is my question: Purpose: realize face detection on zynq-7020 SoC Platform: Zedboard with OV5640 camera Completed work: capturing video from camera, writing into DDR for storage and reading from DDR for display Question: how to realize a face detection IP and its throughput can reach 30fps(pixel 320*240) Here are my jobs: Base on the Viola Jones algorithm, using HLS(high level synthesis) tool to realize hardware IP from a C++ design And this is my reference: https://github.com/cornell-zhang/facedetect-fpga I have simulate and synthesize it into hardware IP, but its throughput does not reach the goal because the interval and latency are very large. (latency is 338(min) to 576593236(max), interval is 336(min) to 142310514002(max)) Looking into the code, I find the latency is mainly caused by the following for loops, but I don't know how to optimize the latency compromising between area. So you may help me a lot with these: 1.Another way to realize face detection on zynq-7020? 2.How to test the throughput of my system and the relation between real throughput and the synthesis result? 3.Any way to optimize the following for loops? Looking forward to your reply. Please feel free to contact me at anytime. Thanks. ----loop1: imageScalerL1: for ( i = 0 ; i < IMAGE_HEIGHT ; i++ ){ imageScalerL1_1: for (j=0;j < IMAGE_WIDTH ;j++) { #pragma HLS pipeline if ( j < w2 && i < h2 ) IMG1_data[i][j] = Data[(i*y_ratio)>>16][(j*x_ratio)>>16]; } } ----loop2: Pixely: for( y = 0; y < sum_row; y++ ){ Pixelx: for ( x = 0; x < sum_col; x++ ){ /* Updates for Integral Image Window Buffer (I) */ SetIIu: for ( u = 0; u < WINDOW_SIZE; u++){ #pragma HLS unroll SetIIj: for ( v = 0; v < WINDOW_SIZE; v++ ){ #pragma HLS unroll II[u][v] = II[u][v] + ( I[u][v+1] - I[u][0] ); } } /* Updates for Square Image Window Buffer (SI) */ SII[0][0] = SII[0][0] + ( SI[0][1] - SI[0][0] ); SII[0][1] = SII[0][1] + ( SI[0][WINDOW_SIZE] - SI[0][0] ); SII[1][0] = SII[1][0] + ( SI[WINDOW_SIZE-1][1] - SI[WINDOW_SIZE-1][0] ); SII[1][1] = SII[1][1] + ( SI[WINDOW_SIZE-1][WINDOW_SIZE] - SI[WINDOW_SIZE-1][0] ); /* Updates for Image Window Buffer (I) and Square Image Window Bufer (SI) */ SetIj: for( j = 0; j < 2*WINDOW_SIZE-1; j++){ #pragma HLS unroll SetIi: for( i = 0; i < WINDOW_SIZE; i++ ){ #pragma HLS unroll if( i+j != 2*WINDOW_SIZE-1 ){ I[i][j] = I[i][j+1]; SI[i][j] = SI[i][j+1]; } else if ( i > 0 ){ I[i][j] = I[i][j+1] + I[i-1][j+1]; SI[i][j] = SI[i][j+1] + SI[i-1][j+1]; } } } // Last column of the I[][] and SI[][] matrix Ilast: for( i = 0; i < WINDOW_SIZE-1; i++ ){ #pragma HLS unroll I[i][2*WINDOW_SIZE-1] = L[i][x]; SI[i][2*WINDOW_SIZE-1] = L[i][x]*L[i][x]; } I[WINDOW_SIZE-1][2*WINDOW_SIZE-1] = IMG1_data[y][x]; SI[WINDOW_SIZE-1][2*WINDOW_SIZE-1] = IMG1_data[y][x]*IMG1_data[y][x]; /** Updates for Image Line Buffer (L) **/ LineBuf: for( k = 0; k < WINDOW_SIZE-2; k++ ){ #pragma HLS unroll L[k][x] = L[k+1][x]; } L[WINDOW_SIZE-2][x] = IMG1_data[y][x]; /* Pass the Integral Image Window buffer through Cascaded Classifier. 
Only pass * when the integral image window buffer has flushed out the initial garbage data */ if ( element_counter >= ( ( (WINDOW_SIZE-1)*sum_col + WINDOW_SIZE ) + WINDOW_SIZE -1 ) ) { /* Sliding Window should not go beyond the boundary */ if ( x_index < ( sum_col - (WINDOW_SIZE-1) ) && y_index < ( sum_row - (WINDOW_SIZE-1) ) ){ p.x = x_index; p.y = y_index; result = cascadeClassifier ( p, II, SII ); if ( result > 0 ) { MyRect r = {myRound(p.x*factor), myRound(p.y*factor), winSize.width, winSize.height}; AllCandidates_x[*AllCandidates_size]=r.x; AllCandidates_y[*AllCandidates_size]=r.y; AllCandidates_w[*AllCandidates_size]=r.width; AllCandidates_h[*AllCandidates_size]=r.height; *AllCandidates_size=*AllCandidates_size+1; } }// inner if if ( x_index < sum_col-1 ) x_index = x_index + 1; else{ x_index = 0; y_index = y_index + 1; } } // outer if element_counter +=1; } }Article: 160090
yuning he wrote: > Hello, here is my question: > Purpose: realize face detection on zynq-7020 SoC > Platform: Zedboard with OV5640 camera > Completed work: capturing video from camera, writing into DDR for storage and reading from DDR for display > Question: how to realize a face detection IP and its throughput can reach 30fps(pixel 320*240) > > Here are my jobs: > Base on the Viola Jones algorithm, using HLS(high level synthesis) tool to realize hardware IP from a C++ design > And this is my reference: https://github.com/cornell-zhang/facedetect-fpga > > I have simulate and synthesize it into hardware IP, but its throughput does not reach the goal because the interval and latency are very large. (latency is 338(min) to 576593236(max), interval is 336(min) to 142310514002(max)) > Looking into the code, I find the latency is mainly caused by the following for loops, but I don't know how to optimize the latency compromising between area. > > So you may help me a lot with these: > 1.Another way to realize face detection on zynq-7020? > 2.How to test the throughput of my system and the relation between real throughput and the synthesis result? > 3.Any way to optimize the following for loops? > > Looking forward to your reply. Please feel free to contact me at anytime. > Thanks. > > ----loop1: > imageScalerL1: for ( i = 0 ; i < IMAGE_HEIGHT ; i++ ){ > imageScalerL1_1: for (j=0;j < IMAGE_WIDTH ;j++) { > #pragma HLS pipeline > if ( j < w2 && i < h2 ) > IMG1_data[i][j] = Data[(i*y_ratio)>>16][(j*x_ratio)>>16]; > } > } > ----loop2: > Pixely: for( y = 0; y < sum_row; y++ ){ > Pixelx: for ( x = 0; x < sum_col; x++ ){ > /* Updates for Integral Image Window Buffer (I) */ > SetIIu: for ( u = 0; u < WINDOW_SIZE; u++){ > #pragma HLS unroll > SetIIj: for ( v = 0; v < WINDOW_SIZE; v++ ){ > #pragma HLS unroll > II[u][v] = II[u][v] + ( I[u][v+1] - I[u][0] ); > } > } > > /* Updates for Square Image Window Buffer (SI) */ > SII[0][0] = SII[0][0] + ( SI[0][1] - SI[0][0] ); > SII[0][1] = SII[0][1] + ( SI[0][WINDOW_SIZE] - SI[0][0] ); > SII[1][0] = SII[1][0] + ( SI[WINDOW_SIZE-1][1] - SI[WINDOW_SIZE-1][0] ); > SII[1][1] = SII[1][1] + ( SI[WINDOW_SIZE-1][WINDOW_SIZE] - SI[WINDOW_SIZE-1][0] ); > > /* Updates for Image Window Buffer (I) and Square Image Window Bufer (SI) */ > SetIj: for( j = 0; j < 2*WINDOW_SIZE-1; j++){ > #pragma HLS unroll > SetIi: for( i = 0; i < WINDOW_SIZE; i++ ){ > #pragma HLS unroll > if( i+j != 2*WINDOW_SIZE-1 ){ > I[i][j] = I[i][j+1]; > SI[i][j] = SI[i][j+1]; > } > else if ( i > 0 ){ > I[i][j] = I[i][j+1] + I[i-1][j+1]; > SI[i][j] = SI[i][j+1] + SI[i-1][j+1]; > } > } > } > // Last column of the I[][] and SI[][] matrix > Ilast: for( i = 0; i < WINDOW_SIZE-1; i++ ){ > #pragma HLS unroll > I[i][2*WINDOW_SIZE-1] = L[i][x]; > SI[i][2*WINDOW_SIZE-1] = L[i][x]*L[i][x]; > } > I[WINDOW_SIZE-1][2*WINDOW_SIZE-1] = IMG1_data[y][x]; > SI[WINDOW_SIZE-1][2*WINDOW_SIZE-1] = IMG1_data[y][x]*IMG1_data[y][x]; > > /** Updates for Image Line Buffer (L) **/ > LineBuf: for( k = 0; k < WINDOW_SIZE-2; k++ ){ > #pragma HLS unroll > L[k][x] = L[k+1][x]; > } > L[WINDOW_SIZE-2][x] = IMG1_data[y][x]; > > /* Pass the Integral Image Window buffer through Cascaded Classifier. 
Only pass > * when the integral image window buffer has flushed out the initial garbage data */ > if ( element_counter >= ( ( (WINDOW_SIZE-1)*sum_col + WINDOW_SIZE ) + WINDOW_SIZE -1 ) ) { > > /* Sliding Window should not go beyond the boundary */ > if ( x_index < ( sum_col - (WINDOW_SIZE-1) ) && y_index < ( sum_row - (WINDOW_SIZE-1) ) ){ > p.x = x_index; > p.y = y_index; > > result = cascadeClassifier ( p, II, SII ); > > if ( result > 0 ) > { > MyRect r = {myRound(p.x*factor), myRound(p.y*factor), winSize.width, winSize.height}; > AllCandidates_x[*AllCandidates_size]=r.x; > AllCandidates_y[*AllCandidates_size]=r.y; > AllCandidates_w[*AllCandidates_size]=r.width; > AllCandidates_h[*AllCandidates_size]=r.height; > *AllCandidates_size=*AllCandidates_size+1; > } > }// inner if > if ( x_index < sum_col-1 ) > x_index = x_index + 1; > else{ > x_index = 0; > y_index = y_index + 1; > } > } // outer if > element_counter +=1; > } > } Verilog is not my forte, but I think arrays are arrays. In the initial loop you are retrieving the data from Data[(i*y_ratio)>>16][(j*x_ratio)>>16]. The range of one index is 0 to IMAGE_HEIGHT-1 and the other is 0 to IMAGE_WIDTH-1. I can never recall which is the inner index and which is the outer, but the math involved in calculating the address of the data is simpler if the inner index range is a binary power. Is that the case? If not, you can achieve the simplification by declaring the inner index to have a range which is a binary power, but only use a subrange that you need. The cost is wasted memory, but it will improve performance and size because the address calculation will not require multiplication, but rather shifts which are done by mapping the index to the right address lines. -- Rick CArticle: 160091
> Looking into the code, I find the latency is mainly caused by the following for loops, but I don't know how to optimize the latency compromising between area. > You're not meeting timing, which means you probably need to go look at the schematic of the critical paths. How many levels are they? How are the multipliers being synthesized? Are they using DSP48s or fabric? As a rule, the more abstract a synthesis tool is, the worse the synthesis results will be. Also, where is "Data[]"? Is that a blockRAM? Or is it DRAM? If you're accessing DRAM directly without a cache, you might have problems.
> Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that changes every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in source control system but when you commit changes you commit only changes you have done, not random changes of unknown project files. > > In this situation work with IP cores a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. > > I see that very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where it would be appropriate to publish it? I would like to know more about this. When I used ISE I only used scripts (shell scripts) and when I transitioned to Vivado I promised I would use TCL scripts but I've never done that and I'm still just using the GUI. I need to use the GUI to look at schematics of critical paths or to look at placement, but I'd like to use scripts to do all the PAR and timing and everything else.
On Tuesday, May 23, 2017 at 9:26:26 PM UTC+3, Kevin Neilson wrote: > > Another important advantage of non-project mode is that it is fully compatible with source control systems. When you don't have projects, you don't have piles of junk files of unknown purpose that changes every time you open a project or run a simulation. In non-project mode you have only hdl sources and tcl scripts. Therefore all information is stored in source control system but when you commit changes you commit only changes you have done, not random changes of unknown project files. > > > > In this situation work with IP cores a bit trickier, but not much. Considering that you don't change ip's very often, it's not a problem at all. > > > > I see that very small number of hdl designers know and use this mode. Maybe I should write an article about it. Where it would be appropriate to publish it? > > I would like to know more about this. When I used ISE I only used scripts (shell scripts) and when I transitioned to Vivado I promised I would use TCL scripts but I've never done that and I'm still just using the GUI. I need to use the GUI to look at schematics of critical paths or to look at placement, but I'd like to use scripts to do all the PAR and timing and everything else. I am writing an article about that. I'll post it here. I examine timing reports in Vivado's logs, but if I have bad timings somewhere, I often use the GUI as well. It's just easier to understand what part of the code creates bad timing if you investigate it visually. I just open vivado and do open_checkpoint post_place.cpt, then I examine schematics of the paths and their placement. Non-project mode doesn't prevent using the GUI when you need it. They work fine together.
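A rough sketch of how that checkpoint hand-off can be scripted; the file names here are only examples, and Vivado's native checkpoint format normally uses the .dcp extension:

# in the batch script, save the in-memory state after the interesting steps
place_design
write_checkpoint -force post_place.dcp
route_design
write_checkpoint -force post_route.dcp

# later, only for a run that fails timing, restore that state in the GUI
open_checkpoint post_place.dcp
report_timing -max_paths 10

and then browse the schematic and placement of the worst paths interactively.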
On 05/23/2017 11:21 AM, Kevin Neilson wrote: > As a rule, the more abstract a synthesis tool is, the worse the synthesis results will be. > Sorry, I just walked back from lunch through a small horde of web developers, and was trying to envision the looks on their faces as C was referred to as being much too high level. -- Rob Gaddi, Highland Technology -- www.highlandtechnology.com Email address domain is currently out of order. See above to fix.
Thank you for your reply. Timing can be met when I slow the clock down to 100MHz. The multipliers are synthesized as DSP48s automatically. And "Data[]" is stored in block RAM directly.
On Tuesday, May 23, 2017 at 3:11:40 PM UTC+8, rickman wrote: > Verilog is not my forte, but I think arrays are arrays. In the initial loop you are retrieving the data from Data[(i*y_ratio)>>16][(j*x_ratio)>>16]. The range of one index is 0 to IMAGE_HEIGHT-1 and the other is 0 to IMAGE_WIDTH-1. I can never recall which is the inner index and which is the outer, but the math involved in calculating the address of the data is simpler if the inner index range is a binary power. Is that the case? If not, you can achieve the simplification by declaring the inner index to have a range which is a binary power, but only use a subrange that you need. The cost is wasted memory, but it will improve performance and size because the address calculation will not require multiplication, but rather shifts which are done by mapping the index to the right address lines. > -- > Rick C

Thank you for your reply. Here IMAGE_HEIGHT equals 240, and IMAGE_WIDTH equals 320. According to your advice, I can change the inner index of the array to be a binary power to accelerate the address access. Is this right?
yuning he wrote: > Thank you for your reply. > Here IMAGE_HEIGHT equals 240, and IMAGE_WIDTH equals 320. According to your advice, I can change the inner index of the array to be a binary power to accelerate the address access. Is this right?

I believe so. To minimize the waste of memory, I would make the 240 the inner index with a range of 256. Then the multiplication becomes a matter of shifting the outer index by 8 bits and adding the inner index. I don't know for sure, but the tools should figure this out automatically. Keep your loop range as 0 to 239 and everything will still work as you expect, skipping over 16 array values at each increment of the outer index. You will need to be consistent in all accesses to the memory. -- Rick C
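A minimal sketch of what that padding could look like in the HLS source; the element type, macro names and helper function are illustrative assumptions, not taken from the original code, and the index order is swapped relative to the original Data[IMAGE_HEIGHT][IMAGE_WIDTH] access so that the 240-entry axis becomes the inner one:

#define IMAGE_WIDTH       320
#define IMAGE_HEIGHT      240
#define IMAGE_HEIGHT_PAD  256   /* next power of two above 240 */

/* Inner dimension padded to a power of two, per the suggestion above.
 * The address of Data[x][y] is then (x << 8) + y -- a shift and an add --
 * instead of x*240 + y, which would need a multiplier. Only y = 0..239 is
 * used; the remaining 16 entries per row are wasted block RAM. */
static unsigned char Data[IMAGE_WIDTH][IMAGE_HEIGHT_PAD];

void clear_frame(void)          /* illustrative helper, not from the original code */
{
    for (int x = 0; x < IMAGE_WIDTH; x++)      /* loops keep the real ranges */
        for (int y = 0; y < IMAGE_HEIGHT; y++)
            Data[x][y] = 0;
}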
I have a Spartan-6 Atlys (LX45) board. Can anyone suggest how to interface a Zigbee module to this board to communicate with a PC? Thanks.
On 05/24/2017 10:16 PM, srirameee09@gmail.com wrote: > i have spartan6 atlys(LX45) board, can anyone suggest me how to interface zigbee to this board to communicate with pc.thnx > Buy a Zigbee module and implement whatever physical layer it needs. Freescale/NXP probably offers some. Texas Instruments may also. Any particular reason for using Zigbee? Bluetooth would probably be easier to find modules for and probably has better support on the PC end. BobH