Messages from 114850

Article: 114850
Subject: IP Protection
From: "nagaraj" <nagarajputti@gmail.com>
Date: 25 Jan 2007 02:50:56 -0800

Hi all,

I want to know whether Xilinx provides any features to encrypt the
netlist files (.ngc or .ngd) before delivery. Also, what is the
chance of someone reverse engineering the netlist files (for example,
from .ngc to .vhd)?


I know that Xilinx provides features to encrypt bitstream files (using
AES with a 256-bit key), but I want to know whether there are any
features to encrypt netlist files.


Thanks,
Nagaraj

Article: 114851
Subject: Re: On-chip randomness (V4FX)
From: "Symon" <symon_brewer@hotmail.com>
Date: Thu, 25 Jan 2007 11:29:50 -0000

<jetmarc@hotmail.com> wrote in message 
news:1169720137.703620.129880@a75g2000cwd.googlegroups.com...

>
> For example I'm thinking about an oscillating combinatorial loop,
> sampled during the regular clock events. I expect the output would vary
> (at least a little bit) with temperature, supply voltage and perhaps
> moon phase.
>
Hi Marc,
This is difficult. Ring oscillators are prone to resonate with other on-chip 
clocks. Also, Xilinx have spent a lot of effort fixing other things you 
could use like SEUs (see Austin's posts) or Metastability resolution (see 
Peter's posts). (The metastability thing isn't very random anyway.)
If only you could incorporate the actual ISE software into your design; it 
produces random errors with consummate ease!
I suggest you use the built in Vbatt driven solution with a secret number 
kept alive by a Li coin cell. Apparently,  it's good enough for the Feds.
Not much help, sorry. Syms.

Article: 114852
Subject: Any UK mirror for ISE 8.2i SP2?
From: "lbo_user" <shareef.jalloq@lightblueoptics.com>
Date: 25 Jan 2007 03:46:29 -0800

The Xilinx server is taking forever to serve the latest service pack.
Does anyone know of a UK mirror for it or can anyone host it for me?

Thanks.

Article: 114853
Subject: Re: IP Protection
From: "Sylvain Munaut <SomeOne@SomeDomain.com>" <246tnt@gmail.com>
Date: 25 Jan 2007 03:46:41 -0800

On Jan 25, 11:50 am, "nagaraj" <nagarajpu...@gmail.com> wrote:
> Hi all,
>
> I want to know whether  xilinx provides any features  to encrypt the
> netlist files(.ngc or .ngd) before delivering.And also what is the
> chance of reverse engineering the netlist files( for example from .ngc
> to .vhd.)

Converting from an ngc to a vhd is easy ... but that will be a vhd that
just instantiates primitives. If you want to find out how the IP works,
then there is more work to do, most of it manual. If the IP is huge,
that can take a while. But the attacker still has the full names of
your signals and instances and the hierarchy (more or less), so that's
a big help.

You can also "mangle" the netlist to obfuscate it, but that has a cost:
the packer might do its job a little less well without all the
full paths ... (so about +5% slices ...)

> I know that xilinx provides features to encrypt bitstream files(using
> AES with 256 bit key) but i want to know are there any features to
> encrypt netlist files?.

yes,
ngcbuild -insert_secure_netlist prohibit non_secure.ngc secure.ngc

And that will slow down a good attacker for a good ... 10 min give or
take.

    Sylvain

Article: 114854
Subject: Re: video buffering scheme, nonsequential access (no spatial locality)
From: "Sylvain Munaut <SomeOne@SomeDomain.com>" <246tnt@gmail.com>
Date: 25 Jan 2007 04:04:35 -0800

Well, there won't be a scheme that fits every possible transform ...
(if there were, that would mean the SDRAM would be as flexible as SRAM
...)

Can't you narrow down a little the type of access you want to do?

Article: 114855
Subject: Re: On-chip randomness (V4FX)
From: "Sylvain Munaut <SomeOne@SomeDomain.com>" <246tnt@gmail.com>
Date: 25 Jan 2007 04:10:00 -0800

I think that by using an on-board oscillator you should indeed be able
to produce some bits of entropy. You just have to design your circuit so
that even two very close frequencies lead to very different answers.

Maybe you can use external stimuli as a source of entropy as well?

Also, you mention "session": if the user can run several sessions without
a power down / power up, you can use local BRAM to store a seed. That
will at least protect you during one power cycle. And making a lot of
attempts with power-up / power-down is going to take some time ...

On Jan 25, 11:15 am, jetm...@hotmail.com wrote:
> Hi,
>
> I'm working on a V4FX design that contains cryptographic primitives.
>
> To secure the system against an expected threat, it is necessary to
> produce a different seed for every session. The seed doesn't have to be
> random nor unpredictable, but the same seed should never be used more
> than once.
>
> The system contains non-volatile storage for the purpose of generating
> a non-repeating seed.  However the storage is external to the V4FX. An
> attacker could isolate the V4FX and make it repeat a seed by replaying
> the external stimulus (as recorded from a previous session).
>
> To counter this threat, I wish to mix the externally acquired seed with
> on-chip generated "randomness". That would result in a different seed
> even during a stimulus replay attack. I know that chips are designed to
> behave as repeatable as possible, and I'm asking for quite the
> opposite. But at least I want to try, given that even a small amount of
> entropy can discourage an attack.
>
> For example I'm thinking about an oscillating combinatorial loop,
> sampled during the regular clock events. I expect the output would vary
> (at least a little bit) with temperature, supply voltage and perhaps
> moon phase.
>
> What other V4FX resources could be (mis)used for this purpose? I'd like
> to use two or three unrelated methods.
>
> If you have any suggestion or experience, I'd highly appreciate your
> input.
> 
> Regards,
> Marc

Article: 114856
Subject: Re: Xilinx ISE 8.2
From: Daniel O'Connor <darius@dons.net.au>
Date: Thu, 25 Jan 2007 23:13:54 +1030

Martin Thompson wrote:
> Simulation I use vhdl-mode for, it scans my entire set of files and
> builds me a complete makefile.  Anyone who hasn't spent 2 or 3 weeks

Sure that's VHDL mode? I can't see anything in the docs about simulation..

> attempting to get up the learning curve of emacs/vhdl-mode has missed
> out IMHO.  Two of my colleagues (who are distinctly mouse and windows
> types) are converted to Emacs for vhdl writing (purely down to the
> power of vhdl-mode).  I used to use Codewright, which I thought was
> pretty hot, but it's not a patch on emacs, especially for VHDL.

Hmm, it looks very interesting but I use Verilog :)

> The rest of my scripts are simple bat files in windows, I just make a
> "working directory", copy in the EDF, UCF and any other NGD files I
> might need, then run NGCbuild, map, par etc.  I have a few other bits
> ont he end for our internal use which generate a C-file with the
> bitstream as an array, and embed the time of compile into the UserID
> register.

I need to know the commands.. I am a software guy and all the "compile"
steps are confusing :)

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C

Article: 114857
Subject: Re: On-chip randomness (V4FX)
From: "Thomas Stanka" <usenet_10@stanka-web.de>
Date: 25 Jan 2007 04:51:13 -0800

Hi,

On 25 Jan., 11:15, jetm...@hotmail.com wrote:
> For example I'm thinking about an oscillating combinatorial loop,
> sampled during the regular clock events. I expect the output would vary
> (at least a little bit) with temperature, supply voltage and perhaps
> moon phase.
>
> What other V4FX resources could be (mis)used for this purpose? I'd like
> to use two or three unrelated methods.

What about multiplying the clock with the DCM to be as fast as possible
and clocking a counter (designed to be as slow as possible) with that
clock? The best results should be achieved before the DCM is locked ;).
This approach won't give you true random numbers, but the variation
should be nondeterministic enough to be usable as a seed generator for
an LFSR.

bye Thomas

Article: 114858
Subject: Re: On-chip randomness (V4FX)
From: Ray Andraka <ray@andraka.com>
Date: Thu, 25 Jan 2007 09:30:07 -0500

jetmarc@hotmail.com wrote:

> Hi,
> 
> I'm working on a V4FX design that contains cryptographic primitives.
> 
> To secure the system against an expected threat, it is necessary to
> produce a different seed for every session. The seed doesn't have to be
> random nor unpredictable, but the same seed should never be used more
> than once.
> 
> The system contains non-volatile storage for the purpose of generating
> a non-repeating seed.  However the storage is external to the V4FX. An
> attacker could isolate the V4FX and make it repeat a seed by replaying
> the external stimulus (as recorded from a previous session).
> 
> To counter this threat, I wish to mix the externally acquired seed with
> on-chip generated "randomness". That would result in a different seed
> even during a stimulus replay attack. I know that chips are designed to
> behave as repeatable as possible, and I'm asking for quite the
> opposite. But at least I want to try, given that even a small amount of
> entropy can discourage an attack.
> 
> For example I'm thinking about an oscillating combinatorial loop,
> sampled during the regular clock events. I expect the output would vary
> (at least a little bit) with temperature, supply voltage and perhaps
> moon phase.
> 
> What other V4FX resources could be (mis)used for this purpose? I'd like
> to use two or three unrelated methods.
> 
> If you have any suggestion or experience, I'd highly appreciate your
> input.
> 
> Regards,
> Marc
> 

V4 has an oscillator built in that isn't advertised.  I forget the name 
of it at the moment, but you can find it by looking at the Xilinx DCM 
workaround for the NBTI issues, as the tools connect unused DCMs to that 
oscillator. You could beat that against an external clock using a 
counter to get random seeds.  In order to avoid the oscillator syncing 
up to the external clock, you should take the seed sample once as soon 
as practical after start-up, and use that to seed a PN generator.  I am 
aware of someone working on a master's thesis regarding evaluation of 
randomness of randoms generated in various ways in an FPGA.  I believe 
she was using Spartan3's for her work, and much of it revolved around 
using ring oscillators.  I think she found that ring oscillators beating 
against one another worked well as long as the seed was generated right 
after startup and the ring oscillators were located far apart on the die.
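
The beat-counter idea above can be modelled in a few lines of Python. This is only an illustrative sketch, not the circuit described in the post: the frequencies, the window length, and the names `beat_count` and `seed_bits` are all assumptions made up for the example. It counts whole oscillator cycles during a fixed window of reference-clock cycles and keeps the low bits, which move even for a very small change in oscillator frequency.

```python
# Illustrative model only: every frequency and name here is an assumed value,
# not a measurement from a Virtex-4.

def beat_count(f_osc_hz, f_ref_hz, window_ref_cycles):
    """Whole oscillator cycles seen during a window of reference-clock cycles."""
    return (f_osc_hz * window_ref_cycles) // f_ref_hz

def seed_bits(count, nbits=8):
    """Keep only the low bits, which are the most sensitive to frequency drift."""
    return count & ((1 << nbits) - 1)

F_REF = 100_000_000   # assumed 100 MHz external clock
WINDOW = 1_000_000    # assumed 10 ms window, counted in reference cycles

nominal = beat_count(212_345_678, F_REF, WINDOW)  # assumed oscillator frequency
drifted = beat_count(212_366_912, F_REF, WINDOW)  # roughly 0.01% faster

print(nominal, seed_bits(nominal))
print(drifted, seed_bits(drifted))
```

In real hardware the low bits would still have to be tested for actual randomness; the model only shows why a long window amplifies small frequency differences.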

Article: 114859
Subject: Re: video buffering scheme, nonsequential access (no spatial locality)
From: "Gabor" <gabor@alacron.com>
Date: 25 Jan 2007 06:42:43 -0800

I've done something similar in the past.  In my project I was
doing small-angle rotation, so I knew ahead of time the maximum
line-to-line skew of pixels that became vertical in the output
image, and it was small (like 1).  When I started the project,
however I had the idea that the best way to accomplish the
general case of rotation is to make a cache memory in
the FPGA.  The parts I was using at the time (XCV50's)
were a bit small to implement a decent cache, but I would
think newer parts could do this quite handily.

Also important is using the minimum burst size in the
SDRAM to reduce your cache-line access time.

HTH,
Gabor

On Jan 24, 2:36 pm, "wallge" <wal...@gmail.com> wrote:
> I am doing some embedded video processing, where I store an incoming
> frame of video, then based on some calculations in another part of the
> system, I warp that buffered frame of video. Now when the frame goes
> into the buffer
> (an off-FPGA SDRAM chip), it is simply written in one pixel at a time
> in row major ordering.
>
> The problem with this is that I will not be accessing it in this way. I
> may want to do some arbitrary image rotation. This means
>  the first pixel I want to access is not the first one I put in the
> buffer, It might actually be the last one in the buffer. If I am doing
> full page reads, or even burst reads, I will get a bunch of pixels that
> I will not need to determine the output pixel value. If i just do
> single reads, this waists a bunch of clock cycles setting up the SDRAM,
> telling it which row to activate and which column to read from. After
> the read is done, you then have to issue the precharge command to close
> the row. There is a high degree of inefficiency to this. It takes 5,
> maybe 10 clock cycles just to retrieve one
> pixel value.
>
> Does anyone know a good way to organize a frame buffer to be more
> friendly (and more optimal) to nonsequential access (like the kind we
> might need if we wanted to warp the input image via some
> linear/nonlinear transformation)?

Article: 114860
Subject: EDK-Modelsim XE
From: olive_dominguez@yahoo.fr
Date: 25 Jan 2007 07:10:53 -0800

Hello,

I am using the Xilinx EDK to perform simulations of the embedded
PowerPC on a Virtex-4 FX. I have had no success using just the EDK
(Simgen) with ModelsimXE. Does anyone know how I can simulate with
ModelsimXE? Is it possible to do a behavioral simulation of the PPC?


Thanks, Olivier.

Article: 114861
Subject: Re: video buffering scheme, nonsequential access (no spatial locality)
From: "jbnote" <jbnote@gmail.com>
Date: 25 Jan 2007 08:04:36 -0800

Hello,

It all depends on your needs, of course, but block-style ordering can
help a bit to relieve the problem by breaking the 1D-orientedness of
raster-scan order.

For instance, you can pack pixels in groups of 16, each group
representing a 4x4 square in your image. When retrieving the data, you
get data from both dimensions, which will have much better spatial
locality than a line of 16 pixels. This may help you quite a bit.
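
A minimal sketch of the 4x4 packing described above, in Python. The function name `tiled_address` and the assumption that the image width is a multiple of the tile size are mine, not part of the original suggestion; a real SDRAM controller would do the same thing by rearranging address bits.

```python
def tiled_address(x, y, width, tile=4):
    """Map pixel (x, y) to a linear address so each tile x tile block is contiguous.

    Assumes width is a multiple of tile (an assumption made for this sketch).
    """
    tiles_per_row = width // tile
    tile_index = (y // tile) * tiles_per_row + (x // tile)
    offset_in_tile = (y % tile) * tile + (x % tile)
    return tile_index * tile * tile + offset_in_tile

# The 16 pixels of the top-left 4x4 square occupy addresses 0..15,
# so one 16-word burst fetches the whole square.
print([tiled_address(x, y, 640) for y in range(4) for x in range(4)])
```

When the tile size and the width are powers of two, the multiplications and divisions reduce to swapping address bit fields, which costs no logic in the FPGA.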

Peano-style or quadtree-style walking of the image could also be
investigated, but my recollection is that it's quite a bit more
complicated...

JB

Article: 114862
Subject: Simulation of DCM with Xilinx 8.2 and Modelsim 6.1
From: "Frai" <maybetooparanoid@gmail.com>
Date: 25 Jan 2007 08:13:34 -0800

Hello,

For some reason I still don't understand, when I simulate my post
place & route model, the DCM that was configured to introduce a delay
of 2 ns does not work properly. It shows a delay of 1.3 ns instead.

This is the code that I use to instantiate the DCM:

-- DCM with fixed positive phase shift (2 ns, configured with Coregen)
Inst_my_dcm_a: my_dcm PORT MAP(
	CLKIN_IN => dsp_clk_pad_a,
	RST_IN => rst,
	CLKIN_IBUFG_OUT => open,
	CLK0_OUT => dsp_clk_a,
	LOCKED_OUT => LOCKED_a
);

The input clock has a period of 10 ns (100 MHz).

In Modelsim, this is the route and the delays from the clock input pad
until some point in the circuit:

0.000 ns 	@ /tim_top_tb/dspa_clk    (<- this is the input clock signal
in the testbench)
0.000 ns 	@ /tim_top_tb/i_tim_top/dsp_clk_pad_a  (<- this is the input
pad at the top module instance)
0.000 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_clkin_ibufg_inst/i
0.181 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_clkin_ibufg_inst/o
0.181 ns	@ /tim_top_tb/i_tim_top/dsp_clk_pad_a_inbuf
0.181 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_dcm_inst_clkin_buf/i
1.618 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_dcm_inst_clkin_buf/o
1.618 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_dcm_inst_clkin_buf_11712
1.618 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_dcm_inst/clkin
1.618 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_dcm_inst/clkin_ipd
1.618 ns	@
/tim_top_tb/i_tim_top/inst_my_dcm_a_dcm_inst/i_clock_divide_by_2/clock
1.618 ns	@
/tim_top_tb/i_tim_top/inst_my_dcm_a_dcm_inst/i_clock_divide_by_2/clock_out
1.618 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_dcm_inst/clkin_div
3.649 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_dcm_inst/clkin_ps
3.649 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_dcm_inst/clk0_out
8.601 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_dcm_inst/clk0
8.601 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_clk0_buf
8.601 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_clk0_bufg_inst_i0_used/i
9.440 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_clk0_bufg_inst_i0_used/o
9.440 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_clk0_bufg_inst_i0_inv
9.440 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_clk0_bufg_inst/i0
10.029 ns	@ /tim_top_tb/i_tim_top/inst_my_dcm_a_clk0_bufg_inst/o
10.029 ns	@ /tim_top_tb/i_tim_top/dsp_clk_a       (<- this is the name
of the signal that I actually use in my design, the one that I want to
delay 2 ns)
10.029 ns	@ /tim_top_tb/i_tim_top/g_dsp_n_0_i_dsp_n_busin_16_clkinv/i
11.401 ns	@ /tim_top_tb/i_tim_top/g_dsp_n_0_i_dsp_n_busin_16_clkinv/o
11.401 ns	@
/tim_top_tb/i_tim_top/g_dsp_n_0_i_dsp_n_busin_16_clkinv_38490
11.401 ns	@ /tim_top_tb/i_tim_top/g_dsp_n_0_i_dsp_n_busin_16/clk
11.401 ns	@
/tim_top_tb/i_tim_top/g_dsp_n_0_i_dsp_n_busin_16/clk_resolved
11.401 ns	@ /tim_top_tb/i_tim_top/g_dsp_n_0_i_dsp_n_busin_16/clk_ipd
11.401 ns	@ /tim_top_tb/i_tim_top/g_dsp_n_0_i_dsp_n_busin_16/clk_dly
(<- this is the clock input to a register)

Can anyone tell me what I am missing? Why is the delay between
/tim_top_tb/dspa_clk and /tim_top_tb/i_tim_top/dsp_clk_a 0.029 ns
instead of 2 ns?

Regards.
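
The arithmetic behind the numbers in this post can be made explicit with a short Python sketch. It assumes the usual DCM convention that a fixed PHASE_SHIFT is an integer count of 1/256ths of the CLKIN period; the function names are hypothetical, and all times are in picoseconds to avoid floating-point rounding. It does not explain the simulation result; it only shows what setting a 2 ns request maps to and how the 0.029 ns figure follows from the trace when edges are compared modulo the 10 ns period.

```python
def dcm_phase_shift_value(delay_ps, period_ps):
    """Nearest integer PHASE_SHIFT for a requested delay (assumed units: period/256)."""
    return round(delay_ps * 256 / period_ps)

def actual_delay_ps(phase_shift, period_ps):
    """Delay actually produced by a given integer PHASE_SHIFT value."""
    return phase_shift * period_ps / 256

def skew_within_period_ps(edge_ps, reference_edge_ps, period_ps):
    """Offset between two clock edges, folded into a single clock period."""
    return (edge_ps - reference_edge_ps) % period_ps

PERIOD_PS = 10_000                                  # 10 ns input clock from the post
value = dcm_phase_shift_value(2_000, PERIOD_PS)     # requested 2 ns shift
print(value, actual_delay_ps(value, PERIOD_PS))
# dsp_clk_a edge at 10.029 ns versus the testbench edge at 0 ns:
print(skew_within_period_ps(10_029, 0, PERIOD_PS))
```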

Article: 114863
Subject: Re: ML403 board - VGA schematics - wrong pins
From: "Brad Smallridge" <bradsmallridge@dslextreme.com>
Date: Thu, 25 Jan 2007 08:15:11 -0800


Hi Gerardo Sosa

I switch from ML402 to ML403 and back quite often.
The ML403 gives me faster compile times, so it's good
for testing small designs. When I do this I comment
the extra pins out as shown in the UCF file below.

I also have to change my VHDL to reflect fewer
inputs with the ML403 and I am searching for a
way to automate that procedure.

It seems like I don't use the blank, sync_n,
or p_save pins in either design.
Just HSYNC VSYNC and CLK. They must be pulled
to acceptable levels.

Brad Smallridge
AiVision


# VGA OUTPUTS

NET "vga_clk_out"    LOC = "AF8" ;
NET "vga_clk_out"    IOSTANDARD = LVDCI_33 ;
NET "vga_clk_out"    SLEW = FAST ;
NET "vga_clk_out"    DRIVE = 8 ;

NET "vga_vsync_out"  LOC = "A8" ;
NET "vga_vsync_out"  SLEW = FAST ;
NET "vga_vsync_out"  DRIVE = 8 ;

NET "vga_hsync_out"  LOC = "C10" ;
NET "vga_hsync_out"  SLEW = FAST ;
NET "vga_hsync_out"  DRIVE = 8 ;

NET "vga_b_out<3>"  LOC = "C5"  ; # VGA_B3 or tft_lcd_b<1>
NET "vga_b_out<4>"  LOC = "C7"  ; # VGA_B4 or tft_lcd_b<2>
NET "vga_b_out<5>"  LOC = "B7"  ; # VGA_B5 or tft_lcd_b<3>
NET "vga_b_out<6>"  LOC = "G8"  ; # VGA_B6 or tft_lcd_b<4>
NET "vga_b_out<7>"  LOC = "F8"  ; # VGA_B7 or tft_lcd_b<5>
#NET vga_b_out<*> IOSTANDARD = LVCMOS33;
NET "vga_g_out<3>"  LOC = "E4"  ; # VGA_G3 or tft_lcd_g<1>
NET "vga_g_out<4>"  LOC = "D3"  ; # VGA_G4 or tft_lcd_g<2>
NET "vga_g_out<5>"  LOC = "H7"  ; # VGA_G5 or tft_lcd_g<3>
NET "vga_g_out<6>"  LOC = "H8"  ; # VGA_G6 or tft_lcd_g<4>
NET "vga_g_out<7>"  LOC = "C1"  ; # VGA_G7 or tft_lcd_g<5>
#NET vga_g_out<*> IOSTANDARD = LVCMOS33;
NET "vga_r_out<3>"  LOC = "C2"  ; #VGA_R3 tft_lcd_r<1>
NET "vga_r_out<4>"  LOC = "G7"  ; #VGA_R4 tft_lcd_r<2>
NET "vga_r_out<5>"  LOC = "F7"  ; #VGA_R5 tft_lcd_r<3>
NET "vga_r_out<6>"  LOC = "E5"  ; #VGA_R6 tft_lcd_r<4>
NET "vga_r_out<7>"  LOC = "E6"  ; #VGA_R7 tft_lcd_r<5>

# extra VGA connections for ML402 not ML403
#NET vga_b_out<0> LOC = "M26";
#NET vga_b_out<1> LOC = "M21";
#NET vga_b_out<2> LOC = "L26";
#NET vga_g_out<0> LOC = "M22";
#NET vga_g_out<1> LOC = "M23";
#NET vga_g_out<2> LOC = "M20";
#NET vga_r_out<0> LOC = "N23";
#NET vga_r_out<1> LOC = "N24";
#NET vga_r_out<2> LOC = "N25";
# END EXTRA CONNECTIONS

#NET vga_psave_n LOC = "M25";
#NET vga_blank_n LOC = "M24";
#NET vga_sync_n LOC =  "L23";

# drive strength and speed for VGA
NET vga_r_out<*> SLEW  = FAST;
NET vga_r_out<*> DRIVE = 8;
NET vga_g_out<*> SLEW  = FAST;
NET vga_g_out<*> DRIVE = 8;
NET vga_b_out<*> SLEW  = FAST;
NET vga_b_out<*> DRIVE = 8;
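
The manual commenting step for the UCF file could be scripted. The Python sketch below is one hypothetical way to do it, assuming the marker comments are spelled exactly as in the file above; the function name `set_extra_pins` is made up for the example. It does not address the VHDL port changes.

```python
START = "# extra VGA connections"   # marker text taken from the UCF above
END = "# END EXTRA CONNECTIONS"

def set_extra_pins(ucf_text, enabled):
    """Comment or uncomment every constraint between the two marker lines.

    Limitation of this sketch: any ordinary comment inside the block would
    also be uncommented, so the block should contain only NET lines.
    """
    out = []
    inside = False
    for line in ucf_text.splitlines():
        if line.startswith(START):
            inside = True
            out.append(line)
        elif line.startswith(END):
            inside = False
            out.append(line)
        elif inside and line.strip():
            bare = line.lstrip("#").lstrip()
            out.append(bare if enabled else "#" + bare)
        else:
            out.append(line)
    return "\n".join(out)
```

Calling `set_extra_pins(text, True)` would give the ML402 variant and `set_extra_pins(text, False)` the ML403 variant; lines outside the markers are left untouched.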

Article: 114864
Subject: Re: On-chip randomness (V4FX)
From: Austin Lesea <austin@xilinx.com>
Date: Thu, 25 Jan 2007 08:39:16 -0800

Marc,

See:

http://www.xilinx.com/xlnx/xweb/xil_tx_display.jsp?iLanguageID=1&category=&sGlobalNavPick=&sSecondaryNavPick=&multPartNum=1&sTechX_ID=alsd_random

or

http://tinyurl.com/2qk77z

There are actually rules for use of ring oscillators to generate random
numbers, unclassified, for official use only, from the NSA.  I am sure
you can find it (since it is unclassified) somewhere.  The caveat is:
whatever you do, you must then test it.

Austin

Article: 114865
Subject: Re: Any UK mirror for ISE 8.2i SP2?
From: "John Adair" <g1@enterpoint.co.uk>
Date: 25 Jan 2007 08:39:41 -0800

Your best bet is to try the Xilinx server 7-9 am UK time. US generally
asleep then and Europe not active. Server is going to be very busy this
week with ISE 9.1 Webpack release.

John Adair
Enterpoint Ltd.

On 25 Jan, 11:46, "lbo_user" <shareef.jal...@lightblueoptics.com>
wrote:
> The Xilinx server is taking forever to serve the latest service pack.
> Does anyone know of a UK mirror for it or can anyone host it for me?
> 
> Thanks.

Article: 114866
Subject: Re: Aligning data with clock
From: Bill <-@-.com>
Date: Thu, 25 Jan 2007 08:53:35 -0800

I meant the data and the clock are just out of phase initially. It is a
biphase mark code I am working with. I have recovered the clock, but the
bits may be out of phase when I sample, so I need to delay either the
clock or the bitstream. It will stay aligned after that until the stream
is broken.

It is running at 20 MHz+.

Article: 114867
Subject: Re: video buffering scheme, nonsequential access (no spatial locality)
From: Martin Thompson <martin.j.thompson@trw.com>
Date: Thu, 25 Jan 2007 17:07:49 +0000

"wallge" <wallge@gmail.com> writes:

> I am doing some embedded video processing, where I store an incoming
> frame of video, then based on some calculations in another part of the
> system, I warp that buffered frame of video. Now when the frame goes
> into the buffer
> (an off-FPGA SDRAM chip), it is simply written in one pixel at a time
> in row major ordering.
>
> The problem with this is that I will not be accessing it in this way. I
> may want to do some arbitrary image rotation. This means
>  the first pixel I want to access is not the first one I put in the
> buffer, It might actually be the last one in the buffer. If I am doing
> full page reads, or even burst reads, I will get a bunch of pixels that
> I will not need to determine the output pixel value. If i just do
> single reads, this waists a bunch of clock cycles setting up the SDRAM,
> telling it which row to activate and which column to read from. After
> the read is done, you then have to issue the precharge command to close
> the row. There is a high degree of inefficiency to this. It takes 5,
> maybe 10 clock cycles just to retrieve one
> pixel value.
>

If you are doing truly arbitrary warping, then is it not right that
you can never get an optimal organisation for all warps?

> Does anyone know a good way to organize a frame buffer to be more
> friendly (and more optimal) to nonsequential access (like the kind we
> might need if we wanted to warp the input image via some
> linear/nonlinear transformation)?
>

Could you do some kind of caching scheme where you read an entire DRAM
row in at a time, and "hope it comes in handy" later?

Failing that, can you use SSRAM for your frame buffer?

Or, can you parallelise your task so that it operates on (eg) 4 wildly
different areas of input data at a time, which means you can use the
banking mechanism of the DRAMs to hide the latency?

Those are my initial thoughts (whilst waiting for a very loooooong
simulation to run :-)

Cheers,
Martin

-- 
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html

Article: 114868
Subject: Re: video buffering scheme, nonsequential access (no spatial locality)
From: Mike Treseler <mike_treseler@comcast.net>
Date: Thu, 25 Jan 2007 09:11:25 -0800

wallge wrote:

> Does anyone know a good way to organize a frame buffer to be more
> friendly (and more optimal) to nonsequential access

Sounds like a RAM.
If it didn't fit in fpga block ram
I would use an external device.

       -- Mike Treseler

Article: 114869
Subject: xilinx 8.2 xps debug problems
From: Ludwig Lenz <llenz@vlsi.informatik.tu-darmstadt.de>
Date: Thu, 25 Jan 2007 18:39:32 +0100

Hi

I am working with a Xilinx XUP board, using ISE 8.2 SP1 + EDK 8.2.
I started by creating a design with the PPC running from DDR RAM, with
SysACE and a UART for output. Starting the debugger with:
xmd -xmp system.xmp -opt etc/xmd_ppc405_0.opt
with the example application works.

I then closed the project, added a new PLB peripheral template with
4x32 registers and no interrupts, reopened the project, and downloaded
the bitstream. Starting the debugger again gives:

JTAG chain configuration
--------------------------------------------------
Device   ID Code        IR Length    Part Name
 1       05059093          16        XCF32P
 2       0a001093           8        System_ACE
 3       0127e093          14        XC2VP30


Unable to connect to PowerPC target. Invalid Processor Version No 0x00000000

make -f system.make clean doesn't help.

Any idea why the JTAG autodetection fails after adding a core to the PLB
bus?

Thanks
Ludwig

Article: 114870
Subject: Re: video buffering scheme, nonsequential access (no spatial locality)
From: "Pete Fraser" <pfraser@covad.net>
Date: Thu, 25 Jan 2007 09:43:54 -0800

"Gabor" <gabor@alacron.com> wrote in message 
news:1169736163.476029.150290@l53g2000cwa.googlegroups.com...
> I've done something similar in the past.  In my project I was
> doing small-angle rotation, so I knew ahead of time the maximum
> line-to-line skew of pixels that became vertical in the output
> image, and it was small (like 1).  When I started the project,
> however I had the idea that the best way to accomplish the
> general case of rotation is to make a cache memory in
> the FPGA.  The parts I was using at the time (XCV50's)
> were a bit small to implement a decent cache, but I would
> think newer parts could do this quite handily.

Another option (depending on your mapping) would be to do
it in two passes. There's a transpose in the middle, so
it would probably be best to do it in small sections to an on-chip
transpose buffer, before writing it out to the intermediate store.

Have you thought about what order of filtering you need?

Check out Digital Image Warping by Wolberg, or one of
Alvy Ray Smith's scan line ordering papers.

Article: 114871
Subject: Re: On-chip randomness (V4FX)
From: "Symon" <symon_brewer@hotmail.com>
Date: Thu, 25 Jan 2007 17:45:21 -0000

"Austin Lesea" <austin@xilinx.com> wrote in message 
news:epamfl$vv5@cnn.xsj.xilinx.com...
> Marc,
>
> See:
>
> http://www.xilinx.com/xlnx/xweb/xil_tx_display.jsp?iLanguageID=1&category=&sGlobalNavPick=&sSecondaryNavPick=&multPartNum=1&sTechX_ID=alsd_random
>
> or
>
> http://tinyurl.com/2qk77z
>
> There are actually rules for use of ring oscillators to generate random
> numbers, unclassified, for official use only, from the NSA.  I am sure
> you can find it (since it is unclassified) somewhere.  The caveat is:
> whatever you do, you must then test it.
>
> Austin

Hmm, this is interesting:- http://www.faqs.org/rfcs/rfc4086.html

Article: 114872
Subject: Re: video buffering scheme, nonsequential access (no spatial locality)
From: "wallge" <wallge@gmail.com>
Date: 25 Jan 2007 09:48:34 -0800

I should have been more specific in my question.

I have to use a small (64 Mbit) mobile sdram. I can't choose
to use a different storage element in the system (other than *some*
FPGA buffering, though not full frame).

I have heard some discussion of the way in which graphic accelerator
boards do memory transactions, storing pixels in blocks of neighbor
pixels
(instead of being organized row major). In other words the spatial
locality
in the SDRAM buffer might look like:

Image pixels:
                  N2 N3 N4
                  N1  P  N5
                  N8 N7 N6

Memory organization:
ADDR     DATA
0x0000      P
0x0001     N1
0x0002     N2
0x0003     N3
0x0004     N4
0x0005     N5
0x0006     N6
0x0007     N7
0x0008     N8

Where P is the central pixel of interest, and the N's are its
neighbors.
We organize the pixels in the SDRAM buffer not by rows, but by regions
of interest.
This way, if we are doing some kind of image warp and we want to get
more bang for the buck in terms of read latency, we are more likely to
reuse pixels in the neighborhood of the currently accessed pixel
than if the pixels were arranged in row or column major ordering
(consider the case where we wanted to rotate an image by 47.2 degrees
from input to output).
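
To put a rough number on the benefit, here is a small Python model that counts how many bursts are needed to fetch a pixel and its 8 neighbors under two layouts. It is only a sketch under stated assumptions: a 16-pixel burst, a 64x64 image, and a plain 4x4 tiling swapped in for the center-plus-neighbors layout drawn above. All function names are hypothetical.

```python
BURST = 16  # pixels fetched per SDRAM burst (assumed value)

def row_major_address(x, y, width):
    """Conventional raster-scan layout."""
    return y * width + x

def tiled_address(x, y, width, tile=4):
    """4x4-tiled layout; assumes width is a multiple of tile."""
    tiles_per_row = width // tile
    tile_index = (y // tile) * tiles_per_row + (x // tile)
    return tile_index * tile * tile + (y % tile) * tile + (x % tile)

def bursts_for_neighborhood(x, y, width, address_fn):
    """Distinct bursts needed to fetch P at (x, y) plus its 8 neighbors."""
    bursts = {address_fn(x + dx, y + dy, width) // BURST
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)}
    return len(bursts)

def average_bursts(width, height, address_fn):
    """Average burst count over all interior pixels of the image."""
    total = 0
    count = 0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total += bursts_for_neighborhood(x, y, width, address_fn)
            count += 1
    return total / count

print("row-major:", average_bursts(64, 64, row_major_address))
print("tiled 4x4:", average_bursts(64, 64, tiled_address))
```

The model ignores row activation and precharge costs, so it understates the penalty of jumping between SDRAM rows in the row-major case.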

Has anyone seen something like this or know of any resources online
with regard to memory buffer organization schemes for graphics or image
processing?

On Jan 24, 2:36 pm, "wallge" <wal...@gmail.com> wrote:
> I am doing some embedded video processing, where I store an incoming
> frame of video, then based on some calculations in another part of the
> system, I warp that buffered frame of video. Now when the frame goes
> into the buffer
> (an off-FPGA SDRAM chip), it is simply written in one pixel at a time
> in row major ordering.
>
> The problem with this is that I will not be accessing it in this way. I
> may want to do some arbitrary image rotation. This means
>  the first pixel I want to access is not the first one I put in the
> buffer, It might actually be the last one in the buffer. If I am doing
> full page reads, or even burst reads, I will get a bunch of pixels that
> I will not need to determine the output pixel value. If i just do
> single reads, this waists a bunch of clock cycles setting up the SDRAM,
> telling it which row to activate and which column to read from. After
> the read is done, you then have to issue the precharge command to close
> the row. There is a high degree of inefficiency to this. It takes 5,
> maybe 10 clock cycles just to retrieve one
> pixel value.
>
> Does anyone know a good way to organize a frame buffer to be more
> friendly (and more optimal) to nonsequential access (like the kind we
> might need if we wanted to warp the input image via some
> linear/nonlinear transformation)?

Article: 114873
Subject: Re: video buffering scheme, nonsequential access (no spatial locality)
From: "Pete Fraser" <pfraser@covad.net>
Date: Thu, 25 Jan 2007 10:00:31 -0800

"wallge" <wallge@gmail.com> wrote in message 
news:1169747314.537493.237140@l53g2000cwa.googlegroups.com...

>
> Image pixels:
>                  N2 N3 N4
>                  N1  P  N5
>                  N8 N7 N6

Have you thought about what order of filtering you'll
need to use?

Article: 114874
Subject: Re: On-chip randomness (V4FX)
From: "Peter Alfke" <peter@xilinx.com>
Date: 25 Jan 2007 10:00:49 -0800

I suggest an on-chip LFSR pseudo-random sequence generator, clocked by
a ring-oscillator (or equivalent) clock source and timed out by a fixed
delay, derived from the xtal oscillator. That's all very simple, takes
only a few CLBs, when the SRL16 is used for the LFSR.
Peter Alfke, Xilinx
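
A software model of the LFSR part of this suggestion might look like the Python sketch below. The 16-bit width, the tap positions (16, 14, 13, 11, a well-known maximal-length choice), the start value 0xACE1, and the function names are assumptions for illustration; the post does not specify them. The number of clocks applied stands in for however many ring-oscillator cycles happen to fit in the fixed timeout.

```python
def lfsr16_step(state):
    """One shift of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def seed_after(clocks, start=0xACE1):
    """LFSR state after a given number of clocks; start must be non-zero."""
    state = start
    for _ in range(clocks):
        state = lfsr16_step(state)
    return state

def period(start=0xACE1):
    """Number of clocks until the LFSR returns to its start state."""
    state = lfsr16_step(start)
    n = 1
    while state != start:
        state = lfsr16_step(state)
        n += 1
    return n

# One extra ring-oscillator cycle inside the timeout gives a different seed.
print(hex(seed_after(1000)), hex(seed_after(1001)))
print(period())
```

Because consecutive states of a maximal-length LFSR always differ, any variation in the cycle count during the timeout changes the captured value.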

On Jan 25, 2:15 am, jetm...@hotmail.com wrote:
> Hi,
>
> I'm working on a V4FX design that contains cryptographic primitives.
>
> To secure the system against an expected threat, it is necessary to
> produce a different seed for every session. The seed doesn't have to be
> random nor unpredictable, but the same seed should never be used more
> than once.
>
> The system contains non-volatile storage for the purpose of generating
> a non-repeating seed.  However the storage is external to the V4FX. An
> attacker could isolate the V4FX and make it repeat a seed by replaying
> the external stimulus (as recorded from a previous session).
>
> To counter this threat, I wish to mix the externally acquired seed with
> on-chip generated "randomness". That would result in a different seed
> even during a stimulus replay attack. I know that chips are designed to
> behave as repeatable as possible, and I'm asking for quite the
> opposite. But at least I want to try, given that even a small amount of
> entropy can discourage an attack.
>
> For example I'm thinking about an oscillating combinatorial loop,
> sampled during the regular clock events. I expect the output would vary
> (at least a little bit) with temperature, supply voltage and perhaps
> moon phase.
>
> What other V4FX resources could be (mis)used for this purpose? I'd like
> to use two or three unrelated methods.
>
> If you have any suggestion or experience, I'd highly appreciate your
> input.
> 
> Regards,
> Marc
