On 21 Jul, 19:17, woj...@gmail.com wrote: > On 21 Jul, 17:06, woj...@gmail.com wrote: > > > > My calculator says that 12.288 MHz / 44100 Hz = 278.63946, which suggests > > > that you need a different master clock frequency, or you will have > > > jitter-related problems. > > > Are you sure about that? Most clock dividers work as counters: when the > > counter overflows, the output changes state. So if I use 12.288 MHz as the > > main clock and count from 1 to 256, I get an output that changes at 44100 Hz. > > Or maybe I made a mistake somewhere... > > My mistake, I of course mean 48 kHz. Another question: if I have 4 codecs, each connected to a different bank (bank0-4), does each bank need its own clock, or can I have one master clock connected to GCLK0? I have studied the Xilinx datasheet but have not found an answer to that question.
Article: 134001
On Jul 21, 12:03 pm, ppero...@gmail.com wrote: > Hi all, > > I'm doing some performance tests with multi-threaded xilkernel > applications and I always get erroneous times in programs where > context switches occur continuously. I have written a trivial > application that shows the problem: > > Two threads run without communicating with each other. Each one > simply iterates N times and makes a yield() call in every loop. This > ensures one context switch per loop and thread (both have the same > priority and the policy is SCHED_PRIO). Every M iterations, I > read the elapsed kernel ticks and print them on screen. Both > xilkernel and the main application are optimized (-O2), and each > xilkernel tick is set to 10 milliseconds. > > thread body > ===========
>
> #define N 1000000
> #define M 1000
>
> for (i = 0; i < N; i++) {
>     yield();
>     if (!(i % M)) {
>         time = xget_clock_ticks();
>         xil_printf("Elapsed time: %d msec\n", time * 10);
>     }
> }
>
> ===========
>
> The results indicate that the times printed by the application are always > shorter than the times elapsed in reality (you can measure the real > elapsed time, for example, with the Windows clock or with a > stopwatch). I mean that if the program really takes about 50 seconds to > finish, the application reports about 24 seconds. This is impossible. > > For example, if I replace yield() with sleep(1), the results correspond > to the real elapsed time. Moreover, if I do yield() every X > iterations (testing progressively with a bigger value of X: 10, 100, > 1000...1000000), the results printed on screen fit the > real elapsed times more and more closely. > > Can anyone explain what is happening? Is it a xilkernel bug? > > Many thanks for your help. > > Paco
----------------------------------------
Try shortening > xil_printf("Elapsed time: %d msec\n", time * 10); to xil_printf(" %d msec\n", time * 10); and see if the elapsed time better matches the execution time. Perhaps the print buffer or something similar is affecting your results. > Moreover, if I do yield() every X > iterations (testing progressively with a bigger value of X: 10, 100, > 1000...1000000), the results printed on screen fit the > real elapsed times more and more closely. What is X? Is it M? Is it N?
Article: 134002
I've got a skew-sensitive part of my design that I need to place and route by hand. No matter how much I try to help it, ISE just doesn't seem capable of getting the skew below about 1.5 ns, whereas hand routing (using a symmetrical pattern) can get it down to zero according to the editor, which I assume means it's probably about as good as it's going to get :) Anyhow, using the FPGA editor is, I find, simply painful. Sometimes it comes back and says "Nothing found to route", other times "Cannot manually route", and simply repeating the exact series of clicks will then route happily. Other times it doesn't route what I have selected: despite having an (as far as I can tell) correct route, it'll go and send the net halfway across the FPGA and back again. While this is somewhat annoying, there are two things that make it much worse: apparently there's no "undo" feature, and, even worse, there's no way to partially unroute a net. So typically what happens is I nearly complete the net, it throws a route halfway across the FPGA, and I have to start the net all over again from scratch. As a result, all this has turned what should have been a 1-hour job into something that took close to two evenings. So, apart from the somewhat useful supplied documentation, does anyone have any good links for taming the FPGA editor? Oh, and the autoroute feature also seems to really hate my design. Select any two pins (or a net), click autoroute, and then it crashes. No error message, it just disappears. This happens in 9.1i SP3 and 10.1 SP1 (though I haven't really played around much with 10.1 because it's so much slower; I didn't know it was even possible for an application to take 3 seconds to close a "find" dialog box!) -- Michael Brown Add michael@ to emboss.co.nz ---+--- My inbox is always open
Article: 134003
On Jul 21, 1:47 pm, woj...@gmail.com wrote: > [...] > Another question: if I have 4 codecs, each connected to a > different bank (bank0-4), does each bank need its own > clock, or can I have one master clock connected to GCLK0? I have > studied the Xilinx datasheet but have not found an answer to that > question. The clocks are independent of the banks, so one clock can drive the entire chip. Rick
Article: 134004
Michael Brown wrote: > [...] > So, apart from the somewhat useful supplied documentation, does anyone have > any good links for taming the FPGA editor? > [...] No links, but a couple of quick pointers: When a bad route occurs, it's usually associated with one destination pin. If that route isn't connected yet because the routing is only partial, finish the connection of that route to one valid destination pin. Then select *only* that destination pin (no net) and hit the "unroute" button. The entire net is not unrouted, just what's associated with that connection. To autoroute one portion at a time, click only the single destination pin and hit autoroute to see if the route for that portion of the net takes an acceptable path. If not, unroute that one pin and go back to manual routing. Be sure you have set up your autorouter before you try autorouting, whether individual pins or full nets. The defaults tend to be the "totally lackluster results" settings rather than timing- and/or resource-friendly settings. Knowing that autoroute and unroute can work per pin can really help in getting things done. You may have discovered that manually routing segments sometimes requires you to turn off the route display so you can select a segment rather than the partial net that's occupying that segment. You can turn the route display back on before or after performing the partial route. My recollection on this point is less clear: if you select the driver and one destination, the "route" button may try to add that one connection pair to any existing partial net, similar to autoroute with only the destination selected. For manual routing I often have a common segment selected, select one destination pin, route, deselect the destination pin, then select a different destination pin, which keeps the common segment selected (useful, since it's hard to select thanks to the display of the routed nets). Isn't FPGA Editor fun?! - John_H
Article: 134005
Hi, I'll try to explain what I think is happening with your program (maybe I'm completely wrong). You are doing too many yield() calls. With the code you are showing, you do one yield() for each iteration of the loop, which means only three operations in the loop are executed per yield() call (incrementing i, testing whether i < N, and testing whether i % M == 0; this will probably be between 5 and 15 assembly instructions, depending on compiler optimizations). Each yield() call is a system call that executes more than 100 assembly instructions (probably many more; the context switch alone stores 33 registers out and loads 33 registers in). All the instructions executed for the yield() call run with INTERRUPTS DISABLED, because xilkernel disables interrupts during system calls. If a timer interrupt arrives while yield() is executing, the timer will be stopped until yield() ends, but the time you measure with your hand clock keeps counting. When you do the yield() calls every 10, 100, 1000 ... iterations you get better results because the system spends less time in system calls, so it's less likely that a timer interrupt arrives while interrupts are disabled. I don't know if you will understand what I wrote, given my poor English. If something is not clear, just ask again :-) Regards, Pablo H
Article: 134006
Michael Brown wrote: > > So, apart from the somewhat useful supplied documentation, does anyone > have any good links for taming the FPGA editor? My advice is to avoid FPGA Editor at all costs. Not necessarily because it's a bad tool, but even if it were awesome, it's still an unwise design practice to hand-route--as you have found, it will take hours and hours to fix one net, and you will never save time in the long run. Also, the hours spent hand-routing are in vain if you ever change the source again. Hand-routing is a last resort that should make you ask, "How did I get here? Isn't there a better way?" When you have a timing problem, the first place to make changes is the source. If there is any possible way to fix the source (pipelining, Shannon expansion, etc.), then fix it there. Even a major architecture change will take less time than hand-routing. Make sure constraints are correct; make all multicycle paths and ignored paths explicit to free up routing resources. Then try different ISE options (multipass, effort levels, etc.). Then try hand-placement using area groups or by locking down individual CLBs using PlanAhead. Try to avoid any sort of manual routing or routing guide files. I know this doesn't answer your question directly, but in my experience you will avoid getting stuck in a quagmire if you use other methods to fix timing and resist the lure of the imagined "quick fix" of hand-routing. Fix things at the front end. You don't want to find yourself underwater when a Katrina-like disaster strikes. Ha ha; I bet you haven't heard that one before. -Kevin
Article: 134007
Hopefully beating the bushes in many arenas will help me elicit some responses. I have a system on a V4FX100 that is using both PPC405s. Each system is an exact clone of the other, with some minor variations, such as how the other's memory controller is mapped in. (They can't both be mapped to 0x0 :P). I have built a Linux software image that runs on one and not the other. By "not run", I mean it boots successfully, gives me a console, I run the applications, and then at some point in the near future the system crashes, semi-gracefully. The kernel crashes and prints out the crash, hence the "gracefully" part, but it's dead in the water. Has anyone seen anything similar when using both PPCs? From what I can tell it may be related to timers, but that's only a guess and inconclusive. I don't have any extra timers (plb/opb/xps_timer) built into either system. Thanks, Mike
Article: 134008
Kevin Neilson wrote: > Michael Brown wrote: > >> So, apart from the somewhat useful supplied documentation, does anyone >> have any good links for taming the FPGA editor? > > My advice is to avoid FPGA Editor at all costs. [...] Try to avoid any sort of manual routing or routing > guide files. [...] I would respectfully disagree that hand routing is always a complete waste of energy. When RPMs or explicitly placed logic elements become useful because of performance needs, one simple step further is to use manual routing constraints ("directed routing") that are included in the UCF (user constraints file). This is not a "reroute every time you compile" or "completely mix up the naming" problem. Nets and logic cones tend to retain their relationship, so if name changes do occur, they're very simple to edit. If these performance-critical items are in their own hierarchical module, even the names are very unlikely to change. For things like ring oscillators and DDR structures that don't use the I/O cells, directed routing can be invaluable. It's not a task for the inexperienced, but it is a powerful tool for those who need to squeeze the last picoseconds out of their timing budget. - John_H
Article: 134009
"Kevin Neilson" <kevin_neilson@removethiscomcast.net> wrote in message news:g633bv$4452@cnn.xsj.xilinx.com... [...] > If there is any possible way to fix the source (pipelining, Shannon > expansion, etc.) then fix it there. The problem is that it's in an async part of the design (I know, I know, asynchronous designs are evil[*]): specifically, the part that takes an async external signal and turns it into something more friendly. I have a number of clocks, each with a slight phase shift, each latching the wire I'm having trouble with. The amount of skew in the line basically determines the number of flip-flops that will go metastable. I have logic in there to handle this, but unfortunately it grows essentially exponentially with the amount of skew. Ideally, I'd like to tell the placer/router to sort things out so that the skew goes in the opposite direction to the clock delays. Unfortunately this doesn't seem to be possible, which is why I resorted to hand-routing it. Additionally, once I get this bit sorted out, it's very unlikely to change. [*] Actually, this design uses pretty much every evil technique in the book: asynchronous regions, combinatorial loops, and logic-generated clocks. A -4 Spartan-3E really takes a lot of coercion to handle 1-ns pulses ... -- Michael Brown Add michael@ to emboss.co.nz ---+--- My inbox is always open
Article: 134010
John_H wrote: [...] > Knowing that autoroute and unroute can work per pin can really help > in getting things done. Aha! Thanks for all the tips, especially the unrouting one. This will make things a lot less frustrating. I can live with it throwing a fit every so often, as long as there's a way to tidy up its mess without wiping out a quarter of an hour or more of work. I just tried autorouting on a simple design (a counter) and it works fine. Trying to autoroute an exactly identical counter in my more complex design makes the editor crash. Oh well, I don't need it to work for this design anyhow :) -- Michael Brown Add michael@ to emboss.co.nz ---+--- My inbox is always open
Article: 134011
Hi, Here is the problem: the memory monitor window does not always display RAM contents correctly. When the address is close to a RAM boundary, the monitor may show ???? instead of the real data. I know I wasn't the first to discover this bug and that there is a workaround, which is to use gdb. However, I would like to know whether this glitch has been fixed in the newer version 10.1. Thanks, -- Andrew
Article: 134012
On Jul 21, 1:45 pm, rickman <gnu...@gmail.com> wrote: > I don't remember for sure what the basis of 44100 Hz was, but I think > it has to do with being compatible with TV scan rates. It is > divisible by both 50 Hz and 60 Hz. But then so is 48,000 Hz. 44100 The Wikipedia article on CDs suggests that it comes from the early practice of using PCM converters to store/master digital audio on professional video cassette decks, before the later arrival of purpose-built digital audio tape equipment. They could store three stereo samples per video line, and somehow that gets you 44100 on a PAL recorder. (The dimensions of the tape cassette no doubt derive from the need for efficient packing in a shipping container capable of loading on an undercarriage compatible with the rut spacing of Roman roads...)
Article: 134013
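The "somehow" has a simple arithmetic answer, which is the usual account of the 44.1 kHz figure. The PCM adaptors stored three samples per channel on each usable video line, using 294 usable lines per field at 50 fields/s on PAL and 245 usable lines per field at 60 fields/s on NTSC. Both products give the same rate, which is also why 44100 divides evenly by both 50 and 60:

$$
\begin{aligned}
f_s^{\mathrm{PAL}} &= 3 \times 294 \times 50 = 44100\ \text{Hz},\\
f_s^{\mathrm{NTSC}} &= 3 \times 245 \times 60 = 44100\ \text{Hz}.
\end{aligned}
$$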
On Jul 21, 6:54 pm, morphiend <morphi...@gmail.com> wrote: > I have a system on a V4FX100 that is using both PPC405s. Each system > is an exact clone of the other, with some minor variations like how > the other's memory controller is mapped in. (They can't both be mapped > to 0x0 :P). Can you swap this difference in assignment between them and see whether the problem moves with it or stays with the original CPU? > I have a Linux software image that I've built that will run on one and > not the other. And by not run, I mean it boots successfully, gives me > a console, I run the applications and then at some point in > the near future the system crashes, semi-gracefully. The kernel > crashes and prints out the crash, hence the gracefully part, but it's > dead in the water. Without any basis in specific factual knowledge, I'd wonder whether it is trying to use memory that doesn't physically exist in the mapping. Is there some sort of RAM test config option in the kernel? If not, could you add one? Does it always crash in the same place? Learning to debug Linux kernel crashes can be interesting. Sometimes the output messages aren't very informative, but if you think you've found the code that triggers them, you can modify it with extra output that will tell you for sure that you've found the right spot, and maybe give you more detail as to why. Oh, and be glad your crash occurs quickly, as that makes it a lot easier to track down. My recent learning opportunity typically occurred only after several hours of uptime (traced to bad handling of corrupted Ethernet packets, which were occurring rarely).
Article: 134014
Hi, Then it should be a direct match for the FSL interface. If MicroBlaze executes a 'put' instruction, it will not write until the FSL_M_Full flag is '0', and when it writes it will set FSL_M_Write high for one clock cycle. MicroBlaze has plenty of options for the FSL instructions; you can read all about them in the reference manual. Göran "Ray D." <ray.delvecchio@gmail.com> wrote in message news:693f947e-929e-49f6-939d-d834e0048121@27g2000hsf.googlegroups.com... On Jul 21, 2:36 am, "Göran Bilski" <goran.bil...@xilinx.com> wrote: > Hi, > > Depending a little on how your busy signals work, you might just hook up > your module to the FSL interface on MicroBlaze. > Your busy signal needs to be high when it can't accept a new word, even when > there is no attempt to write to the module. > MicroBlaze will also just do a one-cycle write, so your module needs to > accept > a new word in one clock cycle when busy is low. > > Connect: > din(7 downto 0) -> FSL0_M_Data(24 to 31) > din_ready -> FSL0_M_Write > busy -> FSL0_M_Full > > You need to enable FSL interfaces on MicroBlaze with the parameter > C_FSL_LINKS (set it to 1). > You can write to the FSL interface with the function putfslx; you can read > more about this function in the document "OS and Libraries Document > Collection". > > Göran > > "Ray D." <ray.delvecc...@gmail.com> wrote in message > > news:276dce6d-c9ed-4937-95ea-e3c86ff3656a@d45g2000hsc.googlegroups.com... > > > Hey all, > > > I have a Xilinx Spartan-3E starter board, and I'm implementing a > > MicroBlaze processor on the FPGA. I would also like to use the LCD > > which is on board, and I have already developed a hardware module that > > takes care of initialization and printing to the LCD. The interface > > is shown below:
> >
> > entity LCD_top is
> >   Port (
> >     clk       : in  STD_LOGIC;
> >     reset     : in  STD_LOGIC;
> >
> >     din       : in  STD_LOGIC_VECTOR (7 downto 0);
> >     din_ready : in  STD_LOGIC;
> >     busy      : out STD_LOGIC;
> >
> >     LCD_D     : out STD_LOGIC_VECTOR (11 downto 8);
> >     LCD_E     : out STD_LOGIC;
> >     LCD_RS    : out STD_LOGIC;
> >     LCD_RW    : out STD_LOGIC
> >   );
> > end LCD_top;
> >
> > I really would like to instantiate this module along with the > > processor core. My question is this: how would I go about > > interfacing this with the MicroBlaze processor internal to the FPGA? > > What I would like to do is define a GPIO port on the processor to > > connect to the din, din_ready and busy lines of the LCD module, but I > > keep getting the following error:
> >
> > ERROR:MDT - INST:LCD_data_status_10Bit PORT:GPIO_IO
> > CONNECTOR:LCD_data_status_10Bit_GPIO_IO - C:\EDK_Test_LCD\system.mhs line 150
> > - connection is not connected to an external port!
> > MPD subproperties IOB_STATE=BUF|REG or THREE_STATE=TRUE require that the port
> > be connected directly to an external port.
> >
> > Is there any way to work around this? I realize I could just connect > > the LCD to the GPIO directly and write software drivers, but I'm > > trying to avoid that because I already have the hardware module in > > place and working smoothly. It will also be nice to have this > > separate module do the work of printing to the LCD, so that > > the processor itself can stay busy with other, more important jobs. > > > Also, is there an easier way to add another hardware module without > > manually editing the generated VHDL files for the core? I'm not sure > > if you can do that within Platform Studio. > > > Any advice would be much appreciated, thanks! > > > Ray
That is how the module works, so I'll have to try some of these options! The busy signal is set high the entire time data is being written to the LCD. Originally I had a module "program.vhd" that controlled the LCD module, along with a keyboard module that we had in place for user input. Within program.vhd, I implement a state machine and check whether the busy signal is high before writing to the LCD. If busy = 0, then I set din_ready high and set the 8 bits of data. This is buffered within the LCD module, and you only need to hold din_ready for a single cycle to write to the LCD. The LCD is connected over a 4-bit interface to the FPGA, and this is taken care of within the LCD module. When the write operation begins, busy is set to '1' until it completes. Ray
Article: 134015
What Mr Datta suggested did not work, but it eventually got me on the right track. In the same location in the Settings, I disabled automatic inference of RAM and shift registers. Then I rewrote the design to use embedded memory explicitly where it was needed. This seems to solve the problem, and it also leaves 2.5K LUTs free after full synthesis. I think the problem was that Quartus used an all-or-nothing approach during mapping: it took my ~70 shift registers, made them into one giant shift register, and then tried to fit all of it either into LUTs or into RAMs. Thanks, everyone, for your help!
Article: 134016
Hi, I cannot find detailed information on the "PCR re-stamping" feature. In certain applications, the input stream from the transport multiplexer may be provided at a fixed rate and may not support the standard TS interface handshake mechanism, so some form of rate adaptation is required: padding the input TS stream with NULL TS packets (or PRBS TS packets) as required and performing any PCR adjustment. But how? I cannot find detailed information. Does anyone have experience with this? Thanks. Kappa.
Article: 134017
As an intellectual exercise I have been playing with some cryptographic functions. Currently, I am looking at the RC5 "key expander" (http://people.csail.mit.edu/rivest/Rivest-rc5.pdf):
A' = (S + A + B) rol 3
B' = (L + A' + B) rol (A'+B)
where "rol" denotes rotate left and all variables are 32-bit. While simple enough, I have had a hard time implementing this efficiently. My current best implementation requires >300 LUTs and >128 regs :( Can anybody help me understand how to optimize this simple function for time and area? I am working under the following assumptions:
1. The target is a Cyclone II FPGA (which, according to Ray, sucks at arithmetic; this makes the exercise even more interesting...).
2. The latency is two cycles (i.e. A, B, S & L are asserted at clk=t, and A' and B' are read at clk=t+2). I could go with three cycles, but I'm not sure it does any good.
3. S and L could be re-timed, that is, they could be available before or after clk=t if needed (for L, it is preferred to come earlier).
4. I am using the webpack (the $$$ version can optimize the pipeline automatically).
5. There is a special case where S is a constant; could this allow some optimization?
Could anyone help me improve my implementation? PS. Please understand that this is an intellectual exercise and not someone's school assignment, hence I prefer a good discussion to your code :)
Article: 134018
On Jul 22, 3:48 am, tgau3...@gmail.com wrote: > As an intellectual exercise I have been playing with some > cryptographic functions. Currently, I am looking at the RC5 "key > expander" (http://people.csail.mit.edu/rivest/Rivest-rc5.pdf): > > A' = (S + A + B) rol 3 > B' = (L + A' + B) rol (A'+B) > > where "rol" denotes rotate left and all variables are 32-bit. > [...] 1) Have you verified the non-optimized interpretation of the algorithm against the examples? If not, you may be optimizing toward the wrong solution. 2) If you can try a different synthesizer (Synplicity), it may give better logic reduction. Different synthesizer settings may also help. 3) Perhaps the rol in B' = (L + A' + B) rol (A'+B) could be implemented efficiently with block multipliers, if hardware block multipliers are available in a Cyclone II. Using a resource that would otherwise go unused is sometimes a good thing if it saves on other, scarcer ones. 4) I would guess that the logic associated with the rol in item 3 is large. Search the group for barrel shifters to get some ideas.
Article: 134019
On Tue, 22 Jul 2008 00:48:26 -0700 (PDT), tgau3qk4@gmail.com wrote: >As an intellectual exercise I have been playing with some >cryptographic functions. Currently, I am looking at the RC5 "key >expander" (http://people.csail.mit.edu/rivest/Rivest-rc5.pdf): > >A' = (S + A + B) rol 3 >B' = (L + A' + B) rol (A'+B) > >where "rol" denotes rotate left and all variables are 32-bit. > As far as I can tell, you need three 32-bit adders (C=A+B, S+C, L+C) and a barrel shifter controlled by the bottom 5 bits of C for the second "rol". The first rol is just moving wires around. The barrel shifter would be large, but it can be implemented with a multiplier if your device has them. You can probably calculate A in the first cycle with two adders, and in the second cycle do an add plus the rol. A third cycle might be helpful for reusing an adder.
Article: 134020
On Jul 21, 11:10 pm, Newman <newman5...@yahoo.com> wrote: > On Jul 21, 12:03 pm, ppero...@gmail.com wrote: > > > > > Hi all, > > > I'm doing some performance tests with multi-threaded xilkernel > > applications and I always get erroneous times in programs where > > context switches occurs continuously. I have written a trivial > > application that shows the problem: > > > Two threads running without communication between them. Each one > > simply iterates N times and in each loop makes a yield() call. It > > ensures one context switch per loop and thread (both have the same > > priority and the policy is SCHED_PRIO). Now, each M iterations, I > > track the elapsed kernel ticks and print them on screen. Both > > xilkernel and the main application are optimized (-O2) and each > > xilkernel tick is set to 10 milliseconds. > > > thread body > > =========== > > > #define N 1000000 > > #define M 1000 > > > for (i=0; i<N; i++) { > > > yield(); > > > if (!(i % M)) { > > time = xget_clock_ticks(); > > xil_printf("Elapsed time: %d msec\n", time * 10); > > > } > > > =========== > > > Results indicate that the times printed by the application are always > > lesser than the times elapsed in reality (you can measure the real > > elapsed time, for example, with the Windows' clock or with a > > timekeeper). I mean if the program really takes about 50 seconds to > > finish, I get from application about 24 seconds. This is impossible. > > > For example, If I change yield() for sleep(1), results now corresponds > > with the real elapsed time. Moreover, if I do yield() each X > > iterations (testing progressively with a bigger value of X: 10, 100, > > 1000...1000000), the results printed on screen fit more and more to > > the real elapsed times. > > > Can anyone explain me what is happening?, Is it a xilkernel bug? > > > Many thanks for your help. 
> > Paco
> > ---------------------------------------------------------------------------------------------
> Try shortening
> xil_printf("Elapsed time: %d msec\n", time * 10);
> to
> xil_printf(" %d msec\n", time * 10);
> and see if the elapsed time better matches the execution time.
> Perhaps the print buffer or whatever is affecting your results.

Newman,

I tried with

xil_printf("%d msec\n", time * 10);

but the results are the same. I also tried protecting xil_printf with a mutex (xil_printf isn't thread-safe):

pthread_mutex_lock(&mutex);
xil_printf("%d msec\n", time * 10);
pthread_mutex_unlock(&mutex);

But the results are still the same. I don't think my problem is related to the print buffer or the UART.

> > Moreover, if I do yield() every X
> > iterations (testing progressively with bigger values of X: 10, 100,
> > 1000...1000000), the results printed on screen fit the
> > real elapsed times more and more closely.
>
> What is X? Is it M? Is it N?

X is neither M nor N. Look at the following code:

thread body
===========

#define N 1000000
#define M 1000
#define X 10000 /* also tried 10, 100 and 1000 */

for (i = 0; i < N; i++) {
    if (!(i % X))
        yield();

    if (!(i % M)) {
        time = xget_clock_ticks();
        xil_printf("Elapsed time: %d msec\n", time * 10);
    }
}

Thanks for your time.

Paco

Article: 134021
On 8 Jul, 00:46, Mike Treseler <mike_trese...@comcast.net> wrote:
> Zhane wrote:
> > I'm noob, so trying to save some effort from recoding by using this
> > code which I found somewhere ~_~
>
> If you want to understand it well enough to test it,
> read up on direct digital synthesis.
>
> > it did work when I try on the actual fpga...
>
> Good luck.
>
> -- Mike Treseler

Let me give some suggestions, since I am the author of the code. For simulation purposes, put

baudTick <= clk;

somewhere and omit the phase accumulator.

Good luck. Oh, and if you have any other questions, simply write me an e-mail, since I don't usually check discussion groups.

Article: 134022
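Mike's pointer to direct digital synthesis refers to the phase accumulator. The sketch below is a minimal C model of a phase-accumulator baud generator of that kind.

The code under discussion is not shown in this thread. The accumulator width, the clock and baud rates, and the names `baud_increment` and `baud_acc_step` are all assumptions.

Each clk cycle adds a constant increment to a W-bit accumulator. The carry out of the top bit acts as the baud tick, so ticks occur at an average rate of f_clk * inc / 2^W.

```c
#include <stdint.h>

/* Hypothetical helper: increment for a W-bit accumulator,
   round(baud * 2^W / f_clk), using integer arithmetic. */
static uint32_t baud_increment(uint64_t f_clk, uint64_t baud, unsigned width)
{
    return (uint32_t)(((baud << width) + f_clk / 2) / f_clk);
}

/* One clk cycle: add inc to *acc modulo 2^width and return 1 on carry-out,
   i.e. on the cycle in which the baud tick fires. */
static int baud_acc_step(uint32_t *acc, uint32_t inc, unsigned width)
{
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint64_t sum = (uint64_t)*acc + inc;
    *acc = (uint32_t)(sum & mask);
    return sum > mask;
}
```

Take an assumed 50 MHz clock, 115200 baud and W = 16. The increment works out to 151, which gives about 115204 ticks per second. A wider accumulator reduces that rate error.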
>A better design would use clk here
>and make baudTick a clock enable.

@ Mike: I'm afraid that wouldn't work. I'm not sure you get how the phase accumulator works; trust me, it is OK the way it is.

Article: 134023
On Jul 22, 6:32 am, wojtek <wojtekpowiertow...@gmail.com> wrote:
> >A better design would use clk here
> >and make baudTick a clock enable.
>
> @ Mike: I'm afraid it wouldn't work, I'm not sure if you get how the
> phase accumulator works, trust me it is ok the way it is.

I suspect Mike knows how a phase accumulator works. The rising edge of the phase accumulator's output can be detected without using it as a clock input. Some people cringe when they see a register output used as the clock for other synchronous logic. They will go to great lengths to avoid it, because otherwise they might have to explain why it will never cause a timing issue. In general, I would not easily discount what Mike has to say, IMHO.

Article: 134024
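The clock-enable approach can be sketched as a minimal C model. The names `tick_gen` and `tick_gen_step` are hypothetical, and each call stands for one clk edge.

- Everything is clocked by clk.
- The accumulator MSB is registered.
- A rising edge of the MSB becomes a one-cycle enable instead of a clock.

```c
#include <stdint.h>

/* Registers of a hypothetical tick generator, all updated on the same clk edge. */
struct tick_gen {
    uint32_t acc;      /* phase accumulator, 'width' bits wide */
    unsigned msb_prev; /* accumulator MSB as registered on the previous edge */
};

/* Advance one clk cycle. Returns 1 for exactly one cycle per rising edge of
   the accumulator MSB; downstream logic would use it as a clock enable. */
static int tick_gen_step(struct tick_gen *g, uint32_t inc, unsigned width)
{
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    g->acc = (g->acc + inc) & mask;
    unsigned msb = (g->acc >> (width - 1)) & 1u;
    int enable = msb && !g->msb_prev;
    g->msb_prev = msb;
    return enable;
}
```

Because the enable is a single-cycle pulse in the clk domain, downstream flip-flops stay on one clock net. Their timing is then covered by ordinary synchronous analysis.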
On Jul 22, 12:36 am, Pablo H <pablo.hue...@gmail.com> wrote:
> Hi,
>
> I'll try to explain what I think is happening with your program (maybe
> I'm completely wrong).
>
> You are doing too many yield()s. With the code you are showing, you
> are doing one yield() operation for each iteration of the loop, which
> means that only three operations in the loop are executed per
> yield() call: incrementing i, testing if i < N, and testing if i % M
> == 0. This will probably be between 5 and 15 assembly instructions,
> depending on compiler optimizations.
> Each yield call is a system call with more than 100 assembly
> instructions being executed (probably many more; just doing the
> context switch means storing 33 registers out and 33 registers in).
> All the instructions executed for the yield() call are executed with
> INTERRUPTS DISABLED, because xilkernel disables interrupts when
> doing system calls. If a timer interrupt arrives while executing
> the yield(), the timer will be stopped until yield() ends, but the time you
> measure with your hand clock keeps counting.
>
> When you do the yield() calls every 10, 100, 1000... iterations,
> you get better results because the system spends less time
> performing system calls, so it's less likely that a timer
> interrupt arrives while interrupts are disabled.
>
> I don't know if you will understand what I wrote with my poor English.
> If something is not clear, just ask again :-)
>
> Regards,
>
> Pablo H

Hi Pablo,

I understood you perfectly. I didn't know how Xilkernel works (I haven't looked at the source code yet), but I think you are right.

Hmm... I have another example where the same problem arises, but without using yield(). I want to benchmark the thread life cycle, that is, the time required to create, schedule and terminate one thread.
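Pablo's argument can be put into rough numbers. The function below is a back-of-the-envelope sketch built on his estimates and a few assumptions:

- roughly 5 to 15 instructions per loop iteration;
- more than 100 instructions per yield() system call;
- the same cost for every instruction.

It computes the fraction of execution time spent inside the system call, with interrupts disabled, when yield() is called once every X iterations. The function name is illustrative.

```c
/* Fraction of run time spent in the yield() system call (interrupts off)
   when one call is made every x loop iterations. Costs are in instructions
   and are rough, assumed estimates, not measurements. */
static double syscall_time_fraction(double loop_cost, double syscall_cost, double x)
{
    /* one period = x loop iterations + one system call */
    return syscall_cost / (x * loop_cost + syscall_cost);
}
```

Take a 10-instruction iteration and a 100-instruction yield(). About 91% of the time is spent in the kernel at X = 1, 50% at X = 10, and under 1% at X = 1000. That matches the trend Paco observed. The estimate only shows how often a tick can land in a critical section; it does not by itself show how many ticks are lost.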
Look at the following source code:

#define ROUNDS 1000000

static void *thread(void *arg)
{
    return NULL;
}

/* main_main is called after Xilkernel is initialised */
main_main()
{
    t0 = xget_clock_ticks();

    for (i = 0; i < ROUNDS; i++) {
        pthread_create(&thr, NULL, thread, NULL);
        pthread_join(thr, NULL);
    }

    t1 = xget_clock_ticks();

    xil_printf("Thread: t0: %d msec\n", t0 * 10);
    xil_printf("Thread: t1: %d msec\n", t1 * 10);
}

I get the same kind of results. For example:

Thread: t0: 10 msec
Thread: t1: 10 msec

when it actually takes about 30 seconds to finish. Here I don't use yield(), but a lot of context switches take place. Is the context switch in Xilkernel too heavy? Is the context switch done with interrupts disabled? Am I doing something wrong?

Many thanks for your input.

P.S. By the way, I currently work with Pablo Antunez on projects related to DSP and FPGA multicomputers. I think you've already had a couple of talks with him :)
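One way to cross-check a tick counter is against an independent clock. The sketch below times the same create/join loop with a monotonic wall clock. It targets a POSIX host, not xilkernel, and assumes `clock_gettime` with `CLOCK_MONOTONIC` is available. On the board, a free-running hardware timer read directly would play the same role. The function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Thread body that does nothing, as in the benchmark above. */
static void *empty_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/* Wall-clock time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Create and join 'rounds' empty threads; returns elapsed seconds,
   or -1.0 if a thread could not be created. */
static double bench_thread_lifecycle(unsigned rounds)
{
    double t0 = now_seconds();
    for (unsigned i = 0; i < rounds; i++) {
        pthread_t thr;
        if (pthread_create(&thr, NULL, empty_thread, NULL) != 0)
            return -1.0;
        pthread_join(thr, NULL);
    }
    return now_seconds() - t0;
}
```

Dividing the result by `rounds` gives an average cost per create/schedule/terminate cycle that does not depend on a kernel tick being serviced.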