OK. I know this is borderline for this group, but seems the best place I can think of for asking right now. I am very interested in utilizing the Automotive Graphics Controller ref design that Altera has posted on their website with a NIOS II core running uCLinux. The drivers that were posted seemed to be 'raw' NIOS code. Are there any Linux Framebuffer drivers available for this device? Are there any plans or on-going projects that are porting drivers over for this device? Thanks for any insight. KeithArticle: 106901
Göran Bilski wrote:
> Frank Buss wrote:
> > Göran Bilski wrote:
> >
> >>If the interesting part is to create this solution without any time
> >>limits than you should create most from scratch.
> >
> > Yes, this is what I'm planning.
> >
> > I have another idea for a CPU, very RISC like. The bits of an instructions
> > are something like micro-instructions:
> >
> > There are two internal 16 bit registers, r1 and r2, on which the core can
> > perform operations and 6 "normal" 16 bit registers. The first 2 bits of an
> > instructions defines the meaning of the rest:
> >
> > 2 bits: operation:
> >   00 load internal register 1
> >   01 load internal register 2
> >   10 execute operation
> >   11 store internal register 1
> >
> > I think it is a good idea to use 8 bits for one instruction instead of
> > using non-byte-aligned instructions, so we have 6 bits for the operation.
> > Some useful operations:
> >
> > 6 bits: execute operation:
> >   r1 = r1 and r2
> >   r1 = r1 or r2
> >   r1 = r1 xor r2
> >   cmp(r1, r2)
> >   r1 = r1 + r2
> >   r1 = r1 - r2
> >   pc = r1
> >   pc = r1, if c=0
> >   pc = r1, if c=1
> >   pc = r1, if z=0
> >   pc = r1, if z=1
> >
> > For the load and store micro instructions, we have 6 bits for encoding the
> > place on which the load and store acts:
> >
> > 6 bits place:
> >   1 bit: transfer width (0=8, 1=16 bits)
> >   2 bits source/destination:
> >     00: register:
> >       3 bits: register index
> >     01: immediate:
> >       1 bit: width of immediate value (0=8, 1=16 bits)
> >       next 1 or 2 bytes: immediate number (8/16 bits)
> >     10: memory address in register
> >       3 bits: register index
> >     11: address
> >       1 bit: width of address (0=8, 1=16 bits)
> >       next 1 or 2 bytes: address (8/16 bits)
> >
> > The transfer width and the value need not to be the same. E.g. 1010xx
> > means, that the next byte is loaded into the internal register and the
> > upper 8 bits are set to 0.
> >
> > But for this reduced instruction set a compiler would be a good idea. Or
> > different layers of assembler. I'll try to translate my first CPU design,
> > which needed 40 bytes:
> >
> > ; swap 6 byte source and destination MACs
> >         .base = 0x1000
> > p1:     .dw 0
> > p2:     .dw 0
> > tmp:    .db 0
> >         move #5, p1
> >         move #11, p2
> > loop:   move.b (p1), tmp
> >         move.b (p2), (p1)
> >         move.b tmp, (p2)
> >         sub.b p2, #1
> >         sub.b p1, #1
> >         bcc.b loop
> >
> > With my new instruction set it could be written like this (the normal
> > registers 0 and 1 are constant 0 and 1) :
> >
> >         load r1 immediate with 5
> >         store r1 to register 2
> >         load r1 immediate with 11
> >         store r1 to register 3
> > loop:   load r1 from memory address in register 2
> >         load r2 from memory address in register 3
> >         store r1 to memory address in register 3
> >         store r2 to memory address in register 2
> >         load r1 from register 3
> >         load r2 from register 1
> >         operation r1 = r1 - r2
> >         store r1 in register 3
> >         load r1 in register 2
> >         operation r1 = r1 - r2
> >         store r1 in register 2
> >         operation pc = loop if c=0
> >
> > This is 20 bytes long. As you can see, there are micro optimizations
> > possible, like for the last two register decrements, where the subtrahend
> > needs to be loaded only once.
> >
> > I think this instruction set could be implemented with very few gates,
> > compared to other instruction sets, and the memory usage is low, too.
> > Another advantage: 64 different instructions are possible and orthogonal
> > higher levels are easy to implement with it, because the load and store
> > operations work on all possible places. Speed would be not the fastest, but
> > this is no problem for my application.
> >
> > The only problem is that you need a C compiler or something like this,
> > because writing assembler with this reduced instruction set looks like it
> > will be no fun.
> >
> > Instead of 16 bits, 32 bits and more is easy to implement with generic
> > parameters for this core.
>
> Things to keep in mind is to handle larger arithmetic than 16 bits.
> That will usually introduce some kind of carry bits (stored where?).
> You seems to have a c,z bits somewhere but you will need two versions of
> each instruction, one which uses the carry and one which doesn't
>
> Running more than just simple programs in real-time applications
> requires interrupt support which messes things up considerable in the
> control part of the processor.
>
> Do you consider using only absolute branching or also doing relative
> branching?
>
> If you really are wanting to have a processor which is code efficient,
> you might want to look at a stack machine.
> If I was to create a tiny tiny processor with little area and code
> efficient I would do a stack machine.
> But they are much nastier to program but they can be implemented very
> efficiently.
>
> Göran

Since you have control of the microcode, you can implement 16-bit math in an 8-bitter by chaining other states. The v8 uRISC/Arclite has a 16-bit increment, which is implemented as {Rn+1,Rn}++. It takes two clock cycles to execute because it issues two commands to the ALU. Yes, you do have to keep a carry flag, but you would keep one anyway.

BTW - I just finished the interrupt controller for my processor core, and it wasn't that difficult (once I got past the priority part). In my case, I wait for the next instruction decode, and then enter the interrupt states. Once it starts an interrupt, it's a simple matter of storing off the flag register and current PC + 1, and then doing a JSR to the location indicated in the service vector. I use a req/ack scheme to let the microcode FSM indicate that it has entered an ISR.

Of course, my CPU doesn't have any cache, and a simple two-stage pipeline - so that might have something to do with the simplicity of it.

Article: 106902
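On the two-cycle 16-bit increment described in the reply above: the chaining of two 8-bit ALU commands through the carry flag can be modeled in a few lines. This is only a behavioral sketch, not the v8 uRISC microcode; the names `alu_add8` and `inc16` are made up for illustration.

```python
def alu_add8(a, b, carry_in=0):
    """Model of an 8-bit adder: returns (8-bit result, carry out)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8


def inc16(reg_lo, reg_hi):
    """Increment the register pair {reg_hi, reg_lo} using two 8-bit ALU steps."""
    # Microcode state 1: low byte + 1, carry flag is latched.
    lo, carry = alu_add8(reg_lo, 1)
    # Microcode state 2: high byte + 0 + latched carry.
    hi, carry = alu_add8(reg_hi, 0, carry)
    return lo, hi, carry


if __name__ == "__main__":
    lo, hi, carry = inc16(0xFF, 0x12)
    print(f"{hi:02X}{lo:02X} carry={carry}")
```

The same carry flag serves both steps, which is why no extra state is needed beyond what an 8-bit core keeps anyway.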
Göran Bilski wrote:
> Frank Buss wrote:
> > Göran Bilski wrote:
> >
> >>You seems to have a c,z bits somewhere but you will need two versions of
> >>each instruction, one which uses the carry and one which doesn't
> >
> > Yes, I have carry and zero flag. To make the implementation of the core
> > easier, I think I'll use one bit of the instruction set to determine if the
> > flags are updated or not.
> >
> >>Running more than just simple programs in real-time applications
> >>requires interrupt support which messes things up considerable in the
> >>control part of the processor.
> >
> > Why? I think I can implement a "call" instruction like in 68000:
> >
> > r2=pc
> > pc=r1
> >
> > In the sub routine I can save r2, if I need more call stack.
> >
> > Interrupts could be implemented by saving the PC register in a special
> > register and restoring it by calling a special return instruction.
>
> So will you have instructions that saves the C,Z values?
> Imagine doing a cmp instruction and after that you take an interrupt,
> the interrupt handler will also use these flags so when you return the
> interrupted program will use the wrong values.
>
> >>Do you consider using only absolute branching or also doing relative
> >>branching?
> >
> > 64 instructions are possible, so relative branching is a good idea and I'll
> > use the same concept with one bit for deciding, if it is absolute or
> > relative.
> >
> >>If you really are wanting to have a processor which is code efficient,
> >>you might want to look at a stack machine.
> >>If I was to create a tiny tiny processor with little area and code
> >>efficient I would do a stack machine.
> >>But they are much nastier to program but they can be implemented very
> >>efficiently.
> >
> > I've implemented a simple Forth implementation for Java and it's just
> > different, not more difficult to program in Forth:
> >
> > http://www.frank-buss.de/forth/
> >
> > The MARC4 from Atmel uses qForth:
> >
> > http://www.atmel.com/journal/documents/issue5/pg46_48_Atmel_5_CodePatch_A.pdf
> >
> > Maybe you are right and the core and programs are smaller with Forth, I'll
> > think about it. Really useful is that it is simple to write an interactive
> > read-eval-print loop in Forth (like in Lisp), so that you can program and
> > debug a system over RS232.

Simpler solution - have the microcode FSM push the flags to the stack. It's a simple alteration, and saves a lot of heartache. I have contemplated even pushing the entire context to the stack, since I can burst write from the FSM a lot faster than I can with individual PSH/POP instructions, but I figure that would be overkill.

Article: 106903
Hi everybody,

I am new to VHDL and to crypto as well. I would like to implement the Davies-Meyer hash construction ( H_i = E_{m_i}(H_{i-1}) + H_{i-1} ) in VHDL. My problem is this: the block cipher I have (KASUMI) has a 64-bit input and output, while the hash function (SHA-1) has a 160-bit output. I don't know how to reconcile the two in order to implement Davies-Meyer. Can anyone help me find a way to combine these functions, or point me to literature or implementations on this?

Thanks all, and have a nice day.

Adam.

Article: 106904
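A note on the Davies-Meyer question above: in the usual statement of the construction the "+" is a bitwise XOR, the message block is used as the cipher key, and the chaining value has the width of the cipher block. With a 64-bit block cipher the digest is therefore 64 bits wide; SHA-1 is a separate hash and is not a component of the construction. The sketch below shows the data flow only. `toy_encrypt` is a made-up, insecure stand-in with KASUMI's block and key sizes (64/128 bits); it is not KASUMI, and the padding and IV are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF
KEY_BYTES = 16  # 128-bit key: each message block is used as the cipher key


def toy_encrypt(key: bytes, block: int) -> int:
    """Stand-in 64-bit Feistel cipher. NOT KASUMI and NOT secure."""
    left, right = block >> 32, block & MASK32
    for rnd in range(8):
        start = (4 * rnd) % KEY_BYTES
        k = int.from_bytes(key[start:start + 4], "big")
        f = ((right * 0x9E3779B1) ^ k ^ rnd) & MASK32
        f = ((f << 7) | (f >> 25)) & MASK32  # rotate left by 7
        left, right = right, left ^ f
    return (left << 32) | right


def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 64-bit message length in bits."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % KEY_BYTES)
    return padded + (8 * len(message)).to_bytes(8, "big")


def dm_step(h_prev: int, m_block: bytes) -> int:
    """One Davies-Meyer step: H_i = E_{m_i}(H_{i-1}) XOR H_{i-1}."""
    return toy_encrypt(m_block, h_prev) ^ h_prev


def davies_meyer(message: bytes, iv: int = 0x0123456789ABCDEF) -> int:
    """Iterate dm_step over the padded message; the result is 64 bits wide."""
    h = iv
    data = pad(message)
    for i in range(0, len(data), KEY_BYTES):
        h = dm_step(h, data[i:i + KEY_BYTES])
    return h


if __name__ == "__main__":
    print(f"{davies_meyer(b'hello'):016x}")
```

In hardware the loop body maps to one cipher instance plus a 64-bit XOR and a 64-bit chaining register. A 160-bit digest would need a cipher with a 160-bit block or a double-block-length construction, which is a different design.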
Thank you all for your prompt replies.

@ Hans - Thank you. The license command works, but MAKE always fails and gives weird errors.

@ Joseph - You are right, I am using the ModelSim GUI for both compilation and simulation. Is compilation faster from the command prompt? The ModelSim GUI uses a script file that contains the compilation order. My project has both Verilog and VHDL files, so it is mixed-language. I am using ModelSim SE PLUS 6.0C on Linux.

Regards,
Krishna Janumanchi

> Hi there,
>
> I guess you are trying to compile your design within
> the Modelsim GUI (vsim), and then run the simulation.
> Maybe it is better to separate it into two stages?
>
> When you compile the verilog design, run vlog with -incr
> option ("incremental").
>
> When running vsim, if using VHDL, you can use -lic_vhdl,
> and for verilog you can use -lic_vlog.
>
> It might be useful to tell us
> - which OS you are using?
> - VHDL / verilog / mix language
> - how do you compiling the design? Within Modelsim GUI
>   or inside C-shell / Windows CMD?
>
> In addition, is a lot of the design files you are compiling
> are the xilinx / Altera libraries? If yes, you maybe able
> to use library feature instead of compiling everything in
> work. And the library will only need to be compiled once.
>
> Joseph

Article: 106905
Göran Bilski wrote:
> Frank Buss wrote:
> > <snip: instruction set proposal, quoted in full above>
> >
> > Instead of 16 bits, 32 bits and more is easy to implement with generic
> > parameters for this core.
>
> Things to keep in mind is to handle larger arithmetic than 16 bits.
> That will usually introduce some kind of carry bits (stored where?).
> You seems to have a c,z bits somewhere but you will need two versions of
> each instruction, one which uses the carry and one which doesn't

or you will have to clear the carry when you want to add without carry.

> Running more than just simple programs in real-time applications
> requires interrupt support which messes things up considerable in the
> control part of the processor.

a register swap for interrupt processing is the easiest.

> Do you consider using only absolute branching or also doing relative
> branching?

either would work, but relative has code size advantage, and absolute has execution advantage.

> If you really are wanting to have a processor which is code efficient,
> you might want to look at a stack machine.
> If I was to create a tiny tiny processor with little area and code
> efficient I would do a stack machine.
> But they are much nastier to program but they can be implemented very
> efficiently.

search for MSL16 as a compact example of stack machine, i would use slightly different ops, and things if i did it. 2/ ??? i'd have full bit reversal get rid of the subtract. umm??

cheers
jacko

Article: 106906
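For readers following the instruction format discussed in this thread (2 operation bits followed by a 6-bit "place" field for loads and stores), a decoder can be sketched as follows. The bit positions are an assumption: the posts do not fix a bit order, so this sketch reads the fields MSB first, which matches the "1010xx" example. The function name and dictionary keys are illustrative only.

```python
OPS = {0b00: "load r1", 0b01: "load r2", 0b10: "execute", 0b11: "store r1"}
PLACES = {
    0b00: "register",
    0b01: "immediate",
    0b10: "memory at register",
    0b11: "address",
}


def decode(instr: int) -> dict:
    """Decode one instruction byte (assumed layout: op[7:6], place[5:0])."""
    op = (instr >> 6) & 0b11
    if op == 0b10:
        # Execute: the remaining 6 bits select the ALU or branch operation.
        return {"op": OPS[op], "operation": instr & 0x3F}
    info = {
        "op": OPS[op],
        "transfer_bits": 16 if (instr >> 5) & 1 else 8,
        "place": PLACES[(instr >> 3) & 0b11],
    }
    if info["place"] in ("register", "memory at register"):
        info["register"] = instr & 0b111  # 3-bit register index
        info["extra_bytes"] = 0
    else:
        wide = (instr >> 2) & 1           # width of immediate or address
        info["operand_bits"] = 16 if wide else 8
        info["extra_bytes"] = 2 if wide else 1
    return info


if __name__ == "__main__":
    # "load r1" with place 1010xx: 16-bit transfer of an 8-bit immediate.
    print(decode(0b00_101000))
```

The `extra_bytes` value is what a fetch unit would need in order to know how far to advance the program counter after the opcode byte.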
Modelsim report is:

# Reading C:/Modeltech_6.1b/tcl/vsim/pref.tcl
# // ModelSim SE 6.1b Sep 8 2005
# //
# // Copyright Mentor Graphics Corporation 2005
# // All Rights Reserved.
# //
# // THIS WORK CONTAINS TRADE SECRET AND
# // PROPRIETARY INFORMATION WHICH IS THE PROPERTY
# // OF MENTOR GRAPHICS CORPORATION OR ITS LICENSORS
# // AND IS SUBJECT TO LICENSE TERMS.
# //
# do {test_clock.fdo}
# ** Warning: (vlib-34) Library already exists at "work".
# Model Technology ModelSim SE vlog 6.1b Compiler 2005.09 Sep 8 2005
# -- Compiling module stm4ser
#
# Top level modules:
# stm4ser
# Model Technology ModelSim SE vlog 6.1b Compiler 2005.09 Sep 8 2005
# -- Compiling module dcm1
#
# Top level modules:
# dcm1
# Model Technology ModelSim SE vlog 6.1b Compiler 2005.09 Sep 8 2005
# -- Compiling module clock
#
# Top level modules:
# clock
# Model Technology ModelSim SE vlog 6.1b Compiler 2005.09 Sep 8 2005
# -- Compiling module test_clock
#
# Top level modules:
# test_clock
# Model Technology ModelSim SE vlog 6.1b Compiler 2005.09 Sep 8 2005
# -- Compiling module glbl
#
# Top level modules:
# glbl
# vsim -L xilinxcorelib_ver -L unisims_ver -lib work -t 1ps test_clock glbl
# Loading work.test_clock
# Loading work.clock
# Loading work.dcm1
# Loading C:\Xilinx\verilog\mti_se\unisims_ver.BUFG
# Loading C:\Xilinx\verilog\mti_se\unisims_ver.IBUFG
# Loading C:\Xilinx\verilog\mti_se\unisims_ver.DCM
# Loading C:\Xilinx\verilog\mti_se\unisims_ver.dcm_clock_divide_by_2
# Loading C:\Xilinx\verilog\mti_se\unisims_ver.dcm_maximum_period_check
# Loading C:\Xilinx\verilog\mti_se\unisims_ver.dcm_clock_lost
# Loading work.stm4ser
# Loading C:\Xilinx\verilog\mti_se\unisims_ver.GT_CUSTOM
# Loading C:\Xilinx\verilog\mti_se\unisims_ver.GT
# Loading C:\Xilinx\verilog\mti_se\unisims_ver.GT_SWIFT
# Loading C:\Xilinx\verilog\mti_se\unisims_ver.GT_SWIFT_BIT
# Loading work.glbl
# ** Warning: (vsim-PLI-3003) C:/Xilinx/verilog/mti_se/unisims_ver/unisims_ver_SmartWrapper_source.v(18339): [TOFD] - System task or function '$lm_model' is not defined.
# Region: /test_clock/UUT/module1/GT_CUSTOM_INST/gt_1/gt_swift_1/I1
# .main_pane.mdi.interior.cs.vm.paneset.cli_0.wf.clip.cs.pw.wf
# .main_pane.workspace
# .main_pane.signals.interior.cs
# No errors or warnings.
# Break at test_clock.tfw line 82
# Simulation Breakpoint: Break at test_clock.tfw line 82
# MACRO ./test_clock.fdo PAUSED at line 17

I don't understand the warning in this report. All of the signals from the RocketIO module are in the X state, but all of the other modules simulate successfully.

:) And sorry for my very bad English.

Article: 106907
krishna.janumanchi@gmail.com wrote:
> Thank you all for your prompt replies.
>
> @ Hans - Thank you - license command works - but MAKE always fail..
> gives weird errors.
>
> @ Joseph -
> You are right - I am using Modelsim GUI for both compilation &
> simulation.
> Is it faster - compilation from command prompt??
> Modelsim GUI uses a script file which consists of compilation order.
> My project is having both Verilog & VHDL files - that is Mixed..
> I am uisng Modelsim SE PLUS 6.0C - Linux OS.
>
> Regards,
> Krishna Janumanchi

Hi Krishna,

It won't be much faster, but by doing that you can avoid your license problem. Assume your top level is "testbench". When you run (in a Linux shell)

$> vsim -gui testbench

ModelSim will by default wait if the required license is not available and queue for one. When the license becomes available, it will start to load the design.

Another advantage of separating the compile stage is that you can easily redirect the stdout messages to a log file and examine it if things go wrong. Inside the GUI, the console window holds only a limited number of text lines, so older error or warning messages can be lost.

Joseph

Article: 106908
I have been trying to use an internal clock net, sourced from a DCM instance, in my OFFSET constraint. This appears to be legal, according to the CGD:

"OFFSET is used only for pad-related signals, and cannot be used to extend the arrival time specification method to the internal signals in a design. A clock that comes from an internal signal is one generated from a synch element, like a FF. A clock that comes from a PAD and goes through a DLL, DCM, clock buffer, or combinatorial logic is supported."

But this doesn't work for me. If I use the original clock pad in the constraint, it works. However, these OFFSETs are clocked by a DCM FX output, so the timing is different from the original clock. Will ISE infer this timing difference? That is unclear to me...

Thanks,
-Brandon

Article: 106909
bart wrote: > density. the Xilinx XC3S1000 has more LUTs (4-input Look Up Tables) > than the Altera EP1C6. ... while the Cyclone II also has 4 inputs per LUT AFAIRArticle: 106910
Documentation is the point! For certain applications, detailed documentation has to be delivered too, no matter how fast text-based coding was or might have been. I once had a project with several hundred states; it was hard to keep an overview when dealing only with text-based signal handling, and almost impossible for 2-3 people to work on it at the same time.

Mike Treseler wrote:
> The quartus hdl/state machine viewer
> works the other way around.

Well, I know about this function but did not find it convenient. States are placed in linear order, and apparently it is not possible to rearrange them or to use the diagram for further input.

Article: 106911
(this was my post, I just changed user name) No idea? It does not necessarily need to be a video app, just a high data rate with RAM and a DSP port.Article: 106912
Have you checked AR 22214?

http://www.xilinx.com/xlnx/xil_ans_display.jsp?BV_UseBVCookie=yes&getPagePath=22214

HTH,
Jim
http://home.comcast.net/~jimwu88/tools/

axalay@gmail.com wrote:
> Modelsim report is:
>
> <snip: log quoted in full in the original post above>
> # ** Warning: (vsim-PLI-3003)
> C:/Xilinx/verilog/mti_se/unisims_ver/unisims_ver_SmartWrapper_source.v(18339):
> [TOFD] - System task or function '$lm_model' is not defined.
> # Region:
> /test_clock/UUT/module1/GT_CUSTOM_INST/gt_1/gt_swift_1/I1
> <snip>
>
> In this report I'm not andestend warning. All off signals from RocketIO
> module is x-state. But all of oter modules simulate succes.
>
> :) And sorry my very bad english

Article: 106913
Hi all,

I have implemented a complete Reed-Solomon decoder. I used the Berlekamp-Massey algorithm, but I don't know how to detect a decoding failure, i.e. the case e > t, where e is the number of errors and t is the correction capability.

I have t+1 coefficients for the error locator polynomial (elp), so its degree is at most t. But if there are t+1 errors, the degree of the elp would be t+1, which needs t+2 coefficients, and I only have t+1 blocks to compute the elp!

How can I detect the failure?

Thanks

Article: 106914
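On the Reed-Solomon question above, a commonly used check (offered here as a sketch, not as the poster's design) is to compare the degree of the locator polynomial coming out of Berlekamp-Massey with the number of roots the Chien search actually finds: if the degree exceeds t, or the root count differs from the degree, the decoder flags the word as uncorrectable. The example works in GF(16) with the primitive polynomial x^4 + x + 1; the field size and all function names are assumptions chosen to keep the example small.

```python
PRIM_POLY = 0b10011  # x^4 + x + 1 over GF(2), generates GF(16)

# Build exp/log tables for GF(16); EXP is doubled to avoid a modulo in gf_mul.
EXP = [0] * 30
LOG = [0] * 16
_value = 1
for _i in range(15):
    EXP[_i] = EXP[_i + 15] = _value
    LOG[_value] = _i
    _value <<= 1
    if _value & 0x10:
        _value ^= PRIM_POLY


def gf_mul(a, b):
    """Multiply two GF(16) elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_eval(coeffs, x):
    """Horner evaluation; coeffs[i] is the coefficient of x**i."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def degree(coeffs):
    """Index of the highest nonzero coefficient (-1 for the zero polynomial)."""
    for i in range(len(coeffs) - 1, -1, -1):
        if coeffs[i]:
            return i
    return -1


def chien_root_count(locator):
    """Count the nonzero field elements that are roots of the locator."""
    return sum(1 for x in range(1, 16) if poly_eval(locator, x) == 0)


def decoding_failed(locator, t):
    """Flag failure if deg > t or the root count does not match the degree."""
    deg = degree(locator)
    return deg > t or chien_root_count(locator) != deg


if __name__ == "__main__":
    print(decoding_failed([1, 9, 8], t=2))  # (1 + x)(1 + 8x): two distinct roots
    print(decoding_failed([1, 1, 8], t=2))  # degree 2 but no roots in GF(16)
```

In hardware this amounts to a counter on the Chien search output compared against the final LFSR length from Berlekamp-Massey. For a shortened code, a root that points outside the valid symbol positions is a further failure indicator. Note that this check cannot catch every pattern with more than t errors: some patterns decode to a different valid codeword.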
Jim Granville wrote: > Frank Buss wrote: > <snip> > >> The only problem is that you need a C compiler or something like this, >> because writing assembler with this reduced instruction set looks like it >> will be no fun. > > > Since this is a very specifica application, do you have a handle on the > code size yet ? > > Another angle to this, would be to choose the smallest CPU for which a > C compiler exists. > > Here, Freescale's new RS08 could be a reasonable candidate ? > > Or chose another more complex core and then scan the compiled output, > to check the Opcode usages, and subset that. > > -jg > > Quite a while back I designed a small microcontroller for a Xilinx XC4000E series part that used approximately 80 LUTs and ran at IIRC, 105 MHz, I think it was in a 4020XL. It was a simple risc machine that was sort of a cross between a PIC microcontroller and an RCA1802. It had a register file with 16 registers like the 1802, and had a small instruction set similar to a PIC. If I recall correctly, it was a harvard architecture. The ISA was specifically designed for the FPGA architecture. Anyway the difficult part about it was that it had no programming tools to support it. We did write a crude assembler for it, but that was about as far as we took it. The point is, the hardware and ISA design is only part of the job. The tools development is as big a piece as the processor design itself.Article: 106915
I am speeding up a design for data processing in which many simple steps are performed, causing a lot of overhead. I am therefore trying to increase the system speed, e.g. by inserting some FFs in critical paths, but ran into fitting problems with the multipliers.

My solution was to parallelize some large (40x40) multiplications and use multicycle constraints (two clocks) to make it work. Quartus says it is fine. With the constraints I obtain speeds above 150 MHz.

Problem: I cannot check this in Modelsim, because the results of the multiplications show up immediately after one clock, which is not the case in reality.

Article: 106916
I would like to have a control application running out of a LMB BRAM, that is mapped to the low-order addresses, say 0x0000 - 0xFFFF. I'm wondering if there is any way to have the microblaze write to the higher-order addresses (0x10000-0x1FFFF) in the instruction store and then begin executing these instructions? I realize that on the LMB instruction interface the Byte_Enable, Data_Write, and Write_Strobe signals are not used, I'm basically wondering if there is a good way to utilize/attach logic to these signals.Article: 106917
jacko wrote: > in quartus II loading a .hex file into 16 bit rom, would it be little > endian, or is only low byte filled?? I can only speak for MaxPlus+ but there reading Intel hex is non-standard. Intel hex has always addresses pointing to 8 bit locations, while the destination width in MaxPlus changes. (16 bit destinations for 16 bit wide RAM) Intel hex is little endian, while Maxplus reads data in big endian order. Look at the simulation! You will see easily, what happens. I have written a converter, but I am not sure if I can release it as open source. I have to ask my boss.. RalfArticle: 106918
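To illustrate the byte-order point from the post above: an Intel HEX data record carries a byte address and a run of bytes, and how two consecutive bytes end up in one 16-bit memory word depends on the packing order the tool applies. The sketch below parses a single data record and packs it both ways. It makes no claim about what Quartus or MaxPlus actually do; the function names are made up, and the sample record is constructed for the example.

```python
def parse_data_record(line: str):
    """Return (byte_address, data) for one Intel HEX data record (type 00)."""
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:].strip())
    # All record bytes including the checksum must sum to 0 modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    count, rec_type = raw[0], raw[3]
    if rec_type != 0x00:
        raise ValueError("not a data record")
    address = int.from_bytes(raw[1:3], "big")
    return address, raw[4:4 + count]


def pack_words(data: bytes, byteorder: str):
    """Pack consecutive byte pairs into 16-bit words ('little' or 'big')."""
    return [int.from_bytes(data[i:i + 2], byteorder)
            for i in range(0, len(data), 2)]


if __name__ == "__main__":
    address, data = parse_data_record(":0400000012345678E8")
    print(hex(address), [hex(w) for w in pack_words(data, "little")])
    print(hex(address), [hex(w) for w in pack_words(data, "big")])
```

Comparing the two packings against the memory contents seen in simulation shows which order a given tool used, and also whether it interpreted the record address as a byte address or as a word address.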
alterauser wrote: > I am speeding up a design for data processing, where many simple steps > are done causing much overhead. Therefore, I try to increase the system > speed, by eg. inserting some FFs for critical paths but found fitting > problems with the multipliers. > > My solution, was to parallelize some large (40x40) multiplications and > used multi cycle contraints (two clocks) to make it run. Quartus says, > it is fine. After using the constraints, I obtain speeds above 150MHz. > > Problem: I cannot check this in Modelsim, because the result of the > multiplications show up immediately after on clock, which is not the > case in reality. If you don't want to do a back-annotated, post place&route simulation, you can add delays into the source code for Modelsim. If you add a fixed delay on your multiplier that is longer than one clock cycle but short enough to meet the setup time on the second clock you can test that the behavior of your multicycle design is correct. One caveat, Quartus probably doesn't give you a minimum clock to output on the multiplier, but it's probably safe to assume that it is LESS than one clock cycle. So you really need to test that the behavior is the same whether the clock to output is less than one cycle (as you see in Modelsim with no delay) or more than one cycle but less than two cycles. HTH, GaborArticle: 106919
Brandon Jasionowski wrote: > I have been trying to use an internal clock net in my offset > constraint, sourced from a DCM instance. This appears to be legal, > according to the CGD: "OFFSET is used only for padrelated > signals, and cannot be used to extend the arrival time specification > method to the > internal signals in a design. A clock that comes from an internal > signal is one generated > from a synch element, like a FF. A clock that comes from a PAD and goes > through a DLL, > DCM, clock buffer, or combinatorial logic is supported." > > But, this doesn't work for me. If I use the original clock pad in the > constraint it works. However, these OFFSETS are driven by a DCM FX > output, so the timing is different from the original. Will ISE infer > this timing difference? It appears to be unclear to me... > > Thanks, > -Brandon Normally ISE will infer the timing difference correctly if there is a simple relationship (like 1:1) from the external pad frequency to the internal FX clock frequency. Since the OFFSET is used for PAD signals, this would make sense, since there would have to be an assumed phase relationship between the external pad signal requiring the OFFSET constraint and the internal clock. If the signals requiring the OFFSET constraint are not externally generated, it would normally make sense to use a PERIOD or FROM : TO constraint instead. HTH, GaborArticle: 106920
simpson.eric@gmail.com wrote:
> I would like to have a control application running out of a LMB BRAM,
> that is mapped to the low-order addresses, say 0x0000 - 0xFFFF. I'm
> wondering if there is any way to have the microblaze write to the
> higher-order addresses (0x10000-0x1FFFF) in the instruction store and
> then begin executing these instructions?
>
> I realize that on the LMB instruction interface the Byte_Enable,
> Data_Write, and Write_Strobe signals are not used, I'm basically
> wondering if there is a good way to utilize/attach logic to these
> signals.

In most cases the ILMB and DLMB are connected to the A and B ports of the same BRAM blocks, so the instruction memory is also accessible for normal writes at the same addresses.

antti

Article: 106921
Brad Smallridge wrote: >>Answer #20986 says it's fixed but maybe you've uncovered >>a place where it's not ... > > > Humph. You're right. > It appears it can't handle 7/2. > Is there a workaround? > > Brad Smallridge > brad at > aivision > dot com > > > Brad, the workaround is to specify the period rather than the clock frequency. If there is a DCM involved, make sure the period you specify is also divisible by the DCM input-to-output ratio. Article: 106922
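The divisibility advice above can be checked with a little arithmetic. The sketch below is only an illustration, not anything taken from the tools: it assumes that "7/2" means the DCM multiplies the input frequency by 7 and divides it by 2, and that periods are written as whole picoseconds. The function names are hypothetical.

```python
from fractions import Fraction

def fx_output_period_ps(input_period_ps: int, multiply: int, divide: int) -> Fraction:
    """Exact output period when f_out = f_in * multiply / divide (assumed ratio meaning)."""
    # Period is the inverse of frequency, so the ratio is applied upside down.
    return Fraction(input_period_ps * divide, multiply)

def period_is_exact(input_period_ps: int, multiply: int, divide: int) -> bool:
    """True if the derived period is a whole number of picoseconds (assumed resolution)."""
    return fx_output_period_ps(input_period_ps, multiply, divide).denominator == 1

# A 10 ns input with a 7/2 ratio gives 20000/7 ps, which is not a whole number.
print(period_is_exact(10000, 7, 2))
# A 14 ns input divides cleanly: 14000 * 2 / 7 = 4000 ps.
print(period_is_exact(14000, 7, 2), fx_output_period_ps(14000, 7, 2))
```

Picking an input period that passes this check is one way to follow the advice; whether the tools need exactly this resolution is an assumption here.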
A couple of questions that will help answer your query: Are you using hard or soft multipliers? What are you doing that you need 40 bit factors? Thanks, Jay alterauser wrote: > I am speeding up a design for data processing, where many simple steps > are done causing much overhead. Therefore, I try to increase the system > speed, e.g. by inserting some FFs for critical paths, but found fitting > problems with the multipliers. > > My solution was to parallelize some large (40x40) multiplications and > use multicycle constraints (two clocks) to make it run. Quartus says > it is fine. After using the constraints, I obtain speeds above 150MHz. > > Problem: I cannot check this in Modelsim, because the results of the > multiplications show up immediately after one clock, which is not the > case in reality. Article: 106923
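For readers following the 40x40 question, here is a rough sketch of what splitting a wide multiplication into parallel partial products can look like. The poster does not say how the multiplication was actually split, so the 20-bit halves, the unsigned operands, and the function name are all assumptions made for illustration.

```python
HALF_BITS = 20                      # assumed split point for a 40-bit operand
HALF_MASK = (1 << HALF_BITS) - 1

def multiply_40x40_split(a: int, b: int) -> int:
    """Unsigned 40x40 multiply built from four 20x20 partial products."""
    a_hi, a_lo = a >> HALF_BITS, a & HALF_MASK
    b_hi, b_lo = b >> HALF_BITS, b & HALF_MASK

    # The four partial products do not depend on each other, so in hardware
    # they could be computed side by side in separate multipliers.
    lo_lo = a_lo * b_lo
    lo_hi = a_lo * b_hi
    hi_lo = a_hi * b_lo
    hi_hi = a_hi * b_hi

    # Recombine with the weights 2^0, 2^20, and 2^40.
    return (hi_hi << (2 * HALF_BITS)) + ((lo_hi + hi_lo) << HALF_BITS) + lo_lo

a, b = 0xFFFFFFFFFF, 0x123456789A
print(multiply_40x40_split(a, b) == a * b)
```

The model only checks the arithmetic. It says nothing about timing, which is the part the multicycle constraint affects and the part a plain RTL simulation will not show.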
Frank Buss wrote: > For implementing the higher level protocols for my Spartan 3E starter kit > TCP/IP stack implementation, I plan to use a CPU, because I think this > needs less gates than in pure VHDL. The instruction set could be limited, > because more instructions and less gates is good, and it doesn't need to be > fast, so I can design a very orthogonal CPU, which maybe needs even less > gates. The first draft: > > http://www.frank-buss.de/vhdl/cpu.html > > It is some kind of a 68000 clone, but much easier. What do you think of it? > Any ideas to reduce the instruction set even more, without the drawback to > need more instructions for a given task? > > -- > Frank Buss, fb@frank-buss.de > http://www.frank-buss.de, http://www.it4-systems.de I googled <tiny tcp stack> and saw lots of hits. I was looking specifically for Adam Dunkels, who gets a lot of press on OSNews and other sites for his various embedded OS projects. His uIP stack claims to be the world's smallest stack; it uses 4-5KB of code space and only a few hundred bytes of RAM. uIP has been ported to a wide range of systems and used in many commercial projects. He mentions ABB, Altera, BMW, Cisco Systems, Ericsson, GE, HP, Volvo Technology, Xilinx. lwIP is a bigger, faster version of uIP. http://www.sics.se/~adam/ Besides uIP he also has a tiny OS, Contiki, and a ProtoThreads package. John Jakson transputer_guy Article: 106924
Hello Johan, I had the same map error. The only thing to do is to create a completely new project and start all over again. Regards, Jeroen "Johan Bernspång" <xjohbex@xfoix.se> wrote in message news:ece9jc$mok$1@mercur.foi.se... > Hi all, > > I came back from vacation yesterday, and full of ideas I started to work > where I left before the holidays. The first thing that happens is that I > can't map my design anymore. The vhdl is exactly the same as earlier, and > the only difference in the map report is the lines regarding related and > unrelated logic below: > > Design Summary > -------------- > Number of errors: 0 > Number of warnings: 130 > Logic Utilization: > Number of Slice Flip Flops: 7,351 out of 21,504 34% > Number of 4 input LUTs: 5,267 out of 21,504 24% > Logic Distribution: > Number of occupied Slices: 5,505 out of 10,752 51% > Number of Slices containing only related logic: 5,505 out of 5,505 > 100% > Number of Slices containing unrelated logic: 0 out of 5,505 > 0% > *See NOTES below for an explanation of the effects of unrelated > logic > Total Number 4 input LUTs: 6,045 out of 21,504 28% > Number used as logic: 5,267 > Number used as a route-thru: 310 > Number used as Shift registers: 468 > > Number of bonded IOBs: 132 out of 456 28% > IOB Flip Flops: 5 > IOB Master Pads: 55 > IOB Slave Pads: 55 > IOB Dual-Data Rate Flops: 26 > Number of Block RAMs: 42 out of 56 75% > Number of MULT18X18s: 40 out of 56 71% > Number of GCLKs: 5 out of 16 31% > Number of DCMs: 1 out of 8 12% > Number of BSCANs: 1 out of 1 100% > > Number of RPM macros: 26 > Total equivalent gate count for design: 3,057,923 > Additional JTAG gate count for IOBs: 6,336 > Peak Memory Usage: 266 MB > > Following the design summary is a note regarding related logic. > > In my old map report none of this related logic stuff is present. But I > can't really understand why the mapper fails due to this since none of the > logic is unrelated. 
> > I am pretty sure that my code hasn't changed during my vacation. Does ISE > know that there is a new version out and it wants me to upgrade? ;-) > > Has anyone experienced this before? > > Regards > Johan > > -- > ----------------------------------------------- > Johan Bernspång, xjohbex@xfoix.se > Research engineer > > Swedish Defence Research Agency - FOI > Division of Command & Control Systems > Department of Electronic Warfare Systems > > www.foi.se > > Please remove the x's in the email address if > replying to me personally. > -----------------------------------------------