On Wednesday, 9 March 2005 at 16:33:14 UTC+2, jandc wrote:
> > I wonder if there is any simple way to send the data from a block-ram to
> > the RS232-interface, without the need to write all the RS232 VHDL-code
> > myself!
>
> There you go. And can we now stop requesting RS232 stuff? ;)
>
> Jan

Hi Jan! I know this might be a little too late to ask, but would you be so kind as to post here the libraries mentioned by backhus? I mean these:

> library work;
> use work.shift_registers.all;
> library nicethings;
> use nicethings.ASCII.all, nicethings.overloaded_std_logic_arith.all;

Best regards, Sergiy.

Article: 155076
On Apr 5, 1:31 pm, Arlet Ottens <usene...@c-scape.nl> wrote:
> On 04/05/2013 09:51 AM, Mark Wills wrote:
>
>> I'm pretty sure that conclusion is not correct. If you have an
>> instruction that does two or three memory accesses in one instruction
>> and you replace it with three instructions that do one memory access
>> each, you end up with two extra memory accesses. How is this faster?
>>
>> That is one of the reasons why I want to increase code density, in my
>> machine it automatically improves execution time as well as reducing the
>> amount of storage needed.
>
> > I think you're on the right track. With FPGAs it's really quite simple
> > to execute all instructions in a single cycle. It's no big deal at all
> > - with MPY and DIV being exceptions. In the 'Forth CPU world' even
> > literals can be loaded in a single cycle. It then comes down to
> > careful selection of your instruction set. With a small enough
> > instruction set one can pack more than one instruction in a word - and
> > there's your code density. If you can pack more than one instruction
> > in a word, you can execute them in a single clock cycle. With added
> > complexity, you may even be able to execute them in parallel rather
> > than as a process.
>
> Multiple instructions per word sounds like a bad idea. It requires
> instructions that are so small that they can't do very much, so you need
> more of them. And if you need 2 or more small instructions to do
> whatever 1 big instruction does, it's better to use 1 big instruction
> since it makes instruction decoding more efficient and simpler.

If you're referring to general purpose CPUs I'm inclined to agree. When commenting earlier, I had in mind a Forth CPU, which executes Forth words as native CPU instructions. That is, the instruction set is Forth.
Since there is no need for things like addressing modes in (shall we say) classical Forth processors, you don't actually need things like bit-fields in which to encode registers and/or addressing modes*. All you're left with is the instructions themselves. And you don't need that many bits for that.

A Forth chip that I am collaborating on right now has two 6-bit instruction slots per 16-bit word, and a 4-bit 'special' field for other stuff. We haven't allocated all 64 instructions yet.

* Even though it's not strictly necessary in a classical registerless Forth CPU, bit-fields can be useful. We're using a couple of bits to tell the ALU if a word pushes results or pops arguments, for example.

Article: 155077
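[Archive note: the 16-bit word layout Mark describes above — two 6-bit instruction slots plus a 4-bit 'special' field — can be sketched as below. The bit positions (slot 0 in bits 0-5, slot 1 in bits 6-11, special field in bits 12-15) are an assumption; the post does not specify the actual layout.]

```python
# Hypothetical packing for a 16-bit word holding two 6-bit instruction
# slots and a 4-bit 'special' field. Field positions are an assumption.
def pack(slot0, slot1, special):
    assert 0 <= slot0 < 64 and 0 <= slot1 < 64 and 0 <= special < 16
    return (special << 12) | (slot1 << 6) | slot0

def unpack(word):
    # Returns (slot0, slot1, special)
    return word & 0x3F, (word >> 6) & 0x3F, (word >> 12) & 0xF
```

The two operations round-trip, e.g. unpack(pack(5, 63, 9)) gives (5, 63, 9) again; 6 + 6 + 4 bits account for the full 16-bit word.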
On 04/05/2013 04:33 PM, Mark Wills wrote:
>>>> That is one of the reasons why I want to increase code density, in my
>>>> machine it automatically improves execution time as well as reducing the
>>>> amount of storage needed.
>>> I think you're on the right track. With FPGAs it's really quite simple
>>> to execute all instructions in a single cycle. It's no big deal at all
>>> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
>>> literals can be loaded in a single cycle. It then comes down to
>>> careful selection of your instruction set. With a small enough
>>> instruction set one can pack more than one instruction in a word - and
>>> there's your code density. If you can pack more than one instruction
>>> in a word, you can execute them in a single clock cycle. With added
>>> complexity, you may even be able to execute them in parallel rather
>>> than as a process.
>>
>> Multiple instructions per word sounds like a bad idea. It requires
>> instructions that are so small that they can't do very much, so you need
>> more of them. And if you need 2 or more small instructions to do
>> whatever 1 big instruction does, it's better to use 1 big instruction
>> since it makes instruction decoding more efficient and simpler.
>
> If you're referring to general purpose CPUs I'm inclined to agree.
> When commenting earlier, I had in mind a Forth CPU, which executes
> Forth words as native CPU instructions. That is, the instruction set
> is Forth.
>
> Since there are no need for things like addressing modes in (shall we
> say) classical Forth processors then you don't actually need things
> like bit-fields in which to encode registers and/or addressing modes*.
> All you're left with is the instructions themselves. And you don't
> need that many bits for that.
>
> A Forth chip that I am collaborating on right now two 6 bit
> instruction slots per 16-bit word, and a 4-bit 'special' field for
> other stuff.
> We haven't allocated all 64 instructions yet.
>
> * Even though it's not strictly necessary in a classical registerless
> Forth CPU, bit-fields can be useful. We're using a couple of bits to
> tell the ALU if a word pushes results or pops arguments, for example.

Well, I was thinking about general purpose applications. A Forth CPU may map well to a Forth program, but you also have to take into account how well the problem you want to solve maps to a Forth program.

A minimal stack based CPU can be efficient if the values you need can be kept on top of the stack. But if you need 5 or 6 intermediate values, you'll need to store them in memory, resulting in expensive access to them. Even getting the address of a memory location where a value is stored can be expensive.

Compare that to a register based machine with 8 registers. You need 3 more opcode bits, but you get immediate access to a pool of 8 intermediate values. And, with some clever encoding (like rickman suggested) some operations can be restricted to a subset of the registers, relaxing the number of encoding bits required.

It would be interesting to see a comparison using non-trivial applications, and see how much code is required for one of those minimal stack CPUs compared to a simple register based CPU.

Article: 155078
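[Archive note: Arlet's stack-depth point above can be made concrete with a toy RPN evaluator that tracks how deep the evaluation stack gets. This is an illustrative sketch, not a model of any particular CPU.]

```python
# Toy RPN evaluator that records maximum stack depth. The more live
# intermediate values an expression has, the deeper the stack grows,
# and anything below the top two cells needs shuffling (or a spill to
# memory) on a classic two-operand stack machine.
def eval_rpn(tokens):
    stack, max_depth = [], 0
    for t in tokens:
        if t in ('+', '*'):
            b, a = stack.pop(), stack.pop()
            stack.append(a * b if t == '*' else a + b)
        else:
            stack.append(t)
        max_depth = max(max_depth, len(stack))
    return stack[-1], max_depth

# (1*2) + (3*4) + (5*6) in RPN; this evaluation order keeps depth at 3
result, depth = eval_rpn([1, 2, '*', 3, 4, '*', '+', 5, 6, '*', '+'])
```

A different operand ordering, or an expression with more simultaneously live subterms, pushes the maximum depth up, which is where an 8-register file starts to win.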
In comp.arch.fpga Arlet Ottens <usenet+5@c-scape.nl> wrote:
(snip on MISC, RISC, and CISC)

> Multiple instructions per word sounds like a bad idea. It requires
> instructions that are so small that they can't do very much, so you need
> more of them. And if you need 2 or more small instructions to do
> whatever 1 big instruction does, it's better to use 1 big instruction
> since it makes instruction decoding more efficient and simpler.

Well, it depends on your definition of instruction. The CDC 60 bit computers have multiple instructions per 60 bit word, but then the word is big enough.

Now, consider that on many processors an instruction can push or pop only one register to/from the stack. The 6809 push and pop instructions have a bit mask that allows up to eight registers to be pushed or popped in one instruction. (It takes longer for more, but not as long as for separate instructions.)

For x87, there are instructions that do some math function and then do, or don't, remove values from the stack. Some might count those as two or three instructions.

--
glen

Article: 155079
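[Archive note: glen's 6809 example is easy to illustrate. The PSHS/PULS postbyte is a register bit mask, one bit per register, so a single instruction names any subset of the eight registers:]

```python
# Decode a 6809 PSHS/PULS postbyte: one bit per register, so one
# instruction can push or pop any subset of the eight registers.
# Bit assignments: 0 = CC, 1 = A, 2 = B, 3 = DP, 4 = X, 5 = Y,
# 6 = U (or S), 7 = PC.
REGS = ('CC', 'A', 'B', 'DP', 'X', 'Y', 'U', 'PC')

def decode_push_mask(mask):
    return [r for i, r in enumerate(REGS) if mask & (1 << i)]
```

For example, a postbyte of 0x06 selects just A and B, while 0xFF pushes the entire register set in one instruction.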
On 4/4/2013 9:17 PM, Albert van der Horst wrote:
> In article<kjkpnp$qdp$1@dont-email.me>, rickman<gnuarm@gmail.com> wrote:
>>
>> Albert, do you have a reference about this?
>
> Not in a wikipedia sense where you're not allowed to mention original
> research, and are only quoting what is in the books. It is more experience
> that school knowledge.
>
> If you want to see what can be done with a good macro processor like m4
> study the one source of the 16/32/64 bit ciforth x86 for linux/Windows/Apple.
> See my site below.
>
> The existance of an XLAT instruction (to name an example) OTOH does virtually
> nothing to make the life of an assembler programmer better.
>
> Groetjes Albert

Ok, I'm a bit busy with a number of things to go into this area now, but I appreciate the info. I have used macro assemblers in the past and got quite good at them. I even developed a micro programmed board which used a macro assembler and added new opcodes for my design (it was a company boilerplate design adapted to a new host) to facilitate some self test functions. I sorta got yelled at because this meant you needed my assembler adaptations to assemble the code for the board. I wasn't ordered to take it out, so it remained; I didn't remove it. I left within a year and the company eventually folded. You can make a connection if you wish... lol

--
Rick

Article: 155080
On 4/5/2013 3:51 AM, Mark Wills wrote:
> On Apr 5, 1:07 am, rickman<gnu...@gmail.com> wrote:
>> On 4/4/2013 7:16 AM, Albert van der Horst wrote:
>>> In article<kjin8q$so...@speranza.aioe.org>,
>>> glen herrmannsfeldt<g...@ugcs.caltech.edu> wrote:
>>>> In comp.arch.fpga Rod Pemberton<do_not_h...@notemailnotq.cpm> wrote:
>>>>> "rickman"<gnu...@gmail.com> wrote in message
>>>>> news:kjf48e$5qu$1@dont-email.me...
>>>>>> Weren't you the person who brought CISC into this discussion?
>>>>> Yes.
>>>>>> Why are you asking this question about CISC?
>>>>> You mentioned code density. AISI, code density is purely a CISC
>>>>> concept. They go together and are effectively inseparable.
>>>> They do go together, but I am not so sure that they are inseperable.
>>>> CISC began when much coding was done in pure assembler, and anything
>>>> that made that easier was useful. (One should figure out the relative
>>>> costs, but at least it was in the right direction.)
>>> But, of course, this is a fallacy. The same goal is accomplished by
>>> macro's, and better. Code densitity is the only valid reason.
>> I'm pretty sure that conclusion is not correct. If you have an
>> instruction that does two or three memory accesses in one instruction
>> and you replace it with three instructions that do one memory access
>> each, you end up with two extra memory accesses. How is this faster?
>>
>> That is one of the reasons why I want to increase code density, in my
>> machine it automatically improves execution time as well as reducing the
>> amount of storage needed.
>>
>> --
>>
>> Rick
>
> I think you're on the right track. With FPGAs it's really quite simple
> to execute all instructions in a single cycle. It's no big deal at all
> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
> literals can be loaded in a single cycle. It then comes down to
> careful selection of your instruction set.
> With a small enough
> instruction set one can pack more than one instruction in a word - and
> there's your code density. If you can pack more than one instruction
> in a word, you can execute them in a single clock cycle. With added
> complexity, you may even be able to execute them in parallel rather
> than as a process.

I have looked at the multiple instruction in parallel thing and have not made any conclusions yet. To do that you need a bigger instruction word and smaller instruction opcodes. The opcodes essentially have to become specific for the execution units. My design has three: the data stack, the return stack and the instruction fetch. It is a lot of work to consider this because there are so many tradeoffs to analyze.

One issue that has always bugged me is that allocating some four or five bits for the instruction fetch instruction seems very wasteful when some 70-90% of the time the instruction is IP <= IP+1. Trying to Huffman encode this is a bit tricky as what do you do with the unused bit??? I gave up looking at this until after I master the Rubik's Cube. lol It does clearly have potential, it's just a bear to unlock without adding a lot of data pathways to the design.

--
Rick

Article: 155081
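[Archive note: rickman's observation that the plain IP <= IP+1 case dominates can be put in rough numbers with a prefix-code estimate. The 85% figure and the field widths below are illustrative assumptions, not measurements from any design.]

```python
import math

# If a fraction p_next of instruction-fetch fields are the common
# "IP <= IP+1" case, a prefix code spending 1 bit on that case and
# (1 + ceil(log2(n))) bits on each of the n other fetch operations
# can beat a flat fixed-width field on average.
def expected_bits(p_next, n_other_ops):
    escape_bits = 1 + math.ceil(math.log2(n_other_ops))
    return p_next * 1 + (1 - p_next) * escape_bits

# e.g. 85% plain increment and 16 other fetch ops:
# 0.85*1 + 0.15*5 = 1.6 bits on average, versus a flat 5-bit field.
avg = expected_bits(0.85, 16)
```

The catch, as the post notes, is that a variable-width field is awkward to pack into a fixed-width instruction word; the saving is only real if the leftover bits can be used for something.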
On 4/5/2013 11:01 AM, Arlet Ottens wrote:
> On 04/05/2013 04:33 PM, Mark Wills wrote:
>
>>>>> That is one of the reasons why I want to increase code density, in my
>>>>> machine it automatically improves execution time as well as reducing the
>>>>> amount of storage needed.
>>>> I think you're on the right track. With FPGAs it's really quite simple
>>>> to execute all instructions in a single cycle. It's no big deal at all
>>>> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
>>>> literals can be loaded in a single cycle. It then comes down to
>>>> careful selection of your instruction set. With a small enough
>>>> instruction set one can pack more than one instruction in a word - and
>>>> there's your code density. If you can pack more than one instruction
>>>> in a word, you can execute them in a single clock cycle. With added
>>>> complexity, you may even be able to execute them in parallel rather
>>>> than as a process.
>>>
>>> Multiple instructions per word sounds like a bad idea. It requires
>>> instructions that are so small that they can't do very much, so you need
>>> more of them. And if you need 2 or more small instructions to do
>>> whatever 1 big instruction does, it's better to use 1 big instruction
>>> since it makes instruction decoding more efficient and simpler.
>>
>> If you're referring to general purpose CPUs I'm inclined to agree.
>> When commenting earlier, I had in mind a Forth CPU, which executes
>> Forth words as native CPU instructions. That is, the instruction set
>> is Forth.
>>
>> Since there are no need for things like addressing modes in (shall we
>> say) classical Forth processors then you don't actually need things
>> like bit-fields in which to encode registers and/or addressing modes*.
>> All you're left with is the instructions themselves. And you don't
>> need that many bits for that.
>> >> A Forth chip that I am collaborating on right now two 6 bit >> instruction slots per 16-bit word, and a 4-bit 'special' field for >> other stuff. We haven't allocated all 64 instructions yet. >> >> * Even though it's not strictly necessary in a classical registerless >> Forth CPU, bit-fields can be useful. We're using a couple of bits to >> tell the ALU if a word pushes results or pops arguments, for example. > > Well, I was thinking about general purpose applications. A Forth CPU may > map well to a Forth program, but you also have to take into account how > well the problem you want to solve maps to a Forth program. Just to make a point, my original post wasn't really about Forth processors specifically. It was about MISC processors which may or may not be programmed in Forth. > A minimal stack based CPU can be efficient if the values you need can be > kept on top of the stack. But if you need 5 or 6 intermediate values, > you'll need to store them in memory, resulting in expensive access to > them. Even getting the address of a memory location where a value is > stored can be expensive. There aren't many algorithms that can't be dealt with reasonably using the data and return stacks. The module I was coding that made me want to try a register approach has four input variables, two double precision. These are in memory and have to be read in because this is really an interrupt routine and there is no stack input. The main process would have access to these parameters to update them. The stack routine uses up to 9 levels on the data stack currently. But that is because to optimize the execution time I was waiting until the end when I could save them off more efficiently all at once. But - in the process of analyzing the register processor I realized that I had an unused opcode that would allow me to save the parameters in the same way I am doing it in the register based code, reducing the stack usage to five items max. 
I understand that many stack based machines only have an 8 level data stack, period. The GA144 is what, 10 words, 8 circular and two registers?

> Compare that to a register based machine with 8 registers. You need 3
> more opcode bits, but you get immediate access to a pool of 8
> intermediate values. And, with some clever encoding (like rickman
> suggested) some operations can be restricted to a subset of the
> registers, relaxing the number of encoding bits required.

That's the whole enchilada with a register MISC, figuring out how to encode the registers in the opcodes. I think I've got a pretty good trade off currently, but I have not looked at running any Forth code on it yet. This may be much less efficient than a stack machine... duh!

> It would be interesting to see a comparison using non-trivial
> applications, and see how much code is required for one of those minimal
> stack CPUs compared to a simple register based CPU.

I have been invited to give a presentation to the SVFIG on my design. I need to work up some details and will let you know when that will be. They are talking about doing a Google+ hangout which I suppose is a video since it would be replayed for the meeting. I'm not sure I can be ready in time for the next meeting, but I'll see.

I don't think my design is all that novel in the grand scheme of MISC. So I likely will do this as a comparison between the two designs. I think I also need to bone up on some of the other designs out there like the J1 and the B16.

--
Rick

Article: 155082
On 4/5/2013 3:51 AM, Mark Wills wrote:
>
> I think you're on the right track. With FPGAs it's really quite simple
> to execute all instructions in a single cycle. It's no big deal at all
> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
> literals can be loaded in a single cycle. It then comes down to
> careful selection of your instruction set. With a small enough
> instruction set one can pack more than one instruction in a word - and
> there's your code density. If you can pack more than one instruction
> in a word, you can execute them in a single clock cycle. With added
> complexity, you may even be able to execute them in parallel rather
> than as a process.

This morning I found one exception to the one cycle rule: block RAM. My original design was for the Altera ACEX part, which is quite old now but had async read on the block rams for memory and stack. So the main memory and stacks could be accessed for reading and writing in the same clock cycle, read/modify/write. You can't do that with today's block RAMs; they are totally synchronous.

I was looking at putting the registers in a block RAM so I could get two read ports and two write ports. But this doesn't work the way an async read RAM will. So I may consider using a multiphase clock. The operations that really mess me up are the register indirect reads of memory, like stack accesses... or any memory accesses really, as that is the only way to address memory, via register. So the address has to be read from the register RAM, then the main memory RAM is read, then the result is written back to the register RAM. Wow! That is three clock edges I'll need.

If I decide to go with using block RAM for registers it will give me N sets of regs, so I have a big motivation. It also has potential for reducing the total amount of logic since a lot of the multiplexing ends up inside the RAM. The multiphase clocking won't be as complex as using multiple machine cycles for more complex instructions.
But it is more complex than the good old simple clock I have worked with. It will also require some tighter timing and more complex timing constraints, which are always hard to get correct.

--
Rick

Article: 155083
On 4/5/2013 8:31 AM, Arlet Ottens wrote:
> On 04/05/2013 09:51 AM, Mark Wills wrote:
>
>>> I'm pretty sure that conclusion is not correct. If you have an
>>> instruction that does two or three memory accesses in one instruction
>>> and you replace it with three instructions that do one memory access
>>> each, you end up with two extra memory accesses. How is this faster?
>>>
>>> That is one of the reasons why I want to increase code density, in my
>>> machine it automatically improves execution time as well as reducing the
>>> amount of storage needed.
>
>> I think you're on the right track. With FPGAs it's really quite simple
>> to execute all instructions in a single cycle. It's no big deal at all
>> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
>> literals can be loaded in a single cycle. It then comes down to
>> careful selection of your instruction set. With a small enough
>> instruction set one can pack more than one instruction in a word - and
>> there's your code density. If you can pack more than one instruction
>> in a word, you can execute them in a single clock cycle. With added
>> complexity, you may even be able to execute them in parallel rather
>> than as a process.
>
> Multiple instructions per word sounds like a bad idea. It requires
> instructions that are so small that they can't do very much, so you need
> more of them. And if you need 2 or more small instructions to do
> whatever 1 big instruction does, it's better to use 1 big instruction
> since it makes instruction decoding more efficient and simpler.

I think multiple instructions per word is a good idea if you have a wide disparity in speed between instruction memory and the CPU clock rate. If not, why introduce the added complexity? Well, unless you plan to execute them simultaneously... I took a look at a 16 bit VLIW idea once and didn't care for the result. There are just too many control points, so that a proper VLIW design would need more than just 16 bits I think.
At least in the stack design. An 18 bit instruction word might be worth looking at in the register CPU. But getting more parallelism in the register design will require more datapaths, and there goes the "minimal" part of the MISC. So many options, so little time...

--
Rick

Article: 155084
On 4/5/2013 10:33 AM, Mark Wills wrote:
> On Apr 5, 1:31 pm, Arlet Ottens<usene...@c-scape.nl> wrote:
>> On 04/05/2013 09:51 AM, Mark Wills wrote:
>>
>>>> I'm pretty sure that conclusion is not correct. If you have an
>>>> instruction that does two or three memory accesses in one instruction
>>>> and you replace it with three instructions that do one memory access
>>>> each, you end up with two extra memory accesses. How is this faster?
>>>>
>>>> That is one of the reasons why I want to increase code density, in my
>>>> machine it automatically improves execution time as well as reducing the
>>>> amount of storage needed.
>>> I think you're on the right track. With FPGAs it's really quite simple
>>> to execute all instructions in a single cycle. It's no big deal at all
>>> - with MPY and DIV being exceptions. In the 'Forth CPU world' even
>>> literals can be loaded in a single cycle. It then comes down to
>>> careful selection of your instruction set. With a small enough
>>> instruction set one can pack more than one instruction in a word - and
>>> there's your code density. If you can pack more than one instruction
>>> in a word, you can execute them in a single clock cycle. With added
>>> complexity, you may even be able to execute them in parallel rather
>>> than as a process.
>>
>> Multiple instructions per word sounds like a bad idea. It requires
>> instructions that are so small that they can't do very much, so you need
>> more of them. And if you need 2 or more small instructions to do
>> whatever 1 big instruction does, it's better to use 1 big instruction
>> since it makes instruction decoding more efficient and simpler.
>
> If you're referring to general purpose CPUs I'm inclined to agree.
> When commenting earlier, I had in mind a Forth CPU, which executes
> Forth words as native CPU instructions. That is, the instruction set
> is Forth.
>
> Since there are no need for things like addressing modes in (shall we
> say) classical Forth processors then you don't actually need things
> like bit-fields in which to encode registers and/or addressing modes*.
> All you're left with is the instructions themselves. And you don't
> need that many bits for that.
>
> A Forth chip that I am collaborating on right now two 6 bit
> instruction slots per 16-bit word, and a 4-bit 'special' field for
> other stuff. We haven't allocated all 64 instructions yet.
>
> * Even though it's not strictly necessary in a classical registerless
> Forth CPU, bit-fields can be useful. We're using a couple of bits to
> tell the ALU if a word pushes results or pops arguments, for example.

That can be useful. The automatic pop of operands is sometimes expensive by requiring a DUP beforehand. In my design the fetch/store words have versions to drop the address from the return stack or increment and hold on to it. In looking at the register design I realized it would be useful to just make the plus and the drop both options, so now there are three versions: fetch, fetch++ and fetchK (keep).

--
Rick

Article: 155085
rickman wrote:
>
> So the main memory and stacks could be accessed for reading and writing
> in the same clock cycle, read/modify/write. You can't do that with today's
> block RAMs, they are totally synchronous.
>
I had the same problem when I first moved my XC4000 based RISC over to the newer parts with registered Block RAM.

I ended up using opposite edge clocking, with a dual port BRAM, to get what appears to be single cycle access on the data and instruction ports.

As this approach uses the same clock, the constraints are painless; but you now have half a clock for address -> BRAM setup, and half for the BRAM data <-> core data setup. The latter can cause some timing issues if the core is configured with a byte lane mux so as to support 8/16/32 bit {sign extending} loads.

-Brian

Article: 155086
On 4/7/2013 5:59 PM, Brian Davis wrote:
> rickman wrote:
>>
>> So the main memory and stacks could be accessed for reading and writing
>> in the same clock cycle, read/modify/write. You can't do that with today's
>> block RAMs, they are totally synchronous.
>>
> I had the same problem when I first moved my XC4000 based RISC
> over to the newer parts with registered Block RAM.
>
> I ended up using opposite edge clocking, with a dual port BRAM,
> to get what appears to be single cycle access on the data and
> instruction ports.
>
> As this approach uses the same clock, the constraints are painless;
> but you now have half a clock for address -> BRAM setup, and half
> for the BRAM data<-> core data setup. The latter can cause some
> some timing issues if the core is configured with a byte lane mux
> so as to support 8/16/32 bit {sign extending} loads.

Yes, that was one way to solve the problem. The other I considered was to separate the read and write on the two ports. Then the read would be triggered from the address that was at the input to the address register... from the previous cycle. So the read would *always* be done and the data presented whether you used it or not. I'm not sure how much power this would waste, but the timing impact would be small.

I looked at making the register block RAM part of the main memory address space. This would require a minimum of three clock cycles in a machine cycle: read address or data from register, use address to read or write data from/to memory, and then write data to register. If it helps timing, the memory write can be done at the same time as the register write. I'm not crazy about this approach, but I'm considering how useful it would be to have direct address capability of the multiple register banks.

Some of the comments about register vs.
stacks and what I have seen of the J1 has made me think about a hybrid approach using stacks in memory, but with offset access, so items further down in the stack can be operands, not just TOS and NOS. This has potential for saving stack operations. The J1 has a two bit field controlling the stack pointer; I assume that is +1 to -2, or 1 push to 2 pops. The author claims this provides some ability to combine Forth functions into one instruction, but doesn't provide details. I guess the compiler code would have to be examined to find out what combinations would be useful.

The compiler end is not my strong suit, but I suppose I could figure out how to take advantage of features like this.

--
Rick

Article: 155087
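[Archive note: the two-bit stack-pointer field discussed above reads naturally as a signed two-bit delta, which covers exactly the +1 to -2 range rickman guesses at. Treating it as two's complement is an assumption about the J1's encoding, not taken from its documentation.]

```python
# Interpret a 2-bit instruction field as a signed stack-pointer delta:
# +1 (push one), 0 (no change), -1 (pop one), -2 (pop two).
# The two's-complement interpretation is an assumption.
def stack_delta(bits):
    assert 0 <= bits < 4
    return bits - 4 if bits >= 2 else bits
```

A -2 delta is what lets a binary operator that also drops its second operand (e.g. a store or a compare-and-discard) fold into one instruction.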
In article <kjtjne$nfu$1@dont-email.me>, rickman <gnuarm@gmail.com> wrote: >On 4/7/2013 5:59 PM, Brian Davis wrote: >> rickman wrote: >>> >>> So the main memory and stacks could be accessed for reading and writing >>> in the same clock cycle, read/modify/write. You can't do that with today's >>> block RAMs, they are totally synchronous. >>> >> I had the same problem when I first moved my XC4000 based RISC >> over to the newer parts with registered Block RAM. >> >> I ended up using opposite edge clocking, with a dual port BRAM, >> to get what appears to be single cycle access on the data and >> instruction ports. >> >> As this approach uses the same clock, the constraints are painless; >> but you now have half a clock for address -> BRAM setup, and half >> for the BRAM data<-> core data setup. The latter can cause some >> some timing issues if the core is configured with a byte lane mux >> so as to support 8/16/32 bit {sign extending} loads. > >Yes, that was one way to solve the problem. This other I considered was >to separate the read and write on the two ports. Then the read would be >triggered from the address that was at the input to the address >register... from the previous cycle. So the read would *always* be done >and the data presented whether you used it or not. I'm not sure how >much power this would waste, but the timing impact would be small. > >I looked at making the register block RAM part of the main memory >address space. This would required a minimum of three clock cycles in a >machine cycle, read address or data from register, use address to read >or write data from/to memory and then write data to register. If it >helps timing, the memory write can be done at the same time as the >register write. I'm not crazy about this approach, but I'm considering >how useful it would be to have direct address capability of the multiple >register banks. > >Some of the comments about register vs. 
>stacks and what I have seen of
>the J1 has made me think about a hybrid approach using stacks in memory,
>but with offset access, so items further down in the stack can be
>operands, not just TOS and NOS. This has potential for saving stack
>operations. The J1 has a two bit field controlling the stack pointer, I
>assume that is +1 to -2 or 1 push to 2 pop. The author claims this
>provides some ability to combine Forth functions into one instruction,
>but doesn't provide details. I guess the compiler code would have to be
>examined to find out what combinations would be useful.

This is the approach we took with the FIETS chip, about 1980, emulated on an Osborne CPM computer, never built. The emulation could run a Forth and it benefited from reaching 8 deep into both the return and the data stack. It still would be interesting to build it using a modern FPGA.

>
>The compiler end is not my strong suit, but I suppose I could figure out
>how to take advantage of features like this.
>
>--
>
>Rick

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Article: 155088
Matt, can you elaborate on why the OP cannot do this in an FPGA, if a suitable FPGA is available & cost-effective? I completely understand that it may be highly unlikely that it can be done in a cost-effective FPGA, but you excluded that as a reason in your reply.

Andy

Article: 155089
You do realize that this thread is over 8 years old, right?

And frankly, anything that needs a package named "overloaded_std_logic_arith" would raise my suspicions about its worth.

Andy

Article: 155090
You might consider using 16 external receivers and 16 external transmitters and letting the FPGA mux the data buses. There are some Rx/Tx that support DDR on the data buses, so this gets you down to 16 pins per Rx/Tx (12b + HD + VD + DE + Clk) x 32 = 512 pins total. There are at least low-cost Cyclone IV parts that have that many I/Os (CE30/CE40).

But I have not checked whether these DDR-style Rx/Tx are also available for HDMI 1.4, or how this solution compares to the crosspoint switches.

Regards,

Thomas

Article: 155091
"Albert van der Horst" <albert@spenarnc.xs4all.nl> wrote in message news:515e262b$0$26895$e4fe514c@dreader37.news.xs4all.nl... > The existance of an XLAT instruction (to name an example) > OTOH does virtually nothing to make the life of an > assembler programmer better. > Why do you say that? It seems good for 256 byte (or less) lookup tables, 8-bit character translation, simple decompression algorithms, etc. You can even use it for multiple tables at once, e.g., using XCHG to swap BX. It's definately difficult for a compiler implementer to determine when to use such a CISC instruction. Rod PembertonArticle: 155092
On 08/04/13 17:58, thomas.entner99@gmail.com wrote:
> You might consider to use 16 external receivers and 16 external
> transmitters and use the FPGA to mux the databuses. There are some
> Rx/Tx that support DDR on the databuses, so this will get you 16 pins
> per Rx/Tx (12b+HD+VD+DE+Clk) x 32 = 512 pins total. There are at
> least low cost Cyclone IV that have so many IOs (CE30/CE40).
>
> But I have not checked if this DDR-style Rx/Tx are also available for
> HDMI1.4 and how this solution compares to this crosspoint switches.
>
> Regards,
>
> Thomas

Unfortunately, the numbers are bigger than that. HDMI receivers and transmitters that I have seen have SDR on the data bus, but for HDMI 1.4 that would be 36 lines at 340 Mbps. So for 16 channels in and 16 channels out, that would be 36 * 16 * 2 = 1152 pins, all running at 340 Mbps. That's a lot of pins, and even if we got an FPGA big enough, designing such a board and getting matched lengths on all the lines would be a serious effort.

The crosspoint switches mentioned by another poster are one likely choice. The other realistic architecture is to use large numbers of 4-to-1 HDMI multiplexers.

Article: 155093
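The two pin budgets being argued over in this exchange can be checked with a few lines of arithmetic (the per-device pin counts are taken from the posts themselves):

```python
# Pin-budget check for the two options discussed above.
channels_in, channels_out = 16, 16
devices = channels_in + channels_out

# DDR-capable Rx/Tx: 12 data pins (DDR) + HD + VD + DE + Clk = 16 pins each
ddr_pins = (12 + 4) * devices
print(ddr_pins)   # -> 512

# SDR parts at HDMI 1.4 rates: 36 lines per channel
sdr_pins = 36 * devices
print(sdr_pins)   # -> 1152
```

Both posters' totals check out; the disagreement is only over whether DDR-bus parts exist for HDMI 1.4 rates.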
On 2013-03-03 18:23, hamilton wrote:
> I have been looking for an SDIO serial port.
>
> A single chip would work as well.
>
> SDIO <-> async serial port Txd, Rxd, CTS, RTS
>
> Also Linux and Win drivers.
>
> Are these still around ?

You could do this with the right Atmel SAM3 Cortex-M3/M4 controller.

Best Regards
Ulf Samuelsson

Article: 155094
On 4/11/2013 12:06 PM, Ulf Samuelsson wrote:
> On 2013-03-03 18:23, hamilton wrote:
>> I have been looking for an SDIO serial port.
>>
>> A single chip would work as well.
>>
>> SDIO <-> async serial port Txd, Rxd, CTS, RTS
>>
>> Also Linux and Win drivers.
>>
>> Are these still around ?
>
> You could do this with the right Atmel SAM3 Cortex-M3/M4 controller
> Best Regards
> Ulf Samuelsson

1> The serial port that this SDIO port needs to connect to is another processor. (Redesign is not an option.)

2> Which SAM device has a device-side SDIO interface? (Not host side.)

Article: 155095
Folks,

I bought a couple of these on Ebay as they were cheap, but I am having fun trying to program them. The program is normally stored in a StrataFlash memory on the board, but the Digilent Adept software doesn't talk to that type of flash.

I can see from an old post that Antti Lukats created some software to program the flash using the JTAG interface, but the link on xilant.com seems dead. Does anyone know if this software is available elsewhere for download, or is it possible someone can send me a copy?

Dave

Article: 155096
On 4/12/2013 1:37 PM, dave.g4ugm@gmail.com wrote:
> Folks,
>
> I bought a couple of these on Ebay as they were cheap but I am having fun trying to program them. The program is normally stored in a Strata Flash memory on the board but the Digilent Adapt software doesn't talk to that type of flash.
>
> I can see from an old post that Antti Lukats created some software to program the flash using the JTAG interface, but the link on xilant.com seems dead. Does any one know if this software is available elsewhere for download, or is it possible some one can send me a copy?
>
> Dave

I did some work with those boards several years ago. I have an archive of information that includes the Digilent "S3ESP Configurator Setup.msi", which I believe I used to program them. As I recall, it would only work with the Digilent parallel port programming cable (with a real, built-in LPT port, not a USB version).

The archive (which includes a PDF user guide and sample projects) is about 13MB; the Setup file is less than 1MB. Would you like me to send one of those to you?

Chris

Article: 155097
Hello,

does anyone have experience with the new XILINX 7 series, specifically with the Artix-7? Has anyone worked with the Artix-7 demo board yet?

Regards
Bodo

Article: 155098
On Friday, April 12, 2013 11:27:56 PM UTC-7, Bodo wrote:
> Hallo,
> hat jemand Erfahrungen mit der neuen 7-er Serie von XILINX, speziell mit dem
> Artix-7?
> Hat schon jemand Erfahrungen mit dem Demo-Board des Artix-7?

I have a Kintex-7 and a Zynq board with which I am working now. No Artix-7 yet, but the fabric in them is not that different from the Kintex-7 fabric. The Kintex-7 board is very similar to other dev boards, and the Zynq is a delight to work with.

Article: 155099
jonesandy@comcast.net wrote:
> If I had a nickel for every time I've heard throughout my career about this or that technology no longer being relevant...
>
> Technology is like fashion: whatever is old will be new again someday, with a new spin and a new relevance. Don't throw it away; just keep it in the back of your closet, and you will be able to use it again. And for those that missed it the first time around, the second-hand stores are always full of these still-useful articles from bygone times.
>
> I remember my college digital design coursework included implementing boolean logic functions with multiplexers and decoders. Then PALs came along and changed that to sum-of-products. Then FPGAs came along and changed it back (pre-HDL). Then HDL came along and changed it again.
>
> The Cordic algorithms were not new when FPGAs came along. They were dusted off from the ancient spells of the priests of the order of multiplierless microprocessors and "pieces of eight". And those priests were probably taught their craft by the wizards of relays and vacuum tubes.

Yes indeed; CORDIC was old when I used it in 1976 on 6800s. The earliest papers I have date from:

1962: J. E. Meggitt, "Pseudo division and pseudo multiplication processes", IBM Journal, April 1962

1959: Jack E. Volder, "The CORDIC trigonometric computing technique", IRE Trans. Electron. Comput. EC-8:330-334

Neither references anything from the time when "computer" was a job title.
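For anyone who hasn't seen the "multiplierless" trick in action: CORDIC in rotation mode computes sine and cosine using only shifts, adds, and a small arctangent table. A minimal sketch in Python, written in floating point for clarity (a hardware version would use fixed-point integers and real shifts; only valid here for angles in roughly [-pi/2, pi/2]):

```python
import math

# Iteration angles atan(2^-i) and the accumulated gain correction K.
N = 32
ANGLES = [math.atan(2.0 ** -i) for i in range(N)]
K = 1.0
for i in range(N):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic_sincos(theta):
    """Return (sin(theta), cos(theta)) via shift-and-add rotations."""
    x, y, z = K, 0.0, theta          # start on the x-axis, pre-scaled by K
    for i in range(N):
        d = 1.0 if z >= 0 else -1.0  # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x

s, c = cordic_sincos(math.pi / 6)
print(round(s, 6), round(c, 6))   # -> 0.5 0.866025
```

The `2.0 ** -i` multiplications are the floating-point stand-ins for arithmetic right shifts, which is why the algorithm suited multiplierless machines like the 6800 mentioned above.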