http://mymicroprocessor.blogspot.com/2018/08/cheapest-fpga-board-rm250-similar-to.html

The A-CE4E6

Intel Cyclone IV FPGA IC.

Article: 160651
On Wed, 15 Aug 2018 16:54:16 -0700 (PDT)
Othman Ahmad <othmana@gmail.com> wrote:

> http://mymicroprocessor.blogspot.com/2018/08/cheapest-fpga-board-rm250-similar-to.html
>
> The A-CE4E6
>
> Intel Cyclone IV FPGA ic.

I like this one better: almost everything for a very low power processor on one chip [1], and an open-source development tool chain is available [2].

It also has 15% more LUTs, ten times more RAM, and an SPI flash for code storage. But it has only 60% of the multipliers, and 16-bit wide memories.

The iCE40 UP5K also has on-chip 10 kHz and 48 MHz oscillators, and hardware support for 2 x SPI and 2 x I2C interfaces.

Jan Coombs
-- 

[1] Gnarly Grey UPDuino v1.0 Board, $9.99 delivered
5.3K LUTs, 1Mb SPRAM, 120Kb DPRAM, 8 Multipliers, 34 GPIO on 0.1” headers, SPI Flash, RGB LED, 3.3V and 1.2V Regulators
http://gnarlygrey.atspace.cc/development-platform.html#upduino_v1

or with std FTDI programmer interface, $15.99 delivered
http://gnarlygrey.atspace.cc/development-platform.html#upduino_v2

[2] Project Icestorm - see iCE40-UP5K-SG48
http://www.clifford.at/icestorm/

Article: 160652
On Thursday, 16 August 2018 08:55:06 UTC+8, Jan Coombs wrote:
> On Wed, 15 Aug 2018 16:54:16 -0700 (PDT)
> Othman Ahmad <othmana@gmail.com> wrote:
> [snip]
>
> I like this one better, almost everything for a very low power
> processor on one chip [1], and open-source development tool chain
> available [2].
> [snip]

Thank you for introducing me to Lattice FPGAs. I had been looking for sources of Lattice FPGAs but could not find any supplier.

The development tools for this FPGA are still primitive compared to Intel Quartus.

I started with Xilinx in the 1990s, 30 years ago. When I returned to academia 10 years ago, I found that Xilinx did not provide its tools for free, so I chose Altera.

We started with simulation tools, but later managed to get funds to buy full development boards for teaching. We are now committed to developing FPGA designs using Altera/Intel ICs. I do not know the current status of Xilinx, but would like to reconsider if it is very attractive.
Despite a few offerings, like the Spartan development boards at competitive prices, Intel FPGAs are still more widespread and tend to be slightly cheaper than Xilinx boards. Lattice boards are even more expensive.

Your source seems cheap, but transportation cost will kill us. Its tools are still primitive, but if Lattice were to provide manual routing tools, or anybody else in the ICE project were to provide them, I may reconsider. Intel boards only allow auto-routing. Xilinx used to provide manual routing tools, but no more.

With manual routing tools, I can see exactly which devices are to be connected and how they are connected. It allows me to optimise my design better. I used to do this on a Xilinx FPGA for an instruction decoder demonstration.

It was also satisfying to be able to see our components clearly. The pin planners are too jumbled up and do not provide much information about the devices that are connected.

Article: 160653
On Wed, 15 Aug 2018 19:00:03 -0700 (PDT)
Othman Ahmad <othmana@gmail.com> wrote:

> On Thursday, 16 August 2018 08:55:06 UTC+8, Jan Coombs wrote:
> > [snip]
>
> Thank you for introducing me to Lattice FPGA. I had been
> looking for sources of Lattice Logic FPGA but cannot find any
> supplier.

[snip]

> Your source seem cheap but transportation cost will kill us.

The boards I have bought are shipped 5000 km for free, and, bought one at a time, they do not attract customs charges.

> Its tools are still primitive but if Lattice were to provide
> manual routing tools, or anybody else in the ICE project were
> to provide manual routing tools, I may reconsider.

The Icestorm tool chain is open source, so it could be adapted for layout control, elimination of the synthesis tool, and similar custom work.

> With manual routing tools, I can see exactly what devices are
> to be connected and how they are connected. It will allow me
> to optimise my design better. I used to do it for a Xilinx
> FPGA for an instruction decoder demonstration.
>
> It was also satisfying to be able to see our components
> clearly. The pin planners are too jumbled up and do not
> provide much information about devices that are connected.

These chips are so small that maybe you could do this without a GUI, or add a chip layout planner to the tool chain yourself?

Jan Coombs

Article: 160654
On 11/08/2018 20:21, gnuarm.deletethisbit@gmail.com wrote:
> On Saturday, August 11, 2018 at 12:18:27 PM UTC-4, Michael Kellett wrote:
>> On 09/08/2018 23:28, othmana@gmail.com wrote:
>>> http://mymicroprocessor.blogspot.com/2018/08/fpga-simplest-processor.html
>>>
>>> Go to my blog for more information.
>>>
>> Why not make it easy for us and just give us the UK patent reference here?
>
> What is easy about reading patents?
>
> The US transmits a time code signal from Colorado which isn't always receivable on the east coast. Some 10 years ago (or so) they modified the signal to include phase modulation, which should be easier to receive. The only trouble is that a company (which may have worked with the US government) got a patent on a receiver for this phase-modulated signal. I can't be sure of what I'm reading in the patent, so I can't try to design around it and sell a phase-demodulating receiver. If patents were easy to read, I would know just what had been patented and would be able to design a non-infringing receiver.
>
> Rick C.

Some patents are more readable than others. This is an especially chatty and helpful one.
I get the feeling that UK patent style is tending towards more sensible descriptions, but that may just be a reflection of the ones I've happened to look at.

MK

Article: 160655
> I started with Xilinx in the 1990s, 30 years ago. When I returned to
> academia 10 years ago, I found that Xilinx did not provide its tools for
> free, so I chose Altera.

You started 30 years ago... and stayed there.

Article: 160656
Hi,

I think I've got a really good way to improve a commonly used, well-established algorithm that is often used in FPGAs, and it all checks out. The implementation completes the same tasks in 2/3 of the cycles, using 2/3 of the resources of a standard Xilinx IP block, with comparable timing.

I've verified that the output is correct over the entire range of 32-bit input values.

I can't find any similar designs in a Google patent search, or by looking through journal articles. Once you are familiar with the original algorithm and the optimization is explained, it becomes pretty self-evident in retrospect. It just seems the right way to do things.

What should I do?

Should I just throw the implementation on a website somewhere as a curiosity?

Publish it in an article?

Pass it to a local student to make a paper from it? (I'm not studying at all.)

Attempt to patent and then commercialize it?

Thanks!

Mike

Article: 160657
I think the best option is to write an article -- or a patent.

Simply because it's an extra opportunity to verify that your approach is correct.

If there's a hidden mistake, a student might be unable to see it.

Gene

On 03.09.2018 13:17, Mike Field wrote:
> Hi,
>
> I think I've got a really good way to improve a commonly used & well established algorithm that is often used in FPGAs, and it all checks out. [snip]

Article: 160658
I agree with Gene, plus you might consider publishing the IP as open source code on a website of your own or opencores.org.

--Mike

On Monday, September 3, 2018 at 3:41:02 AM UTC-7, Gene Filatov wrote:
> I think the best option is to write an article -- or a patent.
> [snip]

Article: 160659
On 03/09/2018 11:17, Mike Field wrote:
> Hi,
>
> I think I've got a really good way to improve a commonly used & well established algorithm that is often used in FPGAs, and it all checks out. [snip]

I'd publish. Since you are not already in the IP licensing/patenting groove, I doubt you would make any money from it, but you might gain kudos, which may help your career and business.

Xilinx might want to publish it, which might give it a lot more visibility.

If you have a web site, you could put it on that.

MK

Article: 160660
On Monday, September 3, 2018 at 6:17:54 AM UTC-4, Mike Field wrote:
> Hi,
>
> I think I've got a really good way to improve a commonly used & well established algorithm that is often used in FPGAs, and it all checks out. [snip]

Licensing and selling IP comes with a bit of a learning curve and requires an investment on your part. As Michael mentions, without some of that framework already in place, a license vetted by an IP attorney, and a good marketing plan, you might not see a return on that investment.

If you want your name more prominently attached to it, I'd suggest posting up on a personal Github account rather than opencores.org, which makes you conform to their requirements (such as wishbone interface, etc.).
Xilinx always welcomes guest articles on their blogs (although those have been in flux since the recent reorg), and in their e-magazine Xcell Journal (again, it seems to have been discontinued and the Xcell Daily Blog archived):

https://forums.xilinx.com/t5/Xilinx-Xclusive-Blog/bg-p/xilinx_xclusive
https://forums.xilinx.com/t5/Adaptable-Advantage-Blog/bg-p/tech_blog

https://www.xilinx.com/about/xcell-publications/xcell-journal.html

--Kris

Article: 160661
On Tuesday, September 4, 2018 at 7:48:41 AM UTC-7, kkoorndyk wrote:
> If you want your name more prominently attached to it, I'd suggest posting up on a personal Github account rather than opencores.org which makes you conform to their requirements (such as wishbone interface, etc.).
>

OpenCores encourages use of the Wishbone interface for SoC components, and they do offer coding guidelines, but neither is a requirement. For example, the entire DSP core section has 38 entries, none of which are marked as Wishbone compliant.

Article: 160662
On Wednesday, 5 September 2018 02:48:41 UTC+12, kkoorndyk wrote:
> On Monday, September 3, 2018 at 6:17:54 AM UTC-4, Mike Field wrote:
> > Hi,
> >
> > I think I've got a really good way to improve a commonly used & well established algorithm that is often used in FPGAs, and it all checks out. [snip]
>
> Licensing and selling IP comes with a bit of a learning curve and requires an investment on your part. [snip]
> [snip]
>
> --Kris

I never thought I would agree with Rick, but....

All of that sounds like too much work. So here is a quick summary with C-like pseudo-code. I'll put the HDL code up somewhere soon, once I am happy with it; I am removing the last rounding errors.

I've been playing with CORDIC, and have come up with what looks to be an overlooked optimization. I've done a bit of googling and haven't found anything - maybe it is a novel approach?

I've tested it with 32-bit inputs and outputs, and it is within +/-2, with an average error of around 0.6. I am sure that with a bit more analysis of where the errors are coming from I can make it more accurate.

This has two parts to it. By themselves both seem quite trivial, but they complement each other quite nicely.

Scaling Z
---------
1. The 'z' value that CORDIC uses becomes smaller and smaller as the stages increase.

The core of CORDIC for SIN() and COS() is:

    x = INITIAL;
    y = INITIAL;
    for (i = 0; i < CORDIC_REPS; i++) {
        int64_t tx, ty;
        // Divide to scale the current vector
        tx = x >> (i+1);
        ty = y >> (i+1);

        // Either add or subtract at right angles to the current vector
        x -= (z > 0 ? ty : -ty);
        y += (z > 0 ? tx : -tx);
        z -= (z > 0 ? angles[i] : -angles[i]);
    }

The values in angles[] are all important, for example:

    angle[0]  = 1238021
    angle[1]  = 654136
    angle[2]  = 332050
    angle[3]  = 166670
    angle[4]  = 83415
    angle[5]  = 41718
    angle[6]  = 20860
    angle[7]  = 10430
    angle[8]  = 5215
    angle[9]  = 2607
    angle[10] = 1303
    angle[11] = 652
    angle[12] = 326
    angle[13] = 163
    angle[14] = 81
    angle[15] = 41
    angle[16] = 20
    angle[17] = 10
    angle[18] = 5
    angle[19] = 3
    angle[20] = 1

If you make the following change:

    for (i = 0; i < CORDIC_REPS; i++) {
        int64_t tx, ty;
        // Divide to scale the current vector
        tx = x >> (i+1);
        ty = y >> (i+1);

        // Either add or subtract at right angles
        x -= (z > 0 ? ty : -ty);
        y += (z > 0 ? tx : -tx);
        z -= (z > 0 ? angles[i] : -angles[i]);

        //!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        z <<= 1;   // Double the result of 'z'
        //!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    }

then you can use all the bits in angles[], because you can scale by 2^i (this is data from a different set of parameters, hence the values and count are different):

    angle[0]  = 1238021
    angle[1]  = 1308273
    angle[2]  = 1328199
    angle[3]  = 1333354
    angle[4]  = 1334654
    angle[5]  = 1334980
    angle[6]  = 1335061
    angle[7]  = 1335082
    angle[8]  = 1335087
    angle[9]  = 1335088
    angle[10] = 1335088
    angle[11] = 1335088
    angle[12] = 1335088
    angle[13] = 1335088
    angle[14] = 1335088
    angle[15] = 1335088
    angle[16] = 1335088
    angle[17] = 1335088
    angle[18] = 1335088
    angle[19] = 1335088
    angle[20] = 1335088
    angle[21] = 1335088
    angle[22] = 1335088
    angle[23] = 1335088
    angle[24] = 1335088
    angle[25] = 1335088
    angle[26] = 1335088
    angle[27] = 1335088
    angle[28] = 1335088
    angle[29] = 1335088

...and angle[i] rapidly becomes a constant value after the first 9 or 10 iterations. This is what you would expect, as the angle gets smaller and smaller.
Part 2: Add a lookup table
==========================
If you split the input into:

    [2 MSB]        quadrant
    [next 9 bits]  a lookup table index
    [the rest]     the starting CORDIC Z value, offset by 1<<(num_of_bits-1)

and have a lookup table of 512 x 36-bit values (i.e. a block RAM) which holds the SIN/COS values at the center of each range, e.g.

    initial[i] = scale_factor * sin(PI/2.0/1024*(2*i+1));

Because you need both the SIN() and COS() starting points, you can get them from the same table (screaming out "dual-port memory!" to me).

You can then do a standard lookup to get the starting points, 9 cycles into the CORDIC:

    /* Use Dual Port memory for this */
    if (quadrant & 1) {
        x = initial[index];
        y = initial[TABLE_SIZE-1-index];
    } else {
        x = initial[TABLE_SIZE-1-index];
        y = initial[index];
    }

    /* Subtract half the sector angle from Z */
    z -= 1 << (CORDIC_BITS-1);

    /* Now do standard CORDIC, with a lot of work already done */
    ...

This removes ~8 cycles of latency.

The end result
==============
If you combine both of these, you can get rid of the angles[] table completely - it is now a constant.

    /* Use Dual Port memory for this */
    if (quadrant & 1) {
        x = initial[index];
        y = initial[TABLE_SIZE-1-index];
    } else {
        x = initial[TABLE_SIZE-1-index];
        y = initial[index];
    }

    /* Subtract half the sector angle from Z */
    z -= 1 << (CORDIC_BITS-1);

    /* Now do standard CORDIC, with a lot of work already done,
       so fewer repetitions are needed for the same accuracy */
    for (i = 0; i < CORDIC_REPS; i++) {
        int64_t tx, ty;
        // Divide to scale the current vector
        tx = x >> (INDEX_BITS+i);
        ty = y >> (INDEX_BITS+i);

        // Either add or subtract at right angles
        x -= (z > 0 ? ty : -ty);
        y += (z > 0 ? tx : -tx);
        z -= (z > 0 ? ANGLE_CONSTANT : -ANGLE_CONSTANT);
        z <<= 1;
    }

Advantages of this method
=========================
If you have fully unrolled it to generate a full value per cycle, you end up with:
- 1 BRAM block used (bad)
- 9 fewer CORDIC stages (good)
- 8 or 9 cycles less latency (good)

For 16-bit values this may only need 5 stages, rather than 14.

If you are trying to minimize area, generating an n-bit value every ~n cycles, you end up with:

- 1 BRAM block used (bad)
- 8 or 9 cycles less latency (good)
- no need for the angles[] table (good)
- fewer levels of logic, for faster FMAX (good)

For 16-bit values, this could double the number of calculations you can compute at a given clock rate.

You can also tune things somewhat - you can always throw more BRAM blocks at it to reduce the number of CORDIC stages/iterations required, if you have blocks to spare - but one block to remove 9 stages is pretty good.

What do you think?

Article: 160663
On 05.09.2018 8:40, Mike Field wrote:
> [snip]
>
> ...and angle[i] rapidly becomes a constant value after the first 9 or 10 iterations. This is what you would expect, as the angle gets smaller and smaller.
>
> [snip]
>
> What do you think?
>

As far as your "revised" angle[i] converging to a constant is concerned, there's a simple explanation using the first two terms of the Taylor series for the arctan function:

    arctan(x) = x - 1/3 * x^3 + o(x^3)

So that

    angle[i] = arctan(2^-i) / pi * 2^i = (1/pi) * (1 - 1/3 * 2^-(2*i)) + o(2^-(2*i))

Based on this, you can easily say how many stages of the conventional CORDIC algorithm you need to skip (i.e. store the outputs in a lookup table) for a given bit precision.

I don't know the literature well, but I think it would be cool if you actually wrote an article detailing your approach!

Gene

Article: 160664
Hello,

I have a few packages that I have written like this:

package A;
  // ...
endpackage

package B;
  import A::*;
  // ...
endpackage

package C;
  import A::*;
  import B::*;
endpackage

In the file using package C, I get the following error:

Error (10864): SystemVerilog error at C.sv(26): TMP was imported from multiple packages with ::* - none of the imported declarations are visible.

Is this problem caused by importing A::* in both package B and package C? Any help to resolve this is greatly appreciated.

Thanks in advance,
--
Nikhil Pratap

Article: 160665
Hello folks, I am trying to interface a MAX9850 audio DAC with a Spartan-6 FPGA using I2C, and I am coding in VHDL. Has anyone worked on this before, or on something related? Things I need to know: I am using only these two lines for interfacing, so what should I do for clocking? (Can I use the FPGA clock to drive the master clock?) Which audio data format should I choose: right-justified, left-justified or I2S? I just need to hear audio from the MAX9850 audio jack. What other factors do I need to take into account?

Article: 160666
On 07/09/2018 11:11, Swapnil Patil wrote:
> Hello Folks,
>
> I am trying to interface MAX9850 Audio DAC with spartan 6 FPGA with I2C Interfacing.
> I'm Using VHDL Language For coding.
> Does Someone worked on this before? or worked related to this.
>
> Things need to know I am using only these two for interfacing.So For clocking what should i do?(can i use fpga clock for driving master clock)
>
> what audio data format to choose?Right justified or left or I2s
>
> I just need to hear audio from Max9850 audio jack.
> what other factor i need to take into account....
>

I think that you need to read the data sheets for the chips very carefully - it is not possible to drive the data to an audio DAC via I2C. Like many others, the MAX9850 has an I2C interface for configuration but uses the classic MCLK, BCLK, LRCLK and SD (Serial Data) signals for the audio data.

If you drive it from an FPGA, the choice between left justified, right justified or I2S is unimportant - all can easily be achieved.

It isn't that hard to implement both the I2C and audio data interfaces on an FPGA.

You can use an FPGA-derived clock for the master clock (assuming you are not looking for ultimate audio quality and so don't care about jitter). The Maxim data sheet will tell you about the relationships between the master clock, the data clocks and the audio data.

MK

Article: 160667
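To illustrate MK's point that left-justified, right-justified and I2S are equally easy to generate, here is a small bit-level model in C of one stereo frame. It is a sketch built on generic I2S and justified-format framing, not on anything specific to the MAX9850. The helper name and the frame layout (32 BCLK cycles per channel, 16-bit samples) are hypothetical. The formats differ only in where the sample's MSB sits relative to the LRCLK edge, and in the LRCLK polarity. The DAC data sheet remains the authority on the exact timing and polarity it expects.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed frame layout: 32 BCLK cycles per channel (BCLK = 64 x fs), 16-bit samples. */
#define SLOT_BITS   32
#define SAMPLE_BITS 16

enum audio_format { FMT_I2S, FMT_LEFT_JUSTIFIED, FMT_RIGHT_JUSTIFIED };

/* Hypothetical helper: for each of the 2*SLOT_BITS BCLK cycles of one stereo frame,
   record the LRCLK level and the SD bit the FPGA should present. */
void serialize_frame(enum audio_format fmt, int16_t left, int16_t right,
                     uint8_t lrclk[2 * SLOT_BITS], uint8_t sd[2 * SLOT_BITS])
{
    /* I2S delays the MSB by one BCLK, so the sample must be shorter than the slot. */
    assert(SAMPLE_BITS < SLOT_BITS);

    for (int n = 0; n < 2 * SLOT_BITS; n++) {
        int ch  = n / SLOT_BITS;              /* 0 = left half-frame, 1 = right */
        int pos = n % SLOT_BITS;              /* BCLK count since the LRCLK edge */
        uint16_t sample = (uint16_t)(ch == 0 ? left : right);
        int bit;                              /* sample bit sent in this cycle */

        switch (fmt) {
        case FMT_I2S:            bit = SAMPLE_BITS - pos;     break; /* MSB one BCLK late */
        case FMT_LEFT_JUSTIFIED: bit = SAMPLE_BITS - 1 - pos; break; /* MSB on the edge */
        default:                 bit = SLOT_BITS - 1 - pos;   break; /* LSB in last slot */
        }
        /* Cycles outside the sample's bits are driven low. */
        sd[n] = (bit >= 0 && bit < SAMPLE_BITS) ? (uint8_t)((sample >> bit) & 1u) : 0;

        /* I2S: LRCLK low = left. The justified formats commonly use LRCLK high = left;
           check the DAC data sheet for the polarity it expects. */
        lrclk[n] = (fmt == FMT_I2S) ? (uint8_t)ch : (uint8_t)(ch == 0);
    }
}
```

In an FPGA, the same idea is typically a shift register, loaded at the LRCLK edge (or one BCLK later for I2S) and shifted once per BCLK. Switching formats then only changes the load point and the LRCLK polarity.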
On Monday, September 3, 2018 at 6:17:54 AM UTC-4, Mike Field wrote:
>
> The implementation completes the same tasks in 2/3rds
> the cycles and using 2/3rds the resources of an
> standard Xilinx IP block, with comparable timing).
>

If perchance this is related to your recent CORDIC rotator code, I've seen a number of CORDIC optimization schemes over the years to reduce the number of rotation stages, IIRC typically either by a 'jump start' or by merging/optimizing rotation stages.

If I ever manage to find my folder of CORDIC papers, I'll post some links...

-------

Some notes on your CORDIC implementation: http://hamsterworks.co.nz/mediawiki/index.php/CORDIC

- instead of quadrant folding, given I & Q you can do octant folding (0-45 degrees) using the top three bits of the phase

- if you pass in the bit widths and stages as generics, you can initialize the constant arctan table on-the-fly in a function using VHDL reals

-Brian

Article: 160668
earlier, I wrote:
>
> If perchance this is related to your recent CORDIC rotator code,
> I've seen a number of CORDIC optimization schemes over the years
> to reduce the number of rotation stages, IIRC typically either
> by a 'jump start' or merging/optimizing rotation stages.
>

oops, for some reason, when first reading this thread I didn't see the later posts with the explanation... I'd swear they weren't there, but maybe I was just scroll-impaired.

-Brian

Article: 160669
On Wednesday, 5 September 2018 23:43:28 UTC+12, Gene Filatov wrote:
>
> I don't know the literature well, but I think it would be cool if you
> actually write an article detailing your approach!
>
> Gene

I'll work on writing one up over the next few days, as well as posting a sample implementation.

Mike

Article: 160670
Hello folks,

I am trying to get a VHDL testbench running with a VHDL I2C core model. I am using a Spartan-6 FPGA and a simple state machine. The problem is that in simulation the design writes data properly but does not read it back, and I do not understand why. Here is my testbench; the data is sent internally via an array.

library ieee;
use ieee.std_logic_1164.all;

ENTITY mainfiletb12 IS
END mainfiletb12;

ARCHITECTURE behavior OF mainfiletb12 IS

  -- Component Declaration for the Unit Under Test (UUT)
  COMPONENT Main_file
  PORT(
    CLK_10_MHz          : IN    std_logic;
    ResetN              : IN    std_logic;
    SCL1                : OUT   std_logic;                    -- serial clock
    SDA1                : INOUT std_logic;                    -- serial data
    Read_WriteN         : IN    std_logic;                    -- read/write bit: 1 for read, 0 for write
    AddressIn           : IN    std_logic_vector(7 downto 0); -- register address
    Ack1                : OUT   std_logic;                    -- acknowledge bit
    DataOut1            : OUT   std_logic_vector(7 downto 0); -- output data
    StateOute           : OUT   std_logic_vector(7 downto 0); -- state register
    RegisterAddressOut1 : OUT   std_logic_vector(7 downto 0)
  );
  END COMPONENT;

  -- Inputs
  signal CLK_10_MHz  : std_logic := '0';
  signal ResetN      : std_logic := '0';
  signal Read_WriteN : std_logic := '0';
  signal AddressIn   : std_logic_vector(7 downto 0) := (others => '0');

  -- BiDirs
  signal SDA1 : std_logic;

  -- Outputs
  signal SCL1                : std_logic;
  signal Ack1                : std_logic;
  signal DataOut1            : std_logic_vector(7 downto 0);
  signal StateOute           : std_logic_vector(7 downto 0);
  signal RegisterAddressOut1 : std_logic_vector(7 downto 0);

  -- Clock period definitions
  constant CLK_10_MHz_period : time := 100 ns;

BEGIN

  -- Instantiate the Unit Under Test (UUT)
  -- Note: BCLK, LRCLK, SDIN, Dataw and StateOut3 are mapped here but are not
  -- declared in the component or as signals above; they must be declared for
  -- this testbench to compile.
  uut: Main_file PORT MAP (
    CLK_10_MHz          => CLK_10_MHz,
    ResetN              => ResetN,
    BCLK                => BCLK,
    LRCLK               => LRCLK,
    SCL1                => SCL1,
    SDA1                => SDA1,
    Read_WriteN         => Read_WriteN,
    SDIN                => SDIN,
    AddressIn           => AddressIn,
    Ack1                => Ack1,
    DataOut1            => DataOut1,
    Dataw               => Dataw,
    StateOute           => StateOute,
    StateOut3           => StateOut3,
    RegisterAddressOut1 => RegisterAddressOut1
  );

  -- Clock process definitions
  CLK_10_MHz_process : process
  begin
    CLK_10_MHz <= '0';
    wait for CLK_10_MHz_period/2;
    CLK_10_MHz <= '1';
    wait for CLK_10_MHz_period/2;
  end process;

  -- Stimulus process
  stim_proc : process
  begin
    -- hold reset state for 100 ns.
    wait for 100 ns;
    wait for CLK_10_MHz_period*10;

    -- insert stimulus here

    -- initial reset condition
    ResetN <= '0';
    wait for 50 ns;
    ResetN <= '1';
    wait for 100 ns;

    -- address register
    AddressIn <= x"02";
    wait;
  end process;

END;

Article: 160671
Hi, can you please explain in a bit more detail what the problem is? Which signals show unexpected behavior, and what behavior do you expect? The component Main_file is currently a black box; you should also provide the source code of this component.

--
Best regards,
Tobi
https://www.elpra.de

Article: 160672
I bought an HDMI extender over optical fiber for $125 from Alibaba. The HDMI extender works well when the source is my laptop, but when the source is my FPGA board there is a problem. I enabled TMDS, HPD, DDC, +5V and ground as in Hamsterworks' project, but it does not work, and I don't know why. Has anyone had the same situation? What should I do? The Alibaba seller knows nothing about this; they always tell me they are the manufacturer, yet they cannot help me. I think they are just a reseller, and there is no technical support.

Article: 160673
On Saturday, September 22, 2018 at 11:09:02 AM UTC-4, abi...@gmail.com wrote:
> I bought HDMI extender over optical fiber for $125, from Alibaba.
>
> HDMI Extender works well when Source is my laptop, but when source is my FPGA board,there is a problem.
> I enabled TMDS, HPD, DDC, 5+, ground as in Hamsterwork's project , it doesnot works,I dont know why? Who had the same situation ? What to do ?
>
> Alibaba seller know nothing about this, but they always tell me they manufacturer, and cannot help me, i think just reseller sit there, and there no technical support.

Yeah, you got that right! I've written to Ali and eBay vendors for more info on a product and the response is nearly always that the info is in the listing... even when it is not. The vendors are probably like what we in the US call "fulfillment centers". All they do is post the listings and ship the products.

A couple of times I've returned items explaining that they don't work right in some way and the vendor has asked for details to feed back to the factory, but I think this was more so they could get a refund. In one case it was a laptop battery that was slightly oversized so it wouldn't fit properly into the laptop without forcing. At $50 I expected them to ask me to return it, but no, they just wanted details so they could get their refund from the factory.

Rick C.

Article: 160674
On Saturday, September 22, 2018 at 11:09:02 AM UTC-4, abi...@gmail.com wrote:
> I bought HDMI extender over optical fiber for $125, from Alibaba.
>
> HDMI Extender works well when Source is my laptop, but when source is my FPGA board,there is a problem.
> I enabled TMDS, HPD, DDC, 5+, ground as in Hamsterwork's project , it doesnot works,I dont know why? Who had the same situation ? What to do ?
>
> Alibaba seller know nothing about this, but they always tell me they manufacturer, and cannot help me, i think just reseller sit there, and there no technical support.

I meant to ask, does your FPGA work with the target without the extender? I assume you've checked this. If so, I'd say ask for a refund and try to find a name-brand extender even if it's three times the price. If your FPGA design works with this then the cause of the problem is obvious.

Rick C.