Configuring an FPGA from a processor

Vendor Xilinx
FAQ Entry Author Philip Freidin
FAQ Entry Editor Philip Freidin
FAQ Entry Date 06/28/2003

Q. How do I configure an FPGA from a processor?


There are many ways to do this. One example is at
Downloading a Bitstream under Linux

There seems to be ongoing confusion about .BIT and .RBT files.
The following should help you:
Understanding The Bitstream Format for Parallel Configuration
Tell me about the MCS file format
Tell me about bit files
Configuring an FPGA with a cheap serial EPROM and a PIC processor

I would have thought the above was enough, but people still seem to ask questions. Here is something I posted in June 2003:

(the question was whether configuring with a .RBT was slower than with a .BIT,
and whether it made sense to convert an .RBT into a .BIT)

While you probably could convert an RBT to BIT, you probably
don't need to.

The advantage of RBT is that it is human readable, so it is easy to
look at with an editor, and see how it is laid out, and what to
look for when parsing it (in a program you might write).

Look here for some examples:

MCS file format

The time it takes you to configure is typically not limited by
the file format. It is limited by how your configuration device
talks to the FPGA.

If you are configuring from a PROM, then the configuration clock
and the PROM (EPROM, FLASH, EEPROM, SerialEPROM, .....) access
time will set the speed.

If you are configuring with a local processor, then the time
taken depends on the interface, and the speed of the configuration

If you are configuring from a host computer through a download cable
connected to the parallel (printer) port, then in current technology
computers you are limited by the speed of the processor talking
to the parallel port, which is about 500K updates per second,
regardless of whether it is a 25MHz 80486 or a 2GHz Pentium/Athlon.
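
To put a number on that limit, here is a rough sketch in C. It assumes the
download routine needs two port writes per bit (one with CCLK low and the
data bit set up, one with CCLK high). The function name and the
two-writes-per-bit figure are my assumptions for illustration; only the
500K updates per second comes from the text above.

```c
/* Rough configuration time when every bit costs a fixed number of
   parallel-port writes.  All three parameters are estimates. */
double config_seconds(unsigned long bits,
                      unsigned writes_per_bit,
                      double writes_per_second)
{
    return (double)bits * writes_per_bit / writes_per_second;
}
```

For a 1924992 bit bitstream (the XC4085XL example further down), two writes
per bit at 500K writes per second works out to roughly 7.7 seconds, whatever
the file format was.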

In the end it is just bits.

If you are storing the bitstream in a PROM, or using a local
processor for configuration, you need to get it into the
right format. Neither the RBT nor the BIT file is exactly right.

Either use a PROM formatting program that reads these files and
formats the data appropriately, or write your own.

Getting back to your original question: The difference in speed
of the two file formats is how long it takes you to read the file,
which is about 8:1, because the RBT file spends one ASCII character
on every bit. Depending on environment, this is irrelevant.

And then the following article:

(how do I get from an RBT to the processor that is going to do the down load)

There are so many possibilities, and you haven't given enough
info to guess what way you are going. So here is an example:

Loading FPGA with serial slave mode.
Bitstream is in RBT format.
Local processor is an 8 bit micro, with assorted parallel port
bits that can be individually set high and low.
Micro has sufficient EPROM/FLASH memory to hold its program
and the config data for the FPGA.

Connect uP parallel output port bits to FPGA Din, CCLK, PROGRAM.
Connect uP parallel input bits to FPGA DONE and INIT, with a pullup resistor on both.

Go read MCS file format

Take the .RBT data, and write a dinky program to take the data
starting at the line that begins with 1111111... and turn it into
initialization data for your microprocessor:

i.e. (spaces added for clarity on lines 8 and 9)

Xilinx ASCII Bitstream
Created by Bitstream E.35
Design name:        lin_prod.ncd
Architecture:       xc4000xl
Part:               4085xlbg560
Date:               Mon Jun 10 14:29:19 2003
Bits:               1924992
11111111 00100001 11010101 11110111 10011111
01011011 11111111 11111111 11011111 11111101
11011111 11011101 11111101 11011111 11011101

  etc ....

becomes (if you are programming the micro in assembler)

        DB        $FF, $21, $D5, $F7, $9F
        DB        $5B, $FF, $FF, $DF, $FD
        DB        $DF, $DD, $FD, $DF, $DD

        etc ....

or (if you are programming the micro in C)

char        config_data[240626] = {
        0xFF, 0x21, 0xD5, 0xF7, 0x9F,
        0x5B, 0xFF, 0xFF, 0xDF, 0xFD,
        0xDF, 0xDD, 0xFD, 0xDF, 0xDD,
        etc ....


You will find that after the end of the bitstream, you NEED
to send some extra "1" bits for the chip to sequence through
the startup states. I make the data array longer by 2 bytes,
and set all bits to a "1". This works fine.
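
The "dinky program" can be quite small. Here is a minimal sketch of its
core in C, assuming the whole .RBT file has already been read into a string.
The names rbt_pack, is_bit_line and line_len are made up for this example.
It treats any line containing only 0, 1 and white space as data, which
skips the seven header lines because each of them contains letters.

```c
#include <stddef.h>

/* Length of the line starting at s, not counting the newline. */
static size_t line_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && s[n] != '\n')
        n++;
    return n;
}

/* True if the line holds only '0', '1' and white space, and at least one bit. */
static int is_bit_line(const char *s, size_t len)
{
    size_t i, bits = 0;
    for (i = 0; i < len; i++) {
        char c = s[i];
        if (c == '0' || c == '1')
            bits++;
        else if (c != ' ' && c != '\t' && c != '\r')
            return 0;
    }
    return bits > 0;
}

/* Pack the data lines of an RBT text into bytes, first bit into the MSB.
   Writes at most cap bytes to out and returns the number of whole bytes
   found.  Left-over bits (fewer than 8) at the end are dropped. */
size_t rbt_pack(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;
    unsigned acc = 0;
    int nbits = 0;

    while (*text != '\0') {
        size_t len = line_len(text), i;
        if (is_bit_line(text, len)) {
            for (i = 0; i < len; i++) {
                if (text[i] != '0' && text[i] != '1')
                    continue;                      /* skip spaces */
                acc = (acc << 1) | (unsigned)(text[i] - '0');
                if (++nbits == 8) {
                    if (n < cap)
                        out[n] = (unsigned char)acc;
                    n++;
                    acc = 0;
                    nbits = 0;
                }
            }
        }
        text += len;
        if (*text == '\n')
            text++;
    }
    return n;
}
```

Print the resulting bytes as DB or 0x.. lines for your assembler or
compiler, then append the two extra 0xFF bytes described above.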

Assemble/compile in with your micro's program.

Write code that reads this data when your system starts up,
and sends it out to the FPGA one bit at a time.

The data is sent MSB first, in byte sequence as given
in the above examples of config_data. If you had assembled
this data backwards (which happens because people are careless)
then you would send it out LSB first. Just be consistent. The
data MUST (MUST, MUST, MUST) arrive at the FPGA in the same
order that it appears in the .RBT file if you read it from
left to right, top to bottom.
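
If you are not sure which way round your data ended up, a small check like
the following can help. This is only a sketch; the three function names are
invented for this example. It shows that a bit-reversed byte shifted out
LSB first puts exactly the same sequence on Din as the original byte
shifted out MSB first.

```c
/* Reverse the bit order within one byte. */
unsigned char reverse_bits(unsigned char b)
{
    unsigned char r = 0;
    int k;
    for (k = 0; k < 8; k++) {
        r = (unsigned char)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

/* k-th bit put on Din (k = 0 is sent first) when shifting out MSB first. */
int bit_msb_first(unsigned char b, int k)
{
    return (b >> (7 - k)) & 1;
}

/* k-th bit put on Din when shifting out LSB first. */
int bit_lsb_first(unsigned char b, int k)
{
    return (b >> k) & 1;
}
```

For the second byte of the example, 00100001 in the .RBT file is $21 when
packed MSB first and $84 when packed backwards. Either works, as long as the
send loop matches the packing.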

Before starting to send the data to the FPGA, you need to get
it ready.

Set the PROGRAM signal low.
Check that the DONE and INIT are both LOW
Set the PROGRAM signal high
Wait for INIT to go HIGH
Wait 5 more microseconds

Start sending data

To bit-bang the data out to the FPGA, the sequence looks
something like this:

  repeat:
  Set CCLK line low.
  Set Din line to the next bit.
  Set CCLK line high.
  Are we done? If not, go to repeat.

While sending the data, if INIT goes low, you have a framing
or CRC error.

When you have finished sending all the data, the DONE signal
should go high (sometime during the trailing 16 "1" bits).
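
Putting the preparation steps, the bit-bang loop and the final DONE check
together, a sketch in C might look like this. Everything about the pin
access is hypothetical: struct cfg_pins, the error codes, the 1 microsecond
PROGRAM pulse and the 10000 poll timeout are assumptions of this example,
so check the data sheet for your part. The sim_* functions are only a
software stand-in for the FPGA so that the sketch can be tried on a PC;
replace them with your micro's port I/O.

```c
#include <stddef.h>

/* Hypothetical pin hooks: supply real port I/O for your board. */
struct cfg_pins {
    void (*set_program)(int level);
    void (*set_cclk)(int level);
    void (*set_din)(int level);
    int  (*get_init)(void);
    int  (*get_done)(void);
    void (*delay_us)(unsigned us);
};

enum { CFG_OK = 0, CFG_NOT_CLEARED = -1, CFG_INIT_TIMEOUT = -2,
       CFG_INIT_LOW = -3, CFG_DONE_LOW = -4 };

int fpga_configure(const struct cfg_pins *p,
                   const unsigned char *data, size_t len)
{
    size_t i;
    int bit;
    unsigned waited;

    p->set_cclk(0);
    p->set_program(0);                 /* start clearing the FPGA */
    p->delay_us(1);                    /* assumed pulse width */
    if (p->get_done() || p->get_init())
        return CFG_NOT_CLEARED;        /* both must be low here */
    p->set_program(1);
    for (waited = 0; !p->get_init(); waited++) {
        if (waited >= 10000)           /* assumed timeout */
            return CFG_INIT_TIMEOUT;
        p->delay_us(1);
    }
    p->delay_us(5);                    /* wait 5 more microseconds */

    for (i = 0; i < len; i++) {
        for (bit = 7; bit >= 0; bit--) {           /* MSB first */
            p->set_cclk(0);
            p->set_din((data[i] >> bit) & 1);
            p->set_cclk(1);            /* Din is taken on the rising edge */
        }
        if (!p->get_init())
            return CFG_INIT_LOW;       /* framing or CRC error */
    }
    return p->get_done() ? CFG_OK : CFG_DONE_LOW;
}

/* Software stand-in for the FPGA, for trying the sketch on a PC. */
static int sim_cclk, sim_din, sim_init;
static unsigned long sim_bits, sim_shift, sim_expect = 16;

static void sim_set_program(int v)
{
    if (!v) { sim_init = 0; sim_bits = 0; sim_shift = 0; }
    else    { sim_init = 1; }
}
static void sim_set_cclk(int v)
{
    if (v && !sim_cclk && sim_init) {  /* rising edge: shift one bit in */
        sim_shift = (sim_shift << 1) | (unsigned long)sim_din;
        sim_bits++;
    }
    sim_cclk = v;
}
static void sim_set_din(int v)      { sim_din = v; }
static int  sim_get_init(void)      { return sim_init; }
static int  sim_get_done(void)      { return sim_bits >= sim_expect; }
static void sim_delay_us(unsigned us) { (void)us; }

static const struct cfg_pins sim_pins = {
    sim_set_program, sim_set_cclk, sim_set_din,
    sim_get_init, sim_get_done, sim_delay_us
};
```

Call fpga_configure() with a pointer to config_data and its full length,
including the two trailing 0xFF bytes, and treat any non-zero return as a
failed load.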

You can see some more detailed code at:

     Downloading a Bitstream under Linux

There are several other ways to configure the FPGAs, including
parallel and JTAG. The mechanism of getting the data to the
FPGA is different, but in the limit, the data that is loaded is
exactly the same, and the bit order must be exactly as given.

Denis Gleeson then used this info and various other postings/app notes and wrote
the following which he posted to the news group for all to enjoy:

/*
** Project: Use a DOS executable to configure a Spartan-XL FPGA.
**          This software operates with a Xilinx parallel cable.
**          I'm using Parallel Cable IV but any should work.
**          The project was a WIN 32 console application. Built using MSVC 6.
**          Run code in debug mode as  _inp and _outp will not operate
**          otherwise. Running the .exe in a DOS window appears to succeed but
**          in fact it doesn't. I used Win ME but 95 or 98 should be OK also.
**          Not so sure about 2K, XP etc.
** File   : CFG_bit.c
** Author : Denis Gleeson
** Notes 1.  Assumptions made:  i.e. things that need changing for other FPGAs.
**           (a) Parallel port address starts at 0x378
**           (b) configuration file is a .bit file.
**           (c) configuration file is in the specified directory.
**           (e) configuration file has less than 64K Bytes.
**           (f) Device is a Xilinx xcs05xl.
**       2.  Program based on the work of Brittle. Thanks to Philip Freidin
**           for the FPGA-FAQ and Question: How can I download a FPGA from
**           a linux System
**       3.  Use this code at your own risk. I guarantee nothing.
**           It worked for me, it may not work for you.
**           It's just a quick test I put together to allow me to move
**           on to my required solution.
*/
#include <stdio.h>
#include <conio.h>      // _inp, _outp, _getch (MSVC)

#define DATA    0x378
#define STATUS  (DATA+1)
#define CONTROL (DATA+2)
unsigned char buf[64000];

int main(void)
{
 FILE *bitfile;
 unsigned char head_key;
 unsigned long int length=0;
 unsigned char length1;
 unsigned char length2;
 unsigned char length3;
 unsigned char length4;

 unsigned long i =0;
 int j=0;
 unsigned char tmp=0;

 bitfile = fopen("C:/work/projects/comisn10.bit", "rb");
 if(bitfile== NULL)
 {
         printf("Cant open File \n");
         return (1);
 }

 // Skip the header up to the 'e' (0x65) key that precedes the length.
 // This simple scan assumes no earlier header byte happens to be 0x65.
 head_key = 0;
 while (head_key != 0x65)
 {
    if (fread(&head_key, 1, 1, bitfile) != 1)
    {
          printf("Length field not found \n");
          return (1);
    }
    printf("%c ",head_key);
    if (head_key == 0x65)
    {
          // Cant use fread to read 4 bytes in one go because
          // I end up with words switched and bytes within words switched.
          // read Byte 1
          fread(&length1, 1, 1, bitfile);
          length = length | ((unsigned long)length1<<24);
          // read Byte 2
          fread(&length2, 1, 1, bitfile);
          length = length | ((unsigned long)length2<<16);
          // read Byte 3
          fread(&length3, 1, 1, bitfile);
          length = length | ((unsigned long)length3<<8);
          // read Byte 4
          fread(&length4, 1, 1, bitfile);
          length = length | (length4);
          if (length > sizeof(buf))
          {
                printf("File too big for buffer \n");
                return (1);
          }
          // Now read length bytes into the buffer.
          fread(buf,length, 1, bitfile);
    }
 }
 fclose(bitfile);

 // sense VCC
 tmp =_inp(STATUS);
 tmp = (tmp>>3) & 1;
 if (!tmp)
   printf("cable not found\n");
 else
   printf("cable detected\n");

 // clear config
 _outp(DATA,0x10);
 printf("configuration memory cleared\n");
 _outp(DATA,0x14);

 // Actually configure the FPGA
 printf("start loading %lu bytes\n", length);
 printf("Programming %lu bits\n", length *8);
 for (i=0; i<length; i++)
 {
   for (j=7; j>=0; j--)
   {
     tmp = (buf[i]>>j) & 1;
     _outp( DATA,tmp|0x14);    // data bit with CCLK low
     _outp( DATA,tmp|0x16);    // same data bit with CCLK high
   }
 }
 printf("finish loading\n");

 // Done
 tmp =_inp(STATUS);
 tmp = (tmp>>4)&1;
 if (tmp)
   printf("DONE is high\n");   // message text is an assumption

 printf("Hit any Key to Continue. \n");
 _getch();

 return (0);
}
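
The four single-byte reads in the listing above are there because the
length field in the .bit file is stored most significant byte first, while
the PC is little endian. A small helper can do the same job in one place.
This is only a sketch, and the name be32 is made up for this example.

```c
/* Assemble a 32-bit value from four bytes stored most significant first. */
unsigned long be32(const unsigned char b[4])
{
    return ((unsigned long)b[0] << 24) |
           ((unsigned long)b[1] << 16) |
           ((unsigned long)b[2] << 8)  |
            (unsigned long)b[3];
}
```

With this, the program could fread() four bytes into an unsigned char
array and pass the array to be32(), which gives the same result on any
host byte order.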

Kasper Pedersen noted a small problem (acknowledged by Denis)

Looking at the init/load code:

>  // clear config
>  _outp(DATA,0x10);
>  printf("configuration memory cleared\n");
>  _outp(DATA,0x14);
>  // Actually configure the FPGA
>  printf("start loading %d bytes\n", length);
>  printf("Programming %d bits\n", length *8);
>  for (i=0; i<length; i++)
>  {
>   for (j=7; j>=0; j--)
>   {
>     tmp = (buf[i]>>j) & 1;
>     _outp( DATA,tmp|0x14);
>     _outp( DATA,tmp|0x16);
>   }

The code has no guaranteed delay between releasing /program and clocking
in the first bit, and it does not check /init. This bit us on an XCS30.

What you see is that once you release /prog, /init will stay low for
some time (max 4 ms for the XCS30), and only then can you start clocking
data into the device.

My guess is that, in this case, the printf's are supplying enough delay
that it works for the 05xl (console output under windows is slooow).
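
One way to close that gap is to poll /init with a timeout before the first
clock, instead of relying on printf for delay. The following is a sketch
only: wait_until_high and the read hook are hypothetical names, and which
status-port bit (if any) carries /init on a given cable is not stated
above, so the read function is left to the user. fake_init is a stand-in
for trying the helper without hardware.

```c
/* Poll a pin-reading hook until it returns non-zero.
   Returns the number of polls that read low, or -1 on timeout. */
long wait_until_high(int (*read_pin)(void), long max_polls)
{
    long n;
    for (n = 0; n < max_polls; n++) {
        if (read_pin())
            return n;
    }
    return -1;
}

/* Stand-in for /init: reads low for fake_low_polls calls, then high. */
static long fake_low_polls = 3;
static int fake_init(void)
{
    if (fake_low_polls > 0) {
        fake_low_polls--;
        return 0;
    }
    return 1;
}
```

In the download program this call would go between releasing /program and
the first data bit, with max_polls chosen to cover the worst case for the
part (4 ms for the XCS30, per the note above).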