Sunday, November 29, 2015

Byte Alignment

Byte alignment is associated with how efficiently a processor can read & write information stored to & from memory. A processor may provide instruction to access data from memory in chunks of 4-bytes, 2-bytes or 1-byte. It may only be possible to access a memory location whose address is multiple of 2 or 4. Some processor throw exception if a program tries to access memory location which is not an integral multiple of 2 or 4. In such case, if the operating system does not take care of such situation the program may crash.  In other case, the memory access at such address take more CPU cycles thereby reducing the efficiency. In order to handle such scenario padding is done so the data ends at 2-byte or 4 byte boundary. Lets look at an example. How is following structure is stored in memory?
struct recordType1 {
 unsigned char name[10];
 unsigned int id;
 unsigned int age;
 unsigned char grade;
 float exp;
};

On my X86/Linux PC, it consumes 28 bytes. I have initialized the above structure and print it byte by byte.
#include <stdio.h>
#include <stdlib.h>

void printStruct(unsigned char *ptr, unsigned int size) {
 unsigned int indx = 0;
 printf("Size: %dn", size);

 for(; indx < size; indx++)
  printf("%02x ", ptr[indx]);

 printf("n");
}

int main() {
 struct recordType1 emp1 = {"MyName", 1102, 31, 'A', 7.2};

 printStruct((unsigned char *) &emp1, sizeof(emp1));

 return 0;
}

Here is the output.

Size in bytes : 28
4d 79 4e 61 6d 65 00 00 00 00 b4 bf 4e 04 00 00 1f 00 00 00 41 80 79 b7 66 66 e6 40

First 10 underlined bytes is name field, then four underlined bytes is for id field and so on. There are 2 bytes after name field & 3 bytes after grade field. These additional bytes are padded so that each filed in struct recordType1 falls on 2-byte boundary. And that is why the structure is taking 28 bytes of memory. Let's do some modification to this structure.
 struct recordType2 {
 unsigned int id;
 unsigned int age;
 float exp;
 unsigned char name[10];
 unsigned char grade;
};

With the same initialization as above, lets check the byte of memory consumed and layout.

Length in bytes : 24
4e 04 00 00 1f 00 00 00 66 66 e6 40 4d 79 4e 61 6d 65 00 00 00 00 41 08

We saved 4 bytes with such a simple modification. This is on the memory consumption part.

Let's see how the speed of execution is affected by byte aligned access. Here is a simple program. There are two pointer ptr1 and ptr2, which write a buffer with 10 meg of data. The ptr1 starts with the buffer address which is divisible by 4 but ptr2 is not.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define MEGA_10    (1024*1024*10)
#define MEGA_100 (10 * MEGA_10)

int main()
{
 unsigned char *buffer;
 unsigned int *ptr1, *ptr2, bufSize = (MEGA_100/4), indx = 0;
 unsigned isOn4bdry = 0;
 struct timeval t1, t2;

 buffer = (unsigned char *)calloc(MEGA_100+1, sizeof(unsigned char));

 isOn4bdry = (((unsigned int)buffer % 4) == 0) ? 1 : 0;
 printf("Buffer pointer: %p, On 4 boundary: %dn", buffer, isOn4bdry);

 gettimeofday(&t1, (struct timezone *)NULL);
 ptr1 = (unsigned int *)buffer;
 for( ; indx < bufSize; indx++){
  *ptr1 = 0xDEADBEEF;
  ptr1++;
 }

 gettimeofday(&t2, (struct timezone *)NULL);
 printf("Time taken: %dn", (int)(t2.tv_usec - t1.tv_usec));

 memset((void *)buffer, 0x0, bufSize);

 gettimeofday(&t1, (struct timezone *)NULL);
 buffer += 1;
 ptr2 = (unsigned int *)buffer;
 isOn4bdry = (((unsigned int)buffer % 4) == 0) ? 1 : 0;
 printf("Buffer pointer: %p, On 4 boundary: %dn", buffer, isOn4bdry);
 bufSize -= 1;

 for(indx = 0 ; indx < bufSize; indx++){
  *ptr2 = 0xDEADBEEF;
  ptr2++;
 }

 gettimeofday(&t2, (struct timezone *)NULL);
 printf("Time taken: %dn", (int)(t2.tv_usec - t1.tv_usec));

 buffer -= 1;
 free(buffer);

 return(0);
}

And the output.

Buffer pointer: 0xb11c6008, On 4 boundary: 1
Time taken: 58653
Buffer pointer: 0xb11c6009, On 4 boundary: 0
Time taken: 60675

Please note, I have not taken care of wrapping of tv_usec field and the above time values are in microseconds.

Look at the difference time taken to initialize the same buffer. In first case i.e. byte aligned access, we are saving a little more than 2.022 ms.

No comments:

Post a Comment