Forums | developer.brewmp.com

Has anyone had luck setting struct alignment? I can't for the life of me get the struct alignment right on the device.

I'm using the Gnude tools (arm-elf-g++).

On the emulator I get sizeof(Field) = 23, which is what I expect.
On the device sizeof(Field) = 28. No matter what I try, I can't get it to compact...

Here is the struct definition:

#ifdef WIN32
#define __packed
#pragma pack(1)
#endif

typedef GCCPACKED struct _Player{
    unsigned char id;
    unsigned char x;
    unsigned char y;
    unsigned char speed;
    unsigned char points;
    unsigned char assists;
    long seconds;
    unsigned short penalties;
    unsigned char state;
} Player;

typedef GCCPACKED struct _Field{
    unsigned short id;
    unsigned char flag;
    unsigned char playersOnField;
    unsigned char playersInGroup;
    unsigned char groupIndex;
    unsigned char id_prev1;
    unsigned char id_prev2;
    unsigned char id_next1;
    unsigned char id_next2;
    Player player[1];
} Field;

#ifdef WIN32
#pragma pack()
#endif

Is GCCPACKED defined as __attribute__ ((packed))?

Yes, it is. I even tried replacing GCCPACKED in my struct definition with __attribute__ ((packed)):
typedef __attribute__ ((packed)) struct _Player{
...etc
That didn't seem to solve it either.
Then I tried the -fpack-struct compiler switch, but that ended up adding 150k to the size of my mod file.

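Your problem is:
typedef GCCPACKED struct _Player
It should be the other way round, i.e.:
typedef struct GCCPACKED _Player
GCC expects a type attribute right after the struct keyword (or after the closing brace); written in front of struct, it does not pack the struct type.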
As a rule, I always pack / pad my structures by hand; this avoids this kind of problem. Be very careful with __attribute__ ((packed)) (I did some tests). It generates some seriously bad code (as you would expect). Packing your structures forces GCC to do unaligned loads whenever members of your structure are not aligned on their natural boundaries (byte = 1, short = 2, int = 4). The only way it can do this is by reading the members as shorts or bytes. For example:
Reading all the members of this struct in packed mode generates 6 byte reads plus other overhead.
typedef struct _bad {
    char a;
    int b;
    char c;
} bad;
Reading the same members from this struct, laid out by hand without packing, takes only 3 reads. That's a whopping great 50% saving.
typedef struct _good {
    int b;
    char a;
    char c;
} good;
Memory access is usually expensive, so packing structures may slow your basketball game down even if you get some memory back. Swings and roundabouts...
Regards,
Steve.

Doh, that was the problem. Works like a champ now! Thanks.
Unfortunately I am stuck with an existing data stream. I CAN read through the binary stream one data item at a time, but I was trying to simplify that piece.
Luckily this struct is only used occasionally; if speed turns out to be an issue, I'll implement the stream reader anyway and populate a more device-friendly structure.

When declaring a struct, we should declare the larger members first, for better alignment and performance.
For example:
// example 1:
struct _packed demo {
    uint32 a;  // data within the 1st 32-bit word
    uint16 b;  // data within the 2nd 32-bit word
    uint8 c;   // data within the 2nd 32-bit word
    uint8 d;   // data within the 2nd 32-bit word
    uint8 e;   // data within the 3rd 32-bit word
};

// example 2:
struct _packed demo {
    uint8 d;   // data within the 1st 32-bit word
    uint32 a;  // data straddles the 1st and 2nd 32-bit words
    uint16 b;  // data within the 2nd 32-bit word
    uint16 c;  // data straddles the 2nd and 3rd 32-bit words
};
A 32-bit CPU fetches data from memory 4 bytes at a time, so if we define the struct in packed style as in example 2, we run into a performance problem: some of the members need two memory accesses to read.
The CPU may be fast, but as professional programmers we should consider this issue.
Alex.

Alex, you are incorrect.
Specifying __attribute__((__packed__)) for the entire structure, as above, forces the compiler to pack the member variables *and* the structure as a whole. This means the compiler cannot assume the initial address of the structure is 4-byte aligned. Therefore both structures produce 9 single-byte reads (as you would expect).
If, instead of packing the entire structure, you pack the individual members by specifying __attribute__((__packed__)) for each member, the compiler packs the individual members and leaves the structure's own alignment at the default level (4-byte alignment). The compiler can then assume the first member is 4-byte aligned, the generated code improves, and your optimisations apply.
struct _packed demo {
    uint32 a;  // data within the 1st 32-bit word
    uint16 b;  // data within the 2nd 32-bit word
    uint8 c;   // data within the 2nd 32-bit word
    uint8 d;   // data within the 2nd 32-bit word
    uint8 e;   // data within the 3rd 32-bit word
};

becomes:
struct demo {
    uint32 a __attribute__((__packed__));
    uint16 b __attribute__((__packed__));
    uint8 c __attribute__((__packed__));
    uint8 d __attribute__((__packed__));
    uint8 e __attribute__((__packed__));
};

Also, it is not necessary to put the members in size order to improve performance.
struct foo {
    uint8 a;
    uint8 b;
    uint16 c;
    uint32 d;
};

is just as good (performance-wise) as
struct foo {
    uint32 d;
    uint16 c;
    uint8 a;
    uint8 b;
};

Rather, it is the alignment of each individual member that is important.
As professional programmers, we should be aware of this.
Steve.

That's correct. When you specify "packed", the compiler does not follow the default alignment; instead it ensures that no space is wasted.
As for member order:
It can be either increasing or decreasing size order; as long as the members are placed in order, it should be fine.