The ARM/NetWinder Structure Alignment FAQ

Last changed: 99/05/28
1) What is structure alignment?
2) Why is this an issue for ARM/NetWinder systems?
3) How is this related to the alignment trap?
4) Which distributions are affected?
5) Which compilers are affected?
6) What are the advantages of word alignment?
7) What are the disadvantages of word alignment?
8) What is the magnitude of the porting problem?
9) Why can't we just change the compiler?
10) What about mixed distributions?
11) Some examples of code with problems?
12) How do I find alignment problems in code from other platforms?
13) How do I fix alignment problems?
14) What about C++?
15) Which system header files are affected?
16) Who wrote this and what do they want?

1) What is structure alignment?

All modern CPUs expect that fundamental types, such as int's, long's and float's, are stored in memory at addresses that are multiples of their length. CPUs are optimized for accessing memory aligned in this way. Some CPUs (such as the Intel x86 series) allow unaligned access but at a performance penalty. Some CPUs trap unaligned accesses to the operating system, where they can be ignored, simulated or reported as errors. Other CPUs (such as the early ARM processors) use the unaligned address as a means to do special operations during the load or store.

When a C compiler processes a structure declaration, it can add extra bytes between the fields to ensure that all of them that require alignment are properly aligned. It will also ensure that instances of the structure as a whole are properly aligned when defined. It will add additional bytes to the end of a structure to ensure that arrays of the structure are properly aligned. "malloc" and friends always return memory pointers that are aligned for the strictest fundamental type of the machine.

The specifications for the C and C++ languages state that the existence and nature of these padding bytes are completely "implementation defined", meaning that each CPU/OS/Compiler combination is free to use whatever alignment and padding rules are best for its purposes. Programmers are not supposed to assume that specific padding and alignment rules will be followed. There are no controls defined within the language for indicating special handling of alignment and padding, although many compilers (including gcc) have non-standard extensions to permit this.

"Structure Alignment" is then the choice of rules for when and where padding is inserted and the optimizations the compiler is thus able to effect in generated code.

2) Why is this an issue for ARM/NetWinder systems?

The early ARM processors had very limited abilities to access memory not aligned on a word (4 byte) boundary. Current ARM processors (as opposed to the StrongArm) have less support for accessing halfword (short int) values. The first compilers were designed for embedded system applications. The compiler writers chose to allow the declaration of types shorter than a word (char's and short's), but aligned all structures to a word boundary to increase performance when accessing these items.

These rules are perfectly OK according to the C and C++ language specifications, but they are different from the rules used for virtually all other 32bit and 64bit microprocessors. Linux and its applications have never been ported to a platform with these alignment rules before, so there are latent defects in the code where programmers have incorrectly assumed certain alignment rules. These defects show up when these applications are ported to the ARM platform. The Linux kernel itself contains these types of assumptions.

These latent defects can result in decreased performance, corrupted data, and program crashes. The exact effect depends on how the compiler and OS are configured as well as the nature of the defective code. There are three ways of fixing these latent defects:

  1. Change the compiler's alignment rules to match those of other Linux platforms.
  2. Use an alignment trap to fix up incorrectly aligned memory references.
  3. Find and fix all of the latent defects individually.

The three alternatives are to some extent mutually exclusive. All of them have advantages and disadvantages discussed below. All of them have been applied in the past so there is some experience with each. The "correct" solution, of course, depends on your goals.

3) How is this related to the alignment trap?

On the StrongArm processor the OS can establish a trap to handle unaligned memory references. Unaligned memory references are a frequent consequence of alignment defects, but they are not the only consequence. Thus, some, but not all, alignment defects can be fixed within an alignment trap.

Furthermore, not every unaligned access indicates a defect. In particular, compilers for processors without halfword access will use unaligned accesses to efficiently load and store these values. If the alignment trap "fixes" these memory references, the program will produce incorrect results.

On the ARM and StrongArm, if you ask for a non-aligned word and you don't take the alignment trap, then you get the aligned word rotated such that the byte at the address you asked for is in the LSB. For example, consider:
        Address: 0  1  2  3  4  5  6  7
        Value  : 10 21 66 23 ab 5e 9c 1d

Using *(unsigned long*)2 would give:
        on x86: 0x5eab2366
        on ARM: 0x21102366

An alignment trap can distinguish between kernel code and application code and do different things for each. The basic choices for the alignment trap are:

  1. It can be turned off. The unaligned access will then behave like unaligned accesses on other members of the ARM family without performance penalty.
  2. It can "fixup" the access to simulate a processor that allows unaligned access.
  3. It can fixup the access and generate a kernel message.
  4. It can terminate the application or declare a kernel panic.

There is a significant performance penalty for fixing up unaligned memory references.

4) Which distributions are affected?

I am aware of three planned or current distributions of Linux for the ARM series of processors.

The ARM Linux distribution

ARM Linux is based on the RedHat distribution of Linux for X86 processors and was started on the ARM3 processor. ARM Linux currently (Spring 1999) runs on the following processor series:

A key goal of this distribution is the creation and use of a single binary standard for all of the ARM processors. Binary files are compiled so that they will run on any processor, so the extended instructions of the newer processors are not used and the compiler will generate deliberate unaligned memory references that it expects will not be "fixed".

The plan for the alignment trap on the StrongArm processor in this distribution is to have the trap fix unaligned memory references within the kernel and ignore them in applications. Latent alignment defects in Linux applications are handled by fixing them one at a time and submitting changes to the originator of the application.

The "Corel Netwinder" distribution

This distribution is also based on RedHat and closely tied to the ARM Linux distribution. The kernel and applications, however, are compiled specifically for the StrongArm processor.

The key goal of this distribution is the establishment of the Netwinder line of computers in specific application areas such as net serving, software development, office desktops, and Java computing.

The latest versions of this release have a configurable alignment trap that, by default, fixes up unaligned memory references for both kernel and application code.

The current compilers (GCC and egcs) align all structures to a word boundary. This seems unlikely to change at this point.

At one point, RedHat was contracted with Corel to provide the subsequent releases of this distribution. With the sale of the NetWinder to Rebel.com, the status of this work is unclear.

The "Debian GNU/Linux" distribution

This is an alternative distribution intended to "replace the Red Hat-based environment the NetWinder ships with." It is based on the Debian distribution for other platforms.

The packages are compiled to run on both the 26bit and 32bit processors.

Currently (spring 1999), it is a work in progress and not a complete distribution. It uses the same GCC and egcs configuration as the other ARM distributions.

NetBSD/arm32 Distribution

"NetBSD/arm32 is a port of the OS to a variety of ARM- and StrongARM-powered computing platforms."

While not a Linux port, this distribution has many of the same goals and many applications in common with the Linux distributions. They use a compiler that aligns structures in a way similar to other microprocessor platforms. Their experience with this approach is valuable for Linux porting efforts.

5) Which compilers are affected?

The ARM port of GCC can be configured to either

  1. align all structures to a word boundary -- even those containing just chars and shorts (this is the way ARM-Linux is distributed)
  2. align structures based on the alignment constraints of the most strict structure member (this is the same alignment as is used on the x86), or
  3. follow other rules

Changing between 1. and 2. is a one line change in the gcc or egcs source. With additional effort, these could be modified with an additional compile time parameter selecting the alignment rules to be used. Some other architectures already have such a flag, so these could be used as a model.

The compiler supplied with the ARM SDT defaults to aligning all structures on a word boundary. It has a "packed structure option" (-zas1) that changes alignment to match the x86 rules. In future, this option will be the default, "since word-alignment causes too much user trouble, and the performance/codesize improvement has never been proven (typically the affected structures are small, and generally not copied around a lot)."

6) What are the advantages of word alignment?

The overall performance impact on StrongArm processors is hotly debated. Opinions range from "most of the system would run faster" and "Pretty much ANYTHING that you care about memory bandwidth and performance issues on will or could seriously be impacted by this." to "Although in theory it produces faster code, in practice most code and thus the system will run a lot slower."

The performance impact on other processors is less debated, but there is not complete consensus there either.

The only way to resolve this debate is to measure the relative performance.

7) What are the disadvantages of word alignment?

There is hot debate on both the number of Linux packages that have latent alignment defects and how difficult these defects will be to find and fix. Estimates of the magnitude of the problem range from

"The only programs that I found that were violating this when I did the original port were very few and far between. I think it was in the order of 1 in 200. However, as of lately, maybe because of the commercialisation of the Internet, this figure appears to be increasing"

and

"Generally, the defects I've found stick out like a sore thumb."

to

"These problems are so severe that I'd be very surprised if any major Linux application runs reliably or can be made to run reliably without superhuman effort."

Unless other measures are taken, this debate will not be resolved until ARM distributions that align all structures are complete and widely deployed or the attempt is abandoned. Distributions that elect to not align all structures avoid the problem and thus never find out its magnitude in detail.

The alignment trap for application code can be used to produce an estimate of the problem magnitude earlier than this. Application code will execute unaligned memory references in the following circumstances:

  1. It was compiled for an ARM processor and is using "legal unaligned load word instructions" to reference halfword and/or byte data.
  2. The application has code that deliberately does unaligned memory references. This indicates that the application is not portable to a variety of platforms.
  3. The application has latent alignment defects exposed by alignment rule differences.

When the alignment trap is set to generate a count of traps from application code and code compiled for the StrongArm is run, then every trap signals the existence of a defect that needs to be fixed. If the problem magnitude is large, many messages/counts will be recorded. If the problems are rare or have already been fixed, the trap will be silent.

The early results of this testing on the NetWinder have been:

This picture has changed as more and more packages are updated to newer versions and compiled with newer compiler versions to the point that the number of traps has declined to about 1,000 per CPU minute even with X windows use.

Setting the alignment trap to produce messages or counts is obviously useful for debugging as well. However, it produces only an estimate of the magnitude because there are potential latent defects that will cause applications to fail without ever doing an unaligned memory reference.

The argument that aligned structures are effectively slower is based on several opinions:

  1. The fixes to alignment defects often result in slower code.
  2. The alignment trap would be called less frequently if the compiler didn't align all structures.
  3. Code compiled for ARM processors will execute slower than code compiled specifically for the StrongArm.

8) What is the magnitude of the porting problem?

At this point, several years of fixing alignment defects in Linux packages have reduced the problems in the most common packages. Packages known to have had alignment defects are:

This list is *very* incomplete. At this time (Spring 1999), the NetWinder distribution has 408 packages. RedHat 5.2 has 524 packages and RedHat 6.0 has 646 packages, so roughly 60 to 80% of Linux has been ported.

9) Why can't we just change the compiler?

The problem with changing the compiler is one of compatibility and transition. A completely new distribution for the ARM or StrongArm could use whatever alignment rules meet its goals. However, there would be problems running binaries from other distributions. For commercial applications, this would split the ARM market in two and vendors would need to decide which distribution(s) to support. Those familiar with UNIX history know the potential costs of these splits.

Since StrongArm binaries cannot be run on the ARM processors, this is the natural dividing point for this split. To some extent, this split has already occurred since many packages are being ported specifically for the StrongArm. Changing alignment in a StrongArm distribution will affect its ability to run ARM binaries.

From this perspective, the worst case is having two binary standards on the StrongArm processor for the same OS.

The upgrade from aligned to unaligned or vice versa is particularly tricky because of interdependencies between programs and shared libraries. When the upgrade is in progress, the system is really some kind of "mixed distribution". Also, local programs compiled before the upgrade need to be recompiled to ensure compatibility.

10) What about mixed distributions?

It is possible to create header files for libraries and system calls that are independent of which alignment rules are used by the compiler and thus ensure binary compatibility between distributions even if different compilers are used. All of the distributions would need to standardize on these modified headers for this to work.

If these changes were in place, different applications could be compiled with different rules within the same distribution as the needs of the application itself dictate. Some people are going to be experimenting with alternatively configured compilers and will need to make at least a start on these changes in order to do this experimentation. Later in this FAQ is a list of the system header files that would be affected.

11) Some examples of code with problems?

All of the following examples are defective in a way that works for most Linux platforms and fails under the ARM-Linux distribution. The behaviour of the ARM-Linux distribution is described.

Example A)

Suppose, I'm doing something to a truecolour image in C++ (brightening it for instance) and I have a pointer to the image in memory.

struct Pixel
{
        unsigned char red;
        unsigned char green;
        unsigned char blue;
};

unsigned char* image;

Pixel*  ptr = (Pixel*)image;

inline void brighten(Pixel* pix)
{
        //...a bunch of code that references *pix
}

for (int x=0; x<1024; x++)
{
        brighten(ptr++);
}

The Pixel structure will be padded with an extra byte at the end and will be aligned to a word boundary. Each ptr++ will step the pointer by four bytes instead of three as intended and thus the image will be corrupted. If image is aligned on a word boundary (this is random chance), no unaligned memory references will be made.

If I change the loop so that ptr is incremented by three bytes instead of four, then the image may be corrupted depending on what brighten does and the optimization level.

Example B)

Suppose now, I have an alpha field

struct RGBAPixel
{
        unsigned char   alpha;
        Pixel           pxl;
};

This is an 8 byte structure with a layout totally different from

struct RGBAPixel
{
        unsigned char   alpha;
        unsigned char   red;
        unsigned char   green;
        unsigned char   blue;
};

which is the layout on most other Linux platforms.

Example C)

struct Date
{
        char    hasHappened;
        char    year[4];
        char    month[2];
        char    day[2];
};

struct Record
{
        char    name[20];
        Date    birthday;
        Date    marriage;
        Date    death;
        Date    last_taxes_paid;
} inbuf;

#define RECORD_LENGTH (20+4*9)

read(fd, &inbuf, RECORD_LENGTH);

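Each Date occupies 12 bytes in memory on ARM-Linux instead of the 9 bytes it occupies in the file. The birthday field happens to start at the same offset (20) in both layouts, but every date field after it will be corrupt after the read.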

Example D)

This example is from the Kernel source.

struct nls_unicode {
	unsigned char uni1;
	unsigned char uni2;
};

static struct nls_unicode charset2uni[256] = {...};

Each unicode character consumes four bytes instead of two as on other platforms. Although in this case the only impact is benign (extra memory consumption), attempting to read, write, or copy unicode strings based on this definition would lead to problems.

12) How do I find alignment problems in code from other platforms?

This section is fairly specific to ARM-Linux application porting. Fixing all alignment problems, including those that may cause problems in future or on other platforms, is beyond the scope of this FAQ.

The gcc compiler for the ARM-Linux distribution aligns all structures containing int's, long's, float's and pointers in the same way as gcc on x86 and other 32bit platforms. The differences that may result in exposing latent alignment defects are all related to structures consisting entirely of char's and short's, either signed or unsigned. On ARM-Linux, these are aligned to a word (4 byte) boundary. On other platforms these are aligned to a character boundary (ie: unaligned) for structures containing only char's and a halfword boundary for structures containing short's or short's and char's.

In practice, structures of this nature are relatively rare, so this is a good place to start looking. The uses of these structures that may cause problems are:

13) How do I fix alignment problems?

This really depends on your goals.

If you are concerned with the long term portability of the code, you will find and remove all expectations about padding and alignment from it. How to do this is beyond the scope of this FAQ.

If you want to port a package to ARM-Linux with minimal code changes or you suspect alignment problems and want a quick test, using the gcc extension __attribute__((packed)) will help in many cases.

If you want to arrange the header files of a library for binary compatibility between different alignment settings on StrongArm compilers, use __attribute__((packed)), explicitly insert padding bytes, and/or force alignment with unions or zero length arrays.

14) What about C++?

A C++ class is an extension of a struct and many of the same comments apply. In addition, inheritance and template classes introduce new ways of combining structures that can cause interior padding that differs between ARM-Linux and x86 systems. Name mangling may or may not be affected. This makes the problems more difficult to identify from the source code. C++ programs in particular need to be devoid of all expectations of interior padding and alignment.

15) Which system header files are affected?

This section presents a list of system header files that contain structures with potential alignment or interior padding problems. For those porting to ARM-Linux, this list may be helpful in localizing latent alignment defects. For those experimenting with mixed environments, it indicates areas that may cause problems. For those making new distributions, it's a rough idea of the scope of work involved to support mixed alignment binaries.

This list was created by examining header files on an x86 system running Linux 2.1.78 derived from RedHat 5.0 using egcs as the only compiler. The list of header files for an actual Linux on ARM distribution would be somewhat different. I looked at all header files in:

The header files in /usr/include/asm and /usr/include/linux are a mix of public headers included from other header files and header files that are strictly internal to the kernel. Even amongst the public headers, many are for devices that are not supported on the ARM or StrongArm. Even though most of the linux/ headers are irrelevant for application porting, I've included them in this list in case they are useful to kernel hackers.

Header files with structures consisting entirely of chars and shorts:

/usr/lib/gcc-lib/i686-pc-linux-gnu/egcs-2.90.23/include/f2c.h

ar.h
form.h
gdbm.h
ioctl-types.h
jpeglib.h  (actually not, but only because boolean is defined as an int)
png.h
sockaddrcom.h
socketbits.h
socket.h

sys/gmon_out.h
sys/sem.h
sys/socket.h
sys/ttychars.h
sys/un.h
sys/utsname.h

asm/smp.h (i386 specific)
asm/termbits.h (termios)
asm/termios.h (winsize, termio)

linux/arcdevice.h
linux/digi1.h
linux/coff.h  (but seems benign)
linux/cdrom.h
linux/cdk.h (part of Stallion multiport driver)
linux/bpqether.h
linux/fb.h  (related to cursor!)
linux/icmpv6.h
linux/if_arcnet.h
linux/if_ether.h
linux/if_frad.h
linux/if_packet.h
linux/if_strip.h
linux/if_tr.h
linux/if_wic.h
linux/ipv6.h
linux/ipx.h
linux/isdnif.h
linux/iso_fs.h
linux/kdb_kern.h
linux/minix_fs.h (directory entries)
linux/msdos_fs.h (directory entries)
linux/msdos_fs_sb.h
linux/mtio.h (seems benign)
linux/netbeui.h
linux/nfs.h
linux/nls.h
linux/rtnetlink.h
linux/smb.h
linux/socket.h
linux/soundcard.h
linux/udp.h
linux/un.h
linux/utsname.h
linux/videodev.h
linux/vt.h
linux/wireless.h

Header files with structures or arrays with different internal padding from x86:

utmpbits.h
vgagl.h

asm/sigcontext.h (i386)

linux/atalk.h
linux/awe_voice.h
linux/ax25.h
linux/digiFep1.h (array)
linux/if.h
linux/epca.h
linux/if_arp.h
linux/isdn.h (termios)
linux/stallion.h (termios)
linux/kbd_diacr.h
linux/kd.h (interaction with X windows in mixed environments)
linux/rose.h
linux/sem.h (array)
linux/serialP.h (termios)
linux/tpqic02.h (sizeof)
linux/tty.h (termios)
linux/tty_driver.h (termios)
linux/vt_kern.h
linux/x25.h

16) Who wrote this and what do they want?

I've tried to keep the majority of this FAQ as unbiased as possible, but it is also only fair to state my biases. I am a strong supporter of using a non-aligned compiler on StrongArm processors and having an alignment trap that warns about or even terminates processes with alignment problems. I believe that this choice will result in the highest performing, most reliable, most complete and quickest to be finished port of Linux to the StrongArm platform.

I would like to reconcile this objective with the legitimate goals for porting Linux to other ARM machines and avoid unwarranted binary incompatibilities. I hope this FAQ advances this aim for all Linux on ARM/StrongArm enthusiasts.

This FAQ was written by Brian Bray <brianbr@ibm.net>, Minoru Development Corporation, starting in September 1998 based on a series of e-mail messages on the devel@NetWinder.org mailing list. I thank everyone who participated in the discussions and particularly Russell King <rmk@arm.uk.linux.org>, Mark Brinicombe <mark@netbsd.org>, and Andrew Mileski <andrewm@corelcomputer.com> who provided key information that enabled me to understand this issue. All errors are, of course, my fault.