Whether the data is already in cache or not will have a far greater effect.

Purpose of memory alignment
Asked 12 years, 11 months ago by Dan Hook.
After doing some additional Googling I found this great link that explains the problem really well. Check out this small article for people who are starting to learn this: blog. JohnJiang, I think I found the new link here: developer.

Speed
Modern processors have multiple levels of cache memory that data must be pulled through; supporting single-byte reads would make the memory subsystem throughput tightly bound to the execution unit throughput (aka CPU-bound). This is all reminiscent of how PIO mode was surpassed by DMA in hard drives, for many of the same reasons.
Say you had a packed version of the struct, maybe from the network, where it was packed for transmission efficiency. Reading the first byte is going to be the same.

Range
For any given address space, if the architecture can assume that the 2 LSBs of an address are always 0, those two bits do not have to be stored, so the same number of address bits can reach four times as much memory.

Atomicity
The CPU can operate on an aligned word of memory atomically, meaning that no other instruction can interrupt that operation.

Conclusion
The memory system of a processor is quite a bit more complex and involved than described here; a discussion of how an x86 processor actually addresses memory can help (many processors work similarly).
Bonus: Caches
Another alignment-for-performance that I alluded to previously is alignment on cache lines, which are (for example, on some CPUs) 64 B. For more info on how much performance can be gained by leveraging caches, take a look at Gallery of Processor Cache Effects. From this question on cache-line sizes: "Understanding of cache lines can be important for certain types of program optimizations."
If I understand correctly, the reason WHY a computer cannot read an unaligned word in one step is that the addresses use 30 bits and not 32 bits?

The 8088 is an interesting study of the tradeoffs between speed and cost: it was basically a 16-bit 8086 (which had a full 16-bit external bus) but with only half the bus lines, to save production costs. Because of this, the 8088 needed twice as many clock cycles to access memory as the 8086, since it had to do two reads to get the full 16-bit word. The interesting part: the 8086 can do a word-aligned 16-bit read in a single cycle, while unaligned reads take 2. The fact that the 8088 had a half-word bus masked this slowdown.

Because of the slow memory interface, execution time on 8088-based machines is usually dominated by instruction fetches. On the 8086, code fetching can sometimes keep up with execution, but on the 8088, unless one uses ...

Note that each 16-bit object starts at an even address, since each object is 2 bytes long.
If you were to access an address that is not a multiple of the size of the objects stored in main memory, a misalignment would occur. In this case, misalignment would occur at addresses 1 and 3, since no objects begin at those locations (assuming all 16-bit objects are stored consecutively, with no empty locations between them).
This is data structure alignment. For example, suppose the word size of the computer is 4 bytes; then the starting address of each variable in memory is aligned to a multiple of 4.
The CPU can only read from addresses that are multiples of 4, and each read fetches 4 bytes of data. It can be seen from the above example that adopting a memory address alignment strategy is a key factor in improving program performance.
How much space does the above structure occupy in memory? You might say this is simple: calculate the size of each member and add them together, so the structure occupies a total of 11 bytes. Let's check this in practice by using the sizeof operator to find the size of the memory occupied by the structure. Unexpectedly, sizeof(MemAlign) gives 12.
Did we miscalculate? Not really; the compiler has optimized this structure. This optimization has a name: "alignment". So what exactly does alignment optimize? Let me explain step by step. Before explaining, let's look at a picture. Readers who have studied assembly will be familiar with this picture.
This picture is a model of how the CPU and memory exchange data. Here the picture shows a 32-bit CPU. It is precisely this that causes another problem. So what is the problem?

When the SEM_NOALIGNMENTFAULTEXCEPT flag is set with SetErrorMode, the system automatically corrects for misaligned data accesses. When this flag is reset, the system does not correct for misaligned data accesses but instead raises data misalignment exceptions.
Note that changing this flag affects all threads contained within the process that owns the thread that makes the call. In other words, changing this flag will not affect any threads contained in any other processes.
You should also note that a process's error mode flags are inherited by any child processes. Therefore, you might want to temporarily reset this flag before calling the CreateProcess function (although you usually don't do this). However, the effect of the flag is not the same on every platform. For x86 systems, this flag is always on and cannot be turned off. For Alpha systems, you can turn this flag off only if the EnableAlignmentFaultExceptions registry value is set to 1. You can use the Windows MMC Performance Monitor snap-in to see how many alignment fixups per second the system is performing.
The following figure shows what the Add Counter dialog box looks like just before you add this counter to the chart. What this counter really shows is the number of times per second the CPU notifies the operating system of misaligned data accesses.
If you monitor this counter on an x86 machine, you'll see that it always reports zero fixups per second. This is because the x86 CPU itself is performing the fixups and doesn't notify the operating system.
Because the x86 CPU performs the fixup instead of the operating system, accessing misaligned data on an x86 machine is not nearly as bad a performance hit as on CPUs that require software (the Windows operating system code) to do the fixup. As you can see, simply calling SetErrorMode is enough to make your application work correctly. But this solution is definitely not the most efficient.
In fact, the Alpha Architecture Reference Manual, published by Digital Press, states that the system's emulation code to silently correct for misaligned data accesses could take many times longer to execute!