x64 assembler fun-facts (reposted)

news/2024/7/7 19:31:01 Tags: embedded, runtime


While implementing the x64 built-in assembler for Delphi 64bit, I got to “know” the AMD64/EM64T architecture a lot more. The good thing about the x64 architecture is that it really builds on the existing instruction format and design. However, unlike the move from 16bit to 32bit where most existing instruction encodings were automatically promoted to using 32bit arguments, the x64 design takes a different approach.

One myth about the x64 instructions is that “everything’s wider.” That’s not the case. In fact, many addressing modes which were taken as absolute addresses in 32bit (actually offsets within a segment, but the segments are 4G in 32bit) are now 32bit relative offsets. There are very few addressing modes which use a full 64bit absolute address. Most addressing modes are 32bit offsets relative to one of the 64bit registers. One interesting addressing mode that is “implied” in many instruction encodings is RIP-relative addressing. RIP is the 64bit equivalent of the 32bit EIP, or 16bit IP, the Instruction Pointer. It holds the address from which the CPU will fetch the next instruction for execution. Most hard-coded addresses within instructions are now relative offsets from the current RIP register. This is probably the biggest thing you have to wrap your head around when moving from 32bit assembler.

Even though many instructions will implicitly use the RIP-relative addressing mode, there are some instruction addressing modes that continue to use a 32bit offset and are not RIP-relative. This can really bite you when doing simple mechanical translations from 32bit to 64bit. These are the SIB forms with a 32bit (or even 8bit) offset. What can happen is that you end up forming an address that can only address 32bits, and is thus limited to addressing items below the 4G boundary! And this is a perfectly legal instruction! To demonstrate this, consider the following 32bit assembler that we’ll translate to 64bit.

var
  TestArray: array[0..255] of Word;

function GetValue(Index: Integer): Word;
asm
  // 32bit register convention: Index arrives in EAX, the result is returned in AX
  MOV AX,[EAX * 2 + TestArray]
end;

Let’s now translate this for use in 64bit using a simple mechanical translation.

var
  TestArray: array[0..255] of Word;

function GetValue(Index: Integer): Word;
asm
  // Win64 convention: Index arrives in ECX
  MOVSX RAX,ECX
  MOV AX,[RAX * 2 + TestArray]
end;

Pretty straightforward, right? Not so fast there, partner. Let’s see; I know that I need to use a full 64bit register for the offset, but since Integer is still 32bits, I need to “sign-extend” it to 64bits. The venerable MOVSX (Move with sign extension) instruction “promotes” the signed 32bit offset to 64bits while preserving the sign. Nope, that’s not a problem. The only thing I changed in the next instruction was EAX to RAX, so how could that be a problem? Well, when you compile this code you’ll get a rather strange error message:



[DCC Error] Project7.dpr(18): E2577 Assembler instruction requires a 32bit absolute address fixup which is invalid for 64bit


Huh? Remember the little note above about the SIB instruction form? Because the RAX (or EAX in 32bit) register is being scaled (the * 2), this instruction must use the SIB (Scale-Index-Base) instruction form. When using the SIB form, RIP isn’t considered when calculating the actual address. Additionally, the offset encoded in the instruction can still only be 8 or 32bits. There are no 64bit offsets.


In 32bit, the compiler would generate a “fixup” to ensure that the instruction’s offset field for the global “TestArray” variable was properly “fixed up” at runtime should the image happen to be relocated to another address. This is a 32bit absolute address. The 64bit version of this instruction, while a truly valid instruction, would only have 32bits in which to place the address of “TestArray.” The “fixup” generated would have to remain 32bit. This could lead to creating an image that, were it ever relocated above the 4G boundary, would likely crash at best or read the wrong memory address at worst!
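To make the truncation concrete, here is a minimal sketch in plain Object Pascal. The address is hypothetical and chosen only for illustration, and the program simply keeps the low 32 bits of it, which is an assumption about what a 32bit field could hold rather than a description of what the Delphi linker does.

```delphi
program FixupTruncationSketch;
{$APPTYPE CONSOLE}

const
  // Hypothetical address of TestArray after relocation above the 4G boundary.
  RelocatedAddr: UInt64 = $0000000140003000;

var
  Encoded: Cardinal;  // stands in for the 32bit field inside the instruction
begin
  // Keep only the low 32 bits, all that a 32bit field has room for.
  Encoded := Cardinal(RelocatedAddr and $FFFFFFFF);
  // Compare what the field can hold with the address we actually wanted.
  Writeln('upper bits were lost: ', UInt64(Encoded) <> RelocatedAddr);
end.
```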


Ok, so now what? There is a SIB form that we can use to work around this problem, but it requires burning another register. The good news is that we now have another 8 registers with which to work. So if you have a rather complicated chunk of 32bit assembler code that burns up all the existing usable 32bit registers, you now have another group of registers that can help solve this problem without having to rework the code even more. So here’s how to fix this for 64 bit:

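var
  TestArray: array[0..255] of Word;

function GetValue(Index: Integer): Word;
asm
  MOVSX RAX,ECX             // sign-extend Index to 64bits
  LEA R10,[TestArray]       // RIP-relative: R10 = full 64bit address of TestArray
  MOV AX,[RAX * 2 + R10]    // SIB form with a register base instead of a 32bit offset
end;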

Here, I used the volatile R10 register (R8 and R9 are used for parameter passing) to get the absolute address of TestArray using the LEA instruction. While the “address” portion of this instruction is still 32bits, it is taken as RIP-relative. In other words, this value is the “distance” from the next instruction to the variable TestArray in memory. After this instruction, R10 contains a true 64bit address of the TestArray variable. I must still use the SIB form in the next instruction, but instead of a hard-coded “offset” I use the value in R10. Yes, there is still an implicit offset of 0, which uses the 8bit offset form.
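The “distance” idea can be sketched as simple arithmetic. The two addresses below are hypothetical, picked only to sit above the 4G boundary, and the program is an illustration of the calculation, not of how the Delphi compiler computes its fixups.

```delphi
program RipRelativeSketch;
{$APPTYPE CONSOLE}

const
  // Hypothetical addresses, chosen only for illustration.
  NextInstr: UInt64 = $0000000140001007;  // address of the instruction after the LEA
  Target: UInt64 = $0000000140003000;     // address of TestArray

var
  Disp: Int64;
begin
  // RIP-relative encoding stores the distance, not the address itself.
  Disp := Int64(Target) - Int64(NextInstr);
  Writeln('displacement: ', Disp);
  Writeln('fits in signed 32 bits: ',
    (Disp >= Low(Integer)) and (Disp <= High(Integer)));
  // At run time the CPU adds the displacement back to RIP.
  Writeln('recovered address matches: ',
    UInt64(Int64(NextInstr) + Disp) = Target);
end.
```

Even though neither address fits in 32 bits, their difference does, which is why the 32bit field is enough as long as the code and the variable sit within 2G of each other.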


You can see that mindless, mechanical translations of assembler code are likely to cause you some grief due to some of the subtle changes in instruction behaviors. For this very reason, we strongly recommend you use all Object Pascal code instead of resorting to assembler when possible. Not only is your code then more likely to move unchanged to other processor architectures (think ARM here, folks), but you’ll not have to worry about such assembler gotchas in the future. If you’re using assembler code because “it’s faster,” I would encourage you to look closely at the algorithm used. There are many cases where the proper algorithm written in Object Pascal will yield greater gains than a simple translation to assembler using the same algorithm. Yes, there are some things which you simply must do in assembler (strange, off-beat calling conventions, “LOCK” instructions for concurrency, etc.), but I would contend that many assembler functions can be moved back to Object Pascal with little impact on performance.
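For the GetValue example used throughout this post, a plain Object Pascal version might look like the sketch below. The small main block is an addition for illustration only, so that the program runs on its own.

```delphi
program GetValuePascal;
{$APPTYPE CONSOLE}

var
  TestArray: array[0..255] of Word;

function GetValue(Index: Integer): Word;
begin
  // No registers or addressing modes to choose; the compiler handles them
  // for whichever platform is being targeted.
  Result := TestArray[Index];
end;

begin
  TestArray[3] := 42;
  Writeln(GetValue(3));
end.
```

The same source compiles for 32bit and 64bit targets, and with range checking enabled it also catches an out-of-range Index, which the assembler versions do not.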

