There can be a lot of subtle differences going from one architecture to another.
E.g., C/C++ on both x64 and ARM keeps the pointer to thread-local storage in a special register: a segment register (FS or GS) on x64, a coprocessor (system) register on ARM. On x64, a single instruction can read memory at an offset from that base, so the access can't be split by preemption. On ARM, you first have to copy that register into a core register, and only then can you do a separate load at the offset. That makes thread-local access on ARM harder to do in a thread-safe manner than on x64, because you need to be sure you don't get pre-empted between those two instructions, or one thread can end up with another thread's thread-local storage pointer. Some details might be off; it's been a while since I dealt with this issue. I think something else also had to line up perfectly for the bug to happen (like the switch happening during a user-mode context switch).
And that's an example for two fairly similar architectures. I'm not familiar with Cell, but I understand it to be far more different from x64 than ARM is. Sure, they have all the documentation and probably still have enough collective expertise that someone knows every detail without needing to look it up, but those people might not understand the x64 side well enough to spot the pitfalls before running into them.
And even once they hit those bugs, the bugs still have to be debugged to figure out what's going on, and the fix might not even be possible within the paradigm used to design whatever go-between system they're working on.
They are both Turing complete, so there is a 1:1 functional equivalence between them (i.e., anything one can compute, the other can too). But that doesn't mean both can do it equally fast. An obvious example: a desktop with 2024 hardware and a desktop with 1990 hardware also have that 1:1 functional equivalence, but the newer machine runs circles around the older one.