Support for more CPU types. LLVM is limited to the mainstream architectures.


Eh, it’s not as one-sided as that. GCC has a larger number of targets, but LLVM supports several newer targets that GCC doesn’t, like WebAssembly and eBPF (although the latter is coming in GCC 10). But it would certainly be nice for Rust to support both sets of targets.


Compare the number of architectures by GCC:

> https://en.wikipedia.org/wiki/GNU_Compiler_Collection#Archit...

And the number of architectures by LLVM:

> https://en.wikipedia.org/wiki/LLVM#Back_ends

GCC supports vastly more targets.


That's the current state but LLVM is adding new targets (like the newly added AVR target), while the AVR target in GCC is under threat of removal: https://www.bountysource.com/issues/84630749-avr-convert-the...


GCC still supports more architectures even if AVR support is removed.


What part of my comment does that contradict?


Can the semantics of Rust handle some of the stranger architectures that GCC supports?


In theory, both GCC and LLVM take the output of a front-end (in this case Rust) and lower it to an intermediate representation (IR). There will likely be some differences between the output of different front-ends, but after successive optimisations have been applied these will likely disappear. By the time you get to generating assembly, you can't really tell the difference anymore, so the semantics of the original language don't make an impact.


I'm sure there are a number of "reasonable" assumptions that aren't true, probably things like the number of bits in a byte, or the size of a particular integral type, or support for a particular platform behavior.


https://gankra.github.io/blah/rust-layouts-and-abis/ lists assumptions. As you suspected, one assumption is 8-bit bytes.
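As a rough illustration of what "assuming 8-bit bytes" means in practice, here is a minimal sketch of compile-time checks in Rust. The function name `bits_per_byte` is made up for this example; the checks only restate the assumption and are not taken from the linked article.

```rust
use std::mem::size_of;

// Compile-time checks: the build fails if any of these assumptions is false.
const _: () = assert!(u8::BITS == 8);
const _: () = assert!(size_of::<u8>() == 1);
const _: () = assert!(size_of::<u32>() == 4);

/// Width in bits of the smallest addressable unit, as Rust sees it.
/// (Hypothetical helper for illustration only.)
pub fn bits_per_byte() -> u32 {
    // `as` binds tighter than `/`, so this is u8::BITS / (size_of::<u8>() as u32).
    u8::BITS / size_of::<u8>() as u32
}
```

On a target with, say, 16-bit addressable units, these statements could not all hold at once, which is the kind of mismatch the assumption rules out.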


Wait, isn't this the definition of a byte?


It isn't on the Texas Instruments TMS320. I will quote from http://processors.wiki.ti.com/index.php/C89_Support_in_TI_Co...

> The C standard uses the term byte to mean the minimum addressable unit in the implementation, which is char, which means a byte on these targets is 16 bits. This is in conflict with the widespread use of byte to mean 8 bits exactly. This is an unfortunate disagreement between C terminology and widespread industry terminology that TI can't do anything about.


Absolutely not. A byte is the smallest block of memory with an address. E.g., you can't take the address of 7 combined bits on x86, but you can for 8.

In the past, architectures differed wildly in the number of bits per byte, e.g., 36 on the machine where the Pascal language was created.

Today, the industry has mostly standardized on 8 bits per byte, but see e.g. the PIC architecture for an example that is still relevant today and makes a different choice: 8-bit bytes for data, but 10-bit bytes for instructions.

https://en.m.wikipedia.org/wiki/PIC_microcontrollers


> A byte is the smallest block of memory with an address. E.g., you can't take the address of 7 combined bits on x86, but you can for 8.

I think that's an anachronistic/incorrect usage. A lot of machines (including several with 36-bit words that you mentioned) supported larger basic addressable units of memory, but didn't call these larger units "bytes", and distinguished between "bytes" and "words". In fact, one of the elements of the early RISC philosophy was that CPU support for byte accesses (as opposed to word accesses) was extraneous, based on statistics gathered from real programs. Early MIPS/Alpha/etc. machines did not support byte addressing, but the people using them still called 8 bits a byte.
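To make the byte-versus-word distinction concrete, here is a minimal sketch of how software can still work with 8-bit bytes when the only memory access available is a whole word. It models a hypothetical machine with 32-bit word accesses. The function names and the little-endian byte order within a word are assumptions of the sketch, and it does not reproduce the actual instruction sequences used on MIPS or Alpha.

```rust
/// Read byte `index` from memory that can only be accessed as 32-bit words.
/// Little-endian byte order within each word is an assumption of this sketch.
pub fn load_byte(words: &[u32], index: usize) -> u8 {
    let word = words[index / 4]; // the only access the "hardware" offers
    let shift = (index % 4) * 8; // bit offset of the byte within the word
    ((word >> shift) & 0xFF) as u8
}

/// Write byte `index` by reading, modifying, and writing back the whole word.
pub fn store_byte(words: &mut [u32], index: usize, value: u8) {
    let shift = (index % 4) * 8;
    let mask = 0xFFu32 << shift;
    let word = &mut words[index / 4];
    // Clear the old byte, then merge in the new one.
    *word = (*word & !mask) | ((value as u32) << shift);
}
```

A store costs a full read-modify-write of the containing word, which gives a sense of why dedicated byte operations are attractive to have in hardware.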


Arguably the first Alphas could have had a C compiler with 64 bit bytes but that would have made porting hard. Even then they were forced to add byte operations pretty early on.


A byte is also often defined as the smallest addressable unit in a computer. Nowadays that is most commonly 8 bits, to the point where you can generally assume it, but this was different in the past (6 and 9 bits being especially common alternatives) and still is in some niches like DSPs, which sometimes can only work on wider types. But at least those are then typically powers of two, which makes things easier for many tools.


There are architectures on which sizeof(char)==sizeof(short)==sizeof(int) in C implementations, because it's the only way to produce efficient code.
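For comparison, here is a small sketch that reports how Rust sees the sizes of the corresponding C types on whatever target it is compiled for. The helper names are invented for this example. On the targets Rust supports today, `c_char` is 8 bits wide and `c_short` is 16, so the "all equal" case described above is expected to come out false.

```rust
use std::mem::size_of;
use std::os::raw::{c_char, c_int, c_short};

/// Sizes, in units of `u8`, of the C types `char`, `short`, and `int`
/// for the target this code is compiled for.
pub fn c_type_sizes() -> (usize, usize, usize) {
    (size_of::<c_char>(), size_of::<c_short>(), size_of::<c_int>())
}

/// True only on a target where all three types occupy the same storage.
pub fn all_same_size() -> bool {
    let (c, s, i) = c_type_sizes();
    c == s && s == i
}
```

Calling `c_type_sizes()` on a typical desktop target should give one byte for `char`, two for `short`, and four for `int`, though only the first two are fixed across Rust's current targets.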



