Funny fact about bootstrapping

by Manu (modified: 2015 May 29)

eiffelstudio
compiler

A few weeks ago, we had our 29th ECMA meeting. While reviewing the standard, we went over the special characters and their codes (Section 8.32.23). Special characters are %N, %R, %T, %(, and so on, and each has a specific meaning. For example, %N denotes the ASCII character 0x0A, %R denotes 0x0D, etc.

While I was looking at our code to verify that we handle special characters properly, I was surprised to see that our lexer was doing something like:

    if input.item (1) = '%%' then
        inspect input.item (2)
        when 'N' then
            output.append_character ('%N')
        when 'R' then
            output.append_character ('%R')
        ....
        end
    end

That is quite surprising, since the code says that if you read %N, you should interpret it as %N. But what does %N actually mean? It is either what the standard says or something completely different.

Luckily, the first Eiffel compiler that compiled this code did a good job: it does what the standard says! To this day we rely on that first Eiffel compiler, since nothing in the code says how the mapping is done.

But maybe we should not be that trusting and should simply write out the actual mapping, with the numeric character codes, which is not much more complicated and certainly less troubling.
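As a minimal sketch of what spelling out the mapping could look like, the class below returns each special character through its numeric code instead of through the same escape sequence. The class name SPECIAL_CHARACTER_DECODER and the feature name decoded are hypothetical and are not taken from the EiffelStudio sources. Only the three escapes discussed in this post are covered, and returning an unknown letter unchanged is an assumption of the sketch, not what the standard prescribes.

```eiffel
class
	SPECIAL_CHARACTER_DECODER
		-- Hypothetical helper, for illustration only.

feature -- Access

	decoded (escape_letter: CHARACTER_8): CHARACTER_8
			-- Character denoted by a percent sign followed by `escape_letter',
			-- given by its numeric code rather than by the escape itself.
		do
			inspect escape_letter
			when 'N' then
					-- 0x0A, line feed
				Result := (10).to_character_8
			when 'R' then
					-- 0x0D, carriage return
				Result := (13).to_character_8
			when 'T' then
					-- 0x09, horizontal tab
				Result := (9).to_character_8
			else
					-- Assumption of this sketch: leave other letters unchanged.
				Result := escape_letter
			end
		end

end
```

With such a helper, the lexer branch would become something like output.append_character (decoder.decoded (input.item (2))), where decoder is an assumed attribute of type SPECIAL_CHARACTER_DECODER. The meaning of %N then no longer depends on how the compiling compiler reads '%N', although it still depends on that compiler reading the integer 10 correctly.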

Comments

Eric Bezault (30/5/2015)
Trusting the Eiffel compiler

On the other hand, when the lexer reads the letter a, it interprets it as the Eiffel character 'a'. Should we trust the Eiffel compiler to interpret the Eiffel character 'a' as the letter a? I see no reason not to trust the Eiffel compiler. If the lexer were written in C, you would trust the C compiler just as well.

If one day we change the Eiffel language to map %N to the tab character, then the lexer would be changed like this:

when 'N' then output.append_character ('%T')

If we cannot trust the Eiffel compiler when programming in Eiffel, we would be in big trouble.

One thing is certain: even though they have the same name, we have two languages here. One is the language in which we write our program (the lexer/compiler), and the other is the language that our program accepts as input. In this particular case we just say that %N has the same meaning in both languages (%N maps to %N). If one language were C and the other Eiffel, then the mapping would be different (%N maps to \n). And if, as in the example above, we have old Eiffel and new Eiffel, then %N in new Eiffel would map to %T in old Eiffel.

Let's trust the (old) Eiffel compiler.