Optimisation mixes up registers in inline assembly

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimisation mixes up registers in inline assembly

Michael Kwasnicki
Hello dear list members,

I am new to this list because I need someone to discuss a potential compiler bug.


I try to mix inline assembly in C code. In the assembly I use passed in variables which are named:


The register mapping is:
[rZero] -> R20
[rOne] -> R21
[rStart] -> R18
[rStop] -> R19
[rSize] -> R22
[bits] -> R19

If I look at the output assembly I can see the compiler did some optimisation.
R19 is shared between [bits] and [rStop].
In this case this is okay as the bit count reaches zero at the end and thus [bits] and [rStop] have the same value then.


But when I change the initial value of `bitcount` in C to 7 all hell breaks loose.


The register mapping is:
[rZero] -> R20
[rOne] -> R18
[rStart] -> R19
[rStop] -> R21
[rSize] -> R22
[bits] -> R18

This time the compiler optimisation corrupted the program.
R18 is shared between [rOne] and [bits]. [rOne] was supposed to be a constant that is cached in a register. But [bits] gets decremented.
Thus the value passed to OCR0B varies and is not constant any more changing the logic of the inline assembly.

I am not experienced with inline assembly so I might have missed something but my general expectation for assembly is that if I name things differently, they are different things (registers in this case).

The issue does not show up with -O0 but anything above.
Locally I am running avr-gcc (GCC) 9.1.0. Not sure what the exact version at godbolt is.

Cheers,

Michael

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation mixes up registers in inline assembly

Joseph Sible
Since you intend to modify the inputs, you need to declare them as outputs as well (with the + constraint). If you don't do this, GCC is allowed to assume you don't write to them, and can share any that have the same values.

Joseph C. Sible


On Wed, Jul 24, 2019, 13:04 Michael Kwasnicki <[hidden email]> wrote:
Hello dear list members,

I am new to this list because I need someone to discuss a potential compiler bug.


I try to mix inline assembly in C code. In the assembly I use passed in variables which are named:


The register mapping is:
[rZero] -> R20
[rOne] -> R21
[rStart] -> R18
[rStop] -> R19
[rSize] -> R22
[bits] -> R19

If I look at the output assembly I can see the compiler did some optimisation.
R19 is shared between [bits] and [rStop].
In this case this is okay as the bit count reaches zero at the end and thus [bits] and [rStop] have the same value then.


But when I change the initial value of `bitcount` in C to 7 all hell breaks loose.


The register mapping is:
[rZero] -> R20
[rOne] -> R18
[rStart] -> R19
[rStop] -> R21
[rSize] -> R22
[bits] -> R18

This time the compiler optimisation corrupted the program.
R18 is shared between [rOne] and [bits]. [rOne] was supposed to be a constant that is cached in a register. But [bits] gets decremented.
Thus the value passed to OCR0B varies and is not constant any more changing the logic of the inline assembly.

I am not experienced with inline assembly so I might have missed something but my general expectation for assembly is that if I name things differently, they are different things (registers in this case).

The issue does not show up with -O0 but anything above.
Locally I am running avr-gcc (GCC) 9.1.0. Not sure what the exact version at godbolt is.

Cheers,

Michael
_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation mixes up registers in inline assembly

Michael Kwasnicki
So the inputs to inline assembly are by value and not by reference, as I thought. Right?
The correlation between my named registers and those C variables was pure coincidence.
And by making the named registers both - input and output - it creates a reference to the original C variable.

Regards,

Michael K.



Am 24.07.2019 um 19:17 schrieb Joseph C. Sible <[hidden email]>:

Since you intend to modify the inputs, you need to declare them as outputs as well (with the + constraint). If you don't do this, GCC is allowed to assume you don't write to them, and can share any that have the same values.

Joseph C. Sible


On Wed, Jul 24, 2019, 13:04 Michael Kwasnicki <[hidden email]> wrote:
Hello dear list members,

I am new to this list because I need someone to discuss a potential compiler bug.


I try to mix inline assembly in C code. In the assembly I use passed in variables which are named:


The register mapping is:
[rZero] -> R20
[rOne] -> R21
[rStart] -> R18
[rStop] -> R19
[rSize] -> R22
[bits] -> R19

If I look at the output assembly I can see the compiler did some optimisation.
R19 is shared between [bits] and [rStop].
In this case this is okay as the bit count reaches zero at the end and thus [bits] and [rStop] have the same value then.


But when I change the initial value of `bitcount` in C to 7 all hell breaks loose.


The register mapping is:
[rZero] -> R20
[rOne] -> R18
[rStart] -> R19
[rStop] -> R21
[rSize] -> R22
[bits] -> R18

This time the compiler optimisation corrupted the program.
R18 is shared between [rOne] and [bits]. [rOne] was supposed to be a constant that is cached in a register. But [bits] gets decremented.
Thus the value passed to OCR0B varies and is not constant any more changing the logic of the inline assembly.

I am not experienced with inline assembly so I might have missed something but my general expectation for assembly is that if I name things differently, they are different things (registers in this case).

The issue does not show up with -O0 but anything above.
Locally I am running avr-gcc (GCC) 9.1.0. Not sure what the exact version at godbolt is.

Cheers,

Michael
_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation mixes up registers in inline assembly

Joseph Sible
That's not a very good analogy, as it's really a "reference" in both cases. Rather than value vs. reference, think const vs. non-const. When you only mark something as an input, you're effectively making it a const reference, so the compiler will make optimizations that rely on you not changing it (such as combining it with other things that happen to be equal to it). When you make it an output too, you're effectively making it a non-const reference.

Joseph C. Sible


On Wed, Jul 24, 2019, 15:41 Michael Kwasnicki <[hidden email]> wrote:
So the inputs to inline assembly are by value and not by reference, as I thought. Right?
The correlation between my named registers and those C variables was pure coincidence.
And by making the named registers both - input and output - it creates a reference to the original C variable.

Regards,

Michael K.



Am 24.07.2019 um 19:17 schrieb Joseph C. Sible <[hidden email]>:

Since you intend to modify the inputs, you need to declare them as outputs as well (with the + constraint). If you don't do this, GCC is allowed to assume you don't write to them, and can share any that have the same values.

Joseph C. Sible


On Wed, Jul 24, 2019, 13:04 Michael Kwasnicki <[hidden email]> wrote:
Hello dear list members,

I am new to this list because I need someone to discuss a potential compiler bug.


I try to mix inline assembly in C code. In the assembly I use passed in variables which are named:


The register mapping is:
[rZero] -> R20
[rOne] -> R21
[rStart] -> R18
[rStop] -> R19
[rSize] -> R22
[bits] -> R19

If I look at the output assembly I can see the compiler did some optimisation.
R19 is shared between [bits] and [rStop].
In this case this is okay as the bit count reaches zero at the end and thus [bits] and [rStop] have the same value then.


But when I change the initial value of `bitcount` in C to 7 all hell breaks loose.


The register mapping is:
[rZero] -> R20
[rOne] -> R18
[rStart] -> R19
[rStop] -> R21
[rSize] -> R22
[bits] -> R18

This time the compiler optimisation corrupted the program.
R18 is shared between [rOne] and [bits]. [rOne] was supposed to be a constant that is cached in a register. But [bits] gets decremented.
Thus the value passed to OCR0B varies and is not constant any more changing the logic of the inline assembly.

I am not experienced with inline assembly so I might have missed something but my general expectation for assembly is that if I name things differently, they are different things (registers in this case).

The issue does not show up with -O0 but anything above.
Locally I am running avr-gcc (GCC) 9.1.0. Not sure what the exact version at godbolt is.

Cheers,

Michael
_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation mixes up registers in inline assembly

David Brown-40
In reply to this post by Michael Kwasnicki
On 24/07/2019 21:41, Michael Kwasnicki wrote:
> So the inputs to inline assembly are by value and not by reference, as I
> thought. Right?

Correct.  The inputs are used to initialise the registers.

> The correlation between my named registers and those C variables was
> pure coincidence.

No, it is pure optimisation, not coincidence.  And of course you don't
necessarily get registers for variables in C - certainly not for const
values.

> And by making the named registers both - input and output - it creates a
> reference to the original C variable.

Yes, roughly.

The big question here is why you are using assembly at all.  Why not
just write in C?

void test2(void) {
     uint8_t size = 42;
     const uint8_t start = _BV(WGM02) |  _BV(CS00);
     const uint8_t stop = 0x00;
     TCCR0A = _BV(COM0B1) | _BV(WGM01) | _BV(WGM00);
     OCR0A = 9;                              // TOP
     const uint8_t zero = 1;                 // 0 < i <= TOP
     const uint8_t one = 7;                        // 0 < i <= TOP

     OCR0B = one;
     TCCR0B = start;

     while (size--) {
         for (uint8_t bits = 0; bits < 8; bits++) {
             OCR0B = one;
         }
         OCR0B = zero;
     }
     TCCR0B = stop;
}

It is simpler, easier to understand, much easier to get right, and gives
code that is marginally smaller and faster than your hand-written
assembly.  (It is even better with -O2, which is the optimisation level
I recommend as usually giving smaller and faster code than -Os.)

mvh.,

David



>
> Regards,
>
> Michael K.
>
>
>
>> Am 24.07.2019 um 19:17 schrieb Joseph C. Sible <[hidden email]
>> <mailto:[hidden email]>>:
>>
>> Since you intend to modify the inputs, you need to declare them as
>> outputs as well (with the + constraint). If you don't do this, GCC is
>> allowed to assume you don't write to them, and can share any that have
>> the same values.
>>
>> Joseph C. Sible
>>
>>
>> On Wed, Jul 24, 2019, 13:04 Michael Kwasnicki <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>>     Hello dear list members,
>>
>>     I am new to this list because I need someone to discuss a
>>     potential compiler bug.
>>
>>
>>     I try to mix inline assembly in C code. In the assembly I use
>>     passed in variables which are named:
>>
>>     _https://godbolt.org/z/w8wuJy_
>>     _
>>     _
>>     The register mapping is:
>>     [rZero] -> R20
>>     [rOne] -> R21
>>     [rStart] -> R18
>>     [rStop] -> R19
>>     [rSize] -> R22
>>     [bits] -> R19
>>
>>     If I look at the output assembly I can see the compiler did some
>>     optimisation.
>>     R19 is shared between [bits] and [rStop].
>>     In this case this is okay as the bit count reaches zero at the end
>>     and thus [bits] and [rStop] have the same value then.
>>
>>
>>     But when I change the initial value of `bitcount` in C to 7 all
>>     hell breaks loose.
>>
>>     https://godbolt.org/z/F7rrB_
>>
>>     The register mapping is:
>>     [rZero] -> R20
>>     [rOne] -> R18
>>     [rStart] -> R19
>>     [rStop] -> R21
>>     [rSize] -> R22
>>     [bits] -> R18
>>
>>     This time the compiler optimisation corrupted the program.
>>     R18 is shared between [rOne] and [bits]. [rOne] was supposed to be
>>     a constant that is cached in a register. But [bits] gets decremented.
>>     Thus the value passed to OCR0B varies and is not constant any more
>>     changing the logic of the inline assembly.
>>
>>     I am not experienced with inline assembly so I might have missed
>>     something but my general expectation for assembly is that if I
>>     name things differently, they are different things (registers in
>>     this case).
>>
>>     The issue does not show up with -O0 but anything above.
>>     Locally I am running avr-gcc (GCC) 9.1.0. Not sure what the exact
>>     version at godbolt is.
>>
>>     Cheers,
>>
>>     Michael
>>     _______________________________________________
>>     AVR-GCC-list mailing list
>>     [hidden email] <mailto:[hidden email]>
>>     https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
>>
>
>
> _______________________________________________
> AVR-GCC-list mailing list
> [hidden email]
> https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
>


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation mixes up registers in inline assembly

David Brown-40
In reply to this post by Joseph Sible
Hi,

Inputs to inline assembly are not exactly like references either,
because they don't have to come from variables already in registers.
They can be initialised by expressions, data in memory, etc.  They
simply ask the compiler to ensure that it finds a suitable register (or
memory address, for "m" constraints) and makes sure that the value that
is in the input expression is in that register before the inline
assembly starts.  So it is unnecessary to write "const uint_t one = 7;"
and use "one" in the inline assembly input value - it would be fine to
use "7" directly.  Of course, using the named constant can make the code
clearer and easier to maintain, which is always a good thing.

(Inline assembly inputs are closer to C++ rvalue references, given that
the inline assembly can modify the registers as it wants.  If it were a
C++ const lvalue reference, that would not be allowed.)

mvh.,

David


On 24/07/2019 22:44, Joseph C. Sible wrote:

> That's not a very good analogy, as it's really a "reference" in both
> cases. Rather than value vs. reference, think const vs. non-const. When
> you only mark something as an input, you're effectively making it a
> const reference, so the compiler will make optimizations that rely on
> you not changing it (such as combining it with other things that happen
> to be equal to it). When you make it an output too, you're effectively
> making it a non-const reference.
>
> Joseph C. Sible
>
>
> On Wed, Jul 24, 2019, 15:41 Michael Kwasnicki <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     So the inputs to inline assembly are by value and not by reference,
>     as I thought. Right?
>     The correlation between my named registers and those C variables was
>     pure coincidence.
>     And by making the named registers both - input and output - it
>     creates a reference to the original C variable.
>
>     Regards,
>
>     Michael K.


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation mixes up registers in inline assembly

Michael Kwasnicki
In reply to this post by David Brown-40
Hello David.
Thanks for the explanation.
I intentionally reduced the code to not bother you with details that didn't help with pointing out my issue.
So as a conclusion it was my misconception of how inline assembly works.
In the full code I not only count 8 bits and count some bytes but also load each byte, shift each bit out and change the actual duty cycle according to the bit that lands in the carry bit.
I am new to inline assembly so the first steps are not always useful but playful and educational. So I learn more from the inner workings of AVR MCUs.

Cheers,

Michael K.




> Am 24.07.2019 um 23:00 schrieb David Brown <[hidden email]>:
>
> On 24/07/2019 21:41, Michael Kwasnicki wrote:
>> So the inputs to inline assembly are by value and not by reference, as I thought. Right?
>
> Correct.  The inputs are used to initialise the registers.
>
>> The correlation between my named registers and those C variables was pure coincidence.
>
> No, it is pure optimisation, not coincidence.  And of course you don't necessarily get registers for variables in C - certainly not for const values.
>
>> And by making the named registers both - input and output - it creates a reference to the original C variable.
>
> Yes, roughly.
>
> The big question here is why you are using assembly at all.  Why not just write in C?
>
> void test2(void) {
>    uint8_t size = 42;
>    const uint8_t start = _BV(WGM02) |  _BV(CS00);
>    const uint8_t stop = 0x00;
>    TCCR0A = _BV(COM0B1) | _BV(WGM01) | _BV(WGM00);
>    OCR0A = 9;                              // TOP
>    const uint8_t zero = 1;                 // 0 < i <= TOP
>    const uint8_t one = 7;                        // 0 < i <= TOP
>
>    OCR0B = one;
>    TCCR0B = start;
>
>    while (size--) {
>        for (uint8_t bits = 0; bits < 8; bits++) {
>            OCR0B = one;
>        }
>        OCR0B = zero;
>    }
>    TCCR0B = stop;
> }
>
> It is simpler, easier to understand, much easier to get right, and gives code that is marginally smaller and faster than your hand-written assembly.  (It is even better with -O2, which is the optimisation level I recommend as usually giving smaller and faster code than -Os.)
>
> mvh.,
>
> David
>
>
>
>> Regards,
>> Michael K.
>>> Am 24.07.2019 um 19:17 schrieb Joseph C. Sible <[hidden email] <mailto:[hidden email]>>:
>>>
>>> Since you intend to modify the inputs, you need to declare them as outputs as well (with the + constraint). If you don't do this, GCC is allowed to assume you don't write to them, and can share any that have the same values.
>>>
>>> Joseph C. Sible
>>>
>>>
>>> On Wed, Jul 24, 2019, 13:04 Michael Kwasnicki <[hidden email] <mailto:[hidden email]>> wrote:
>>>
>>>    Hello dear list members,
>>>
>>>    I am new to this list because I need someone to discuss a
>>>    potential compiler bug.
>>>
>>>
>>>    I try to mix inline assembly in C code. In the assembly I use
>>>    passed in variables which are named:
>>>
>>>    _https://godbolt.org/z/w8wuJy_
>>>    _
>>>    _
>>>    The register mapping is:
>>>    [rZero] -> R20
>>>    [rOne] -> R21
>>>    [rStart] -> R18
>>>    [rStop] -> R19
>>>    [rSize] -> R22
>>>    [bits] -> R19
>>>
>>>    If I look at the output assembly I can see the compiler did some
>>>    optimisation.
>>>    R19 is shared between [bits] and [rStop].
>>>    In this case this is okay as the bit count reaches zero at the end
>>>    and thus [bits] and [rStop] have the same value then.
>>>
>>>
>>>    But when I change the initial value of `bitcount` in C to 7 all
>>>    hell breaks loose.
>>>
>>>    https://godbolt.org/z/F7rrB_
>>>
>>>    The register mapping is:
>>>    [rZero] -> R20
>>>    [rOne] -> R18
>>>    [rStart] -> R19
>>>    [rStop] -> R21
>>>    [rSize] -> R22
>>>    [bits] -> R18
>>>
>>>    This time the compiler optimisation corrupted the program.
>>>    R18 is shared between [rOne] and [bits]. [rOne] was supposed to be
>>>    a constant that is cached in a register. But [bits] gets decremented.
>>>    Thus the value passed to OCR0B varies and is not constant any more
>>>    changing the logic of the inline assembly.
>>>
>>>    I am not experienced with inline assembly so I might have missed
>>>    something but my general expectation for assembly is that if I
>>>    name things differently, they are different things (registers in
>>>    this case).
>>>
>>>    The issue does not show up with -O0 but anything above.
>>>    Locally I am running avr-gcc (GCC) 9.1.0. Not sure what the exact
>>>    version at godbolt is.
>>>
>>>    Cheers,
>>>
>>>    Michael
>>>    _______________________________________________
>>>    AVR-GCC-list mailing list
>>>    [hidden email] <mailto:[hidden email]>
>>>    https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
>>>
>> _______________________________________________
>> AVR-GCC-list mailing list
>> [hidden email]
>> https://lists.nongnu.org/mailman/listinfo/avr-gcc-list


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: Optimisation mixes up registers in inline assembly

David Brown-40
On 24/07/2019 23:53, Michael Kwasnicki wrote:

> Hello David. Thanks for the explanation. I intentionally reduced the
> code to not bother you with details that didn't help with pointing
> out my issue. So as a conclusion it was my misconception of how
> inline assembly works. In the full code I not only count 8 bits and
> count some bytes but also load each byte, shift each bit out and
> change the actual duty cycle according to the bit that lands in the
> carry bit. I am new to inline assembly so the first steps are not
> always useful but playful and educational. So I learn more from the
> inner workings of AVR MCUs.
>
> Cheers,
>
> Michael K.

I did not think it was the full code - after all, it's called "test" and
doesn't do anything useful.  (Though a depressingly large number of
functions called "test" end up in production code...).

Inline assembly is quite specialised.  It is very rarely needed - it
should not be needed for the code you describe here.  Unless you are
writing something like task switching code for an RTOS, any inline
assembly of more than two or three instructions is probably a poor
choice from the viewpoints of maintainability, flexibility, portability,
readability, and testability.  In many cases I have seen, it is also
bigger and slower than more careful use of C or C++ and a better
understanding of the compiler.

However, it is interesting to learn how it works, and how to use it.
And it is always a good idea to understand more about the workings of
the microcontroller, especially for a small cpu like the AVR.
Understanding the assembly and ISA of such a device is important for
getting the most out of it, even when programming in C or C++.

So my advice to you is to play around with this inline assembly and
learn.  Then for your real program, scrap it and write the code in C or C++.

mvh.,

David

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list