Avr-libc-user-manual: "Problems with reordering code"

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Avr-libc-user-manual: "Problems with reordering code"

Marcin Godlewski
Dear all,
 
I'm writing with reference to the following paragraph in avr libc manual:
 
It is stated there: "However, memory barrier works well in ensuring that all volatile accesses before and after the barrier occur in the given order with respect to the barrier. However, it does not ensure the compiler moving non-volatile-related statements across the barrier. Peter Dannegger provided a nice example of this effect:(...)". The text is followed by an example.
 
My understanding of the example is that what _probably_ made the operation being moved accross the barrier is that it involved only local data. This example doesn't show that any operations on global data can be moved accross the barrier. I think that the conclusion that a barrier prevents only operations on volatile data from being reordered accross the barrier is wrong. No operation on global data should be reordered accross the barrier, even on non-volatile data. But apparently, operations on local data can, which by the way makes some sense.
 
The definition of a barrier ensuring that "all volatile accesses before and after the barrier occur in the given order with respect to the barrier" doesn't make much sense as this is already guaranteed by the C language standard. Per standard, any operations on volatile variables must be evaluated according to the rules of the abstract machine.
 
Please let me know what is your view on the subject.
 
Best regards,Marcin Godlewski
 
 

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Avr-libc-user-manual: "Problems with reordering code"

David Brown-4
On 07/12/16 00:42, Marcin Godlewski wrote:

> Dear all,
>  
> I'm writing with reference to the following paragraph in avr libc manual:
> http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder
>  
> It is stated there: "However, memory barrier works well in ensuring that
> all volatile accesses before and after the barrier occur in the given
> order with respect to the barrier. However, it does not ensure the
> compiler moving non-volatile-related statements across the barrier.
> Peter Dannegger provided a nice example of this effect:(...)". The text
> is followed by an example.

There /is/ a good (IMHO) solution to this without using unnecessary
memory stores:

#define cli() __asm volatile( "cli" ::: "memory" )
#define sei() __asm volatile( "sei" ::: "memory" )
unsigned int ivar;
void test2( unsigned int val )
{
    val = 65535U / val;
    asm volatile("" :: "" (val));
    cli();
    ivar = val;
    sei();
}

The assembly here tells the compiler that we are going to use "val", so
it must be available before "executing" the assembly line.  But since
the assembly line does nothing, no extra work is involved - nothing is
stored or loaded unnecessarily.

For my own use, I have it in a macro.  There is also a macro here that
tells the compiler to "forget" what it knows about a variable - the
compiler needs to know it's state before the assembly (since the
asssembly "uses" it), and it needs to assume that the assembly might
change it.  But again, no unnecessary code is generated, and the
variables can happily remain in registers all the time.


// Ensure that "val" has been calculated before next volatile access
// by requiring it as an assembly input.  Note that only volatiles are
ordered!
#define forceDependency(val) \
                asm volatile("" :: "" (val) : )

// Tell the compiler that it no longer knows about "v", without actually
changing it.
// This can be used to break relationships or ranges that the compiler
knows due
// to array information, type-based analysis, etc.
#define forgetCompilerKnowledge(v) \
                asm volatile ("" : "+g" (v))


>  
> My understanding of the example is that what _probably_ made the
> operation being moved accross the barrier is that it involved only local
> data. This example doesn't show that any operations on global data can
> be moved accross the barrier. I think that the conclusion that a barrier
> prevents only operations on volatile data from being reordered accross
> the barrier is wrong. No operation on global data should be reordered
> accross the barrier, even on non-volatile data. But apparently,
> operations on local data can, which by the way makes some sense.
>  
> The definition of a barrier ensuring that "all volatile accesses before
> and after the barrier occur in the given order with respect to the
> barrier" doesn't make much sense as this is already guaranteed by the C
> language standard. Per standard, any operations on volatile variables
> must be evaluated according to the rules of the abstract machine.
>  
> Please let me know what is your view on the subject.
>  
> Best regards,Marcin Godlewski
>  
>  
>
>
> _______________________________________________
> AVR-GCC-list mailing list
> [hidden email]
> https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
>


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Avr-libc-user-manual: "Problems with reordering code"

Marcin Godlewski
In reply to this post by Marcin Godlewski
Dear all,

Thanks for the reply to David. However I'm not trying to find a solution for the described issue. What I'm trying to say in this e-mail is that this part of Atmel documentation: http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html is innacurate and should be corrected. The conclusion says:

    memory barriers ensure proper ordering of volatile accesses

    memory barriers don't ensure statements with no volatile accesses to be reordered across the barrier

while it should say:

    memory barriers ensure proper ordering of global variables accesses

    memory barriers don't ensure local variables accesses to be reordered across the barrier

I don't know whether this group is the right place to post it however I do not know any better place. Hope someone here can trigger the change of the documentation and I also hope to be corrected if I am wrong.

Thanks and regards,
Marcin

W dniu 2016-12-08 13:10:29 użytkownik David Brown <[hidden email]> napisał:

> On 07/12/16 00:42, Marcin Godlewski wrote:
> > Dear all,
> >  
> > I'm writing with reference to the following paragraph in avr libc manual:
> > http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder
> >  
> > It is stated there: "However, memory barrier works well in ensuring that
> > all volatile accesses before and after the barrier occur in the given
> > order with respect to the barrier. However, it does not ensure the
> > compiler moving non-volatile-related statements across the barrier.
> > Peter Dannegger provided a nice example of this effect:(...)". The text
> > is followed by an example.
>
> There /is/ a good (IMHO) solution to this without using unnecessary
> memory stores:
>
> #define cli() __asm volatile( "cli" ::: "memory" )
> #define sei() __asm volatile( "sei" ::: "memory" )
> unsigned int ivar;
> void test2( unsigned int val )
> {
>     val = 65535U / val;
>     asm volatile("" :: "" (val));
>     cli();
>     ivar = val;
>     sei();
> }
>
> The assembly here tells the compiler that we are going to use "val", so
> it must be available before "executing" the assembly line.  But since
> the assembly line does nothing, no extra work is involved - nothing is
> stored or loaded unnecessarily.
>
> For my own use, I have it in a macro.  There is also a macro here that
> tells the compiler to "forget" what it knows about a variable - the
> compiler needs to know it's state before the assembly (since the
> asssembly "uses" it), and it needs to assume that the assembly might
> change it.  But again, no unnecessary code is generated, and the
> variables can happily remain in registers all the time.
>
>
> // Ensure that "val" has been calculated before next volatile access
> // by requiring it as an assembly input.  Note that only volatiles are
> ordered!
> #define forceDependency(val) \
> asm volatile("" :: "" (val) : )
>
> // Tell the compiler that it no longer knows about "v", without actually
> changing it.
> // This can be used to break relationships or ranges that the compiler
> knows due
> // to array information, type-based analysis, etc.
> #define forgetCompilerKnowledge(v) \
> asm volatile ("" : "+g" (v))
>
>
> >  
> > My understanding of the example is that what _probably_ made the
> > operation being moved accross the barrier is that it involved only local
> > data. This example doesn't show that any operations on global data can
> > be moved accross the barrier. I think that the conclusion that a barrier
> > prevents only operations on volatile data from being reordered accross
> > the barrier is wrong. No operation on global data should be reordered
> > accross the barrier, even on non-volatile data. But apparently,
> > operations on local data can, which by the way makes some sense.
> >  
> > The definition of a barrier ensuring that "all volatile accesses before
> > and after the barrier occur in the given order with respect to the
> > barrier" doesn't make much sense as this is already guaranteed by the C
> > language standard. Per standard, any operations on volatile variables
> > must be evaluated according to the rules of the abstract machine.
> >  
> > Please let me know what is your view on the subject.
> >  
> > Best regards,Marcin Godlewski
> >  
> >  
> >
> >
> > _______________________________________________
> > AVR-GCC-list mailing list
> > [hidden email]
> > https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
> >
>
>




_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Avr-libc-user-manual: "Problems with reordering code"

Georg-Johann Lay-2
Marcin Godlewski schrieb:

> Dear all,
>
> Thanks for the reply to David. However I'm not trying to find a solution for the described issue. What I'm trying to say in this e-mail is that this part of Atmel documentation: http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html is innacurate and should be corrected. The conclusion says:
>
>     memory barriers ensure proper ordering of volatile accesses
>
>     memory barriers don't ensure statements with no volatile accesses to be reordered across the barrier
>
> while it should say:
>
>     memory barriers ensure proper ordering of global variables accesses
>
>     memory barriers don't ensure local variables accesses to be reordered across the barrier

At least the "local" vs. "global" is not completely correct.  After
all it's about memory accesses, and it doesn't matter if the memory
is local (e.g. local static) or if you are dereferencing a pointer
(which might point to a local auto or to an object on heap).

The code example you quoted above is actually due to a subtle
implementation detail of division, modulo and some other arithmetic
of GCC's avr backend (the division is _not_ a call actually).

IIRC the solution for me back then was -fno-tree-ter as any messing
with inline asm doesn't hit the point.

Johann

> I don't know whether this group is the right place to post it however I do not know any better place. Hope someone here can trigger the change of the documentation and I also hope to be corrected if I am wrong.
>
> Thanks and regards,
> Marcin

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Avr-libc-user-manual: "Problems with reordering code"

David Brown-4
On 08/12/16 21:46, Georg-Johann Lay wrote:

> Marcin Godlewski schrieb:
>> Dear all,
>>
>> Thanks for the reply to David. However I'm not trying to find a
>> solution for the described issue. What I'm trying to say in this
>> e-mail is that this part of Atmel documentation:
>> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
>> is innacurate and should be corrected. The conclusion says:
>>
>>     memory barriers ensure proper ordering of volatile accesses
>>
>>     memory barriers don't ensure statements with no volatile accesses
>> to be reordered across the barrier
>> while it should say:
>>
>>     memory barriers ensure proper ordering of global variables accesses
>>
>>     memory barriers don't ensure local variables accesses to be
>> reordered across the barrier
>
> At least the "local" vs. "global" is not completely correct.  After
> all it's about memory accesses, and it doesn't matter if the memory
> is local (e.g. local static) or if you are dereferencing a pointer
> (which might point to a local auto or to an object on heap).
>
> The code example you quoted above is actually due to a subtle
> implementation detail of division, modulo and some other arithmetic
> of GCC's avr backend (the division is _not_ a call actually).
>
> IIRC the solution for me back then was -fno-tree-ter as any messing
> with inline asm doesn't hit the point.

Yes, that is the solution you proposed when we discussed it a good while
back (on the avrlibc list, I think).  I disagree with you somewhat here
(as I did then, though I can't remember if we discussed the details).

Changing an optimisation option like this means that code that looks
right, will run as expected - and that is a good thing.  But it also
means that the code will /only/ be correct if particular optimisation
flags are set in particular ways.  That is a very fragile situation, and
should always be avoided.  To be safe, this optimisation would have to
be completely disabled in the avr-gcc port.  I don't know how useful
this particular optimisation is in terms of generating more efficient
code, though from the gcc manual it appears very useful and is enabled
at -O1.  Clearly that determines the "cost" of this solution to the
re-ordering problem.


The use of the assembly dependency (or a nicely named macro with the
same effect) fixes the problem in this situation.  It does so regardless
of optimisation options - the compiler is required to have calculated
the result of the division before disabling interrupts, and cannot
re-order the operations.  It does so without adding any extra assembly
code or hindering any optimisations - it merely forces an order on
operations that are to be done anyway.

It has the clear disadvantage of needing extra code in the user's
source.  Like memory barriers, it is a way of giving the compiler extra
information that cannot be expressed in normal C, and which the compiler
cannot (at the moment) figure out for itself.

You say that the assembly dependency does not "hit the point".  I think
you are correct there - it is treating the symptom, not the disease.  It
is not telling the compiler that an operation should not be re-ordered,
or that division is a costly operation.  It simply tells the compiler
that we need the results of that computation /here/.  But it is a very
effective and efficient cure for this sort of problem.  Unless and until
there is some /safe/ fix in the compiler to avoid this (and I don't
count "put this compiler option in your command line" as safe), I really
do think it is the best we have.


Note, however, that the "forceDependency" macro only solves half the
problem.  Consider :

unsigned int test2b(void)
{
        unsigned int val;

        cli();
        val = ivar;
        sei();
        val = 65535 / val;
        return val;
}

In this case, the compiler could move the division backwards above the
sei(), giving a similar problem.  (It did not make the move in my brief
tests - but it /could/ do.)  I don't know if the -fno-tree-ter flag
stops that too, but the forceDependency() macro is not enough.  The
forgetCompilerKnowledge macro is the answer:

unsigned int test2b(void)
{
        unsigned int val;

        cli();
        val = ivar;
        sei();
        asm volatile ("" : "+g" (val));
        val = 65535 / val;
        return val;
}

This tells the compiler that it needs to stabilise the value of "val",
and it can't assume anything about "val" after this point in the code,
because it /might/ be read and /might/ change in the assembly code.
Again, nothing is actually generated in the assembly and we are only
forcing an ordering on the code.


Nothing would please me better here here than to have the compiler
understand that users would not want such re-ordering around cli() and
sei(), so that the problem simply goes away.  But it should not require
particular choices of compiler flags, nor should it require disabling
useful optimisations and thus generating poorer code elsewhere.

It is also worth noting that though this situation occurs because
division does not work like a normal function call, despite it using a
library call for implementation, there is nothing fundamental to stop
the compiler moving a call to foo() back or forth across a cli() or
sei() as long as the compiler is sure that no memory is accessed, no
volatiles are accessed, and there are no other externally visible
effects in foo().  If the definition of foo() is available when
compiling the code, then the compiler could well know this to be the
case.  If we replace "val = 65535U / val;" with "val = foo(val);", where
we have :

        unsigned int foo(unsigned int v) {
                return (v * v) - v;
        }

in the same compilation unit, some or all of the calculation from foo()
will be inlined and mixed with the cli().  Again, -fno-tree-ter fixes
this - at the cost of killing such mixing and optimisation in cases
where it is useful.  And again, the inline assembly fixes it at the cost
of knowing that you have to add this line of source code.

As gcc gets ever smarter with its inlining, function cloning, link-time
optimisations, etc., then this will become more and more of an issue.



Maybe the answer is that gcc needs an "execution barrier" that is
stronger than a memory barrier, because it will also block calculations
- it would act as a barrier to all local variables.  I cannot think of
any way to make such a barrier with inline assembly or the C11 fences -
I think it would take a new __builtin for gcc.  Such a feature would
have use on all embedded targets, not just AVR.

mvh.,

David



>
> Johann
>
>> I don't know whether this group is the right place to post it however
>> I do not know any better place. Hope someone here can trigger the
>> change of the documentation and I also hope to be corrected if I am
>> wrong.
>>
>> Thanks and regards,
>> Marcin


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Loading...