Re: Avr-libc-user-manual: "Problems with reordering code"

Re: Avr-libc-user-manual: "Problems with reordering code"

Marcin Godlewski
Dear All,

The page http://www.nongnu.org/avr-libc/user-manual/optimization.html#optim_code_reorder/optimization_1optim_code_reorder.html still contains an inaccurate description of memory barriers in avr-gcc. As this site is popular among AVR users, I think it is really worth fixing. What is more, the same inaccurate article is available on the Atmel documentation site: http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html . Is there anybody subscribed to this mailing list who can contact the authors or maintainers of the site to discuss correcting the content?

Marcin Godlewski

On 2016-12-10 23:25:17, Marcin Godlewski <[hidden email]> wrote:

> On 2016-12-09 10:11:55, David Brown <[hidden email]> wrote:
> > On 08/12/16 21:46, Georg-Johann Lay wrote:
> > > Marcin Godlewski wrote:
> > >> Dear all,
> > >>
> > >> Thanks for the reply to David. However, I'm not trying to find a
> > >> solution for the described issue. What I'm trying to say in this
> > >> e-mail is that this part of the Atmel documentation:
> > >> http://www.atmel.com/webdoc/AVRLibcReferenceManual/optimization_1optim_code_reorder.html
> > >> is inaccurate and should be corrected. The conclusion says:
> > >>
> > >>     memory barriers ensure proper ordering of volatile accesses
> > >>
> > >>     memory barriers don't ensure statements with no volatile accesses
> > >> to be reordered across the barrier
> > >> while it should say:
> > >>
> > >>     memory barriers ensure proper ordering of global variables accesses
> > >>
> > >>     memory barriers don't ensure local variables accesses to be
> > >> reordered across the barrier
> > >
> > > At least the "local" vs. "global" is not completely correct.  After
> > > all it's about memory accesses, and it doesn't matter if the memory
> > > is local (e.g. local static) or if you are dereferencing a pointer
> > > (which might point to a local auto or to an object on heap).
> > >
> > > The code example you quoted above is actually due to a subtle
> > > implementation detail of division, modulo and some other arithmetic
> > > of GCC's avr backend (the division is _not_ a call actually).
> > >
> > > IIRC the solution for me back then was -fno-tree-ter as any messing
> > > with inline asm doesn't hit the point.
> >
> > Yes, that is the solution you proposed when we discussed it a good while
> > back (on the avrlibc list, I think).  I disagree with you somewhat here
> > (as I did then, though I can't remember if we discussed the details).
> >
> > Changing an optimisation option like this means that code that looks
> > right, will run as expected - and that is a good thing.  But it also
> > means that the code will /only/ be correct if particular optimisation
> > flags are set in particular ways.  That is a very fragile situation, and
> > should always be avoided.  To be safe, this optimisation would have to
> > be completely disabled in the avr-gcc port.  I don't know how useful
> > this particular optimisation is in terms of generating more efficient
> > code, though from the gcc manual it appears very useful and is enabled
> > at -O1.  Clearly that determines the "cost" of this solution to the
> > re-ordering problem.
> >
> >
> > The use of the assembly dependency (or a nicely named macro with the
> > same effect) fixes the problem in this situation.  It does so regardless
> > of optimisation options - the compiler is required to have calculated
> > the result of the division before disabling interrupts, and cannot
> > re-order the operations.  It does so without adding any extra assembly
> > code or hindering any optimisations - it merely forces an order on
> > operations that are to be done anyway.
> >
> > It has the clear disadvantage of needing extra code in the user's
> > source.  Like memory barriers, it is a way of giving the compiler extra
> > information that cannot be expressed in normal C, and which the compiler
> > cannot (at the moment) figure out for itself.
> >
> > You say that the assembly dependency does not "hit the point".  I think
> > you are correct there - it is treating the symptom, not the disease.  It
> > is not telling the compiler that an operation should not be re-ordered,
> > or that division is a costly operation.  It simply tells the compiler
> > that we need the results of that computation /here/.  But it is a very
> > effective and efficient cure for this sort of problem.  Unless and until
> > there is some /safe/ fix in the compiler to avoid this (and I don't
> > count "put this compiler option in your command line" as safe), I really
> > do think it is the best we have.
> >
> >
> > Note, however, that the "forceDependency" macro only solves half the
> > problem.  Consider :
> >
> > unsigned int test2b(void)
> > {
> >     unsigned int val;
> >
> >     cli();
> >     val = ivar;
> >     sei();
> >     val = 65535U / val;
> >     return val;
> > }
> >
> > In this case, the compiler could move the division backwards above the
> > sei(), giving a similar problem.  (It did not make the move in my brief
> > tests - but it /could/ do.)  I don't know if the -fno-tree-ter flag
> > stops that too, but the forceDependency() macro is not enough.  The
> > forgetCompilerKnowledge macro is the answer:
> >
> > unsigned int test2b(void)
> > {
> >     unsigned int val;
> >
> >     cli();
> >     val = ivar;
> >     sei();
> >     asm volatile ("" : "+g" (val));  /* val may be read and changed here */
> >     val = 65535U / val;
> >     return val;
> > }
> >
> > This tells the compiler that it needs to stabilise the value of "val",
> > and it can't assume anything about "val" after this point in the code,
> > because it /might/ be read and /might/ change in the assembly code.
> > Again, nothing is actually generated in the assembly and we are only
> > forcing an ordering on the code.
> >
> >
> > Nothing would please me better here than to have the compiler
> > understand that users would not want such re-ordering around cli() and
> > sei(), so that the problem simply goes away.  But it should not require
> > particular choices of compiler flags, nor should it require disabling
> > useful optimisations and thus generating poorer code elsewhere.
> >
> > It is also worth noting that though this situation occurs because
> > division does not work like a normal function call, despite it using a
> > library call for implementation, there is nothing fundamental to stop
> > the compiler moving a call to foo() back or forth across a cli() or
> > sei() as long as the compiler is sure that no memory is accessed, no
> > volatiles are accessed, and there are no other externally visible
> > effects in foo().  If the definition of foo() is available when
> > compiling the code, then the compiler could well know this to be the
> > case.  If we replace "val = 65535U / val;" with "val = foo(val);", where
> > we have :
> >
> > unsigned int foo(unsigned int v)
> > {
> >     return (v * v) - v;
> > }
> >
> > in the same compilation unit, some or all of the calculation from foo()
> > will be inlined and mixed with the cli().  Again, -fno-tree-ter fixes
> > this - at the cost of killing such mixing and optimisation in cases
> > where it is useful.  And again, the inline assembly fixes it at the cost
> > of knowing that you have to add this line of source code.
> >
> > As gcc gets ever smarter with its inlining, function cloning, link-time
> > optimisations, etc., then this will become more and more of an issue.
> >
> >
> >
> > Maybe the answer is that gcc needs an "execution barrier" that is
> > stronger than a memory barrier, because it will also block calculations
> > - it would act as a barrier to all local variables.  I cannot think of
> > any way to make such a barrier with inline assembly or the C11 fences -
> > I think it would take a new __builtin for gcc.  Such a feature would
> > have use on all embedded targets, not just AVR.
> >
> > mvh.,
> >
> > David
> >
> >
>
> I totally agree with you: a feature like an "execution barrier" would be very useful. C11 did a good job of standardizing multi-threading features, but unfortunately those features do not always fit firmware development. Controlling exactly what goes into a critical section is a fundamental problem, so I would even go further: why don't you propose the "execution barrier" as a new feature for a future C language standard?
>
> >
> > >
> > > Johann
> > >
> > >> I don't know whether this group is the right place to post this;
> > >> however, I do not know of any better place. I hope someone here can
> > >> trigger a change to the documentation, and I also hope to be
> > >> corrected if I am wrong.
> > >>
> > >> Thanks and regards,
> > >> Marcin
> >
> >
>
>
>
>
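
For readers who want something to compile, here is a minimal sketch of the two workarounds discussed in the quoted thread. Only the statement `asm volatile ("" : "+g" (val))` comes from the thread itself. The definition of forceDependency(), the declaration of ivar, the body of test2() (a guess at the "division before cli()" case the thread refers to) and the host stand-ins for cli()/sei() are all assumptions made for illustration.

```c
#ifdef __AVR__
#include <avr/interrupt.h>   /* real cli() and sei() */
#else
/* Host stand-ins (assumption): compiler-level memory barriers only. */
#define cli() __asm__ volatile ("" ::: "memory")
#define sei() __asm__ volatile ("" ::: "memory")
#endif

/* Hypothetical definition: the empty asm "uses" val, so val must have
   been computed by the time this statement is reached. */
#define forceDependency(val)         __asm__ volatile ("" :: "g" (val))

/* From the thread: the empty asm may read and modify val, so the
   compiler cannot assume anything about val after this statement. */
#define forgetCompilerKnowledge(val) __asm__ volatile ("" : "+g" (val))

unsigned int ivar;           /* assumed global shared with an ISR */

/* The division is meant to finish before interrupts are disabled. */
void test2(unsigned int val)
{
    val = 65535U / val;
    forceDependency(val);    /* result is required before cli() */
    cli();
    ivar = val;
    sei();
}

/* The division is meant to start after interrupts are re-enabled. */
unsigned int test2b(void)
{
    unsigned int val;

    cli();
    val = ivar;
    sei();
    forgetCompilerKnowledge(val);   /* division depends on this point */
    val = 65535U / val;
    return val;
}
```

Neither macro should emit any instruction; each only adds a data dependency that the compiler has to respect. Whether a given avr-gcc version actually keeps the division outside the critical section is best confirmed by inspecting the generated assembly.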




_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list

Re: Avr-libc-user-manual: "Problems with reordering code"

David Brown-4
You could file this as a bug on the website:
<https://savannah.nongnu.org/bugs/?group=avr-libc>

As far as I understand it, the documentation (both on the website and
the Atmel documentation) is generated directly from the library code and
comments - so this would be a change to the library source.

Feel free to quote any parts of my mails on the subject when filing the bug.

mvh.,

David





Re: Avr-libc-user-manual: "Problems with reordering code"

Bob Paddock-3
Are the function-like macros in atomic.h correct?
They attempt to deal properly with critical sections, code motion, etc.,
which is what this thread is discussing.

http://www.nongnu.org/avr-libc/user-manual/group__util__atomic.html





Re: Avr-libc-user-manual: "Problems with reordering code"

David Brown-4
The functions in atomic.h are correct, but their description is not
quite accurate.  The description says that inside an atomic block, the
code cannot be interrupted.  But as we have seen, /code/ can be moved
inside and outside of an atomic block just as it can be moved around the
"cli()" instruction (the atomic.h macros generate the same
instructions).  The blocks are atomic with respect to memory access -
any access to memory within an atomic block will be completed without
interruption.

Note that C, even with gcc extensions and inline assembly, has no way to
express "do this code here" - you can have barriers to movement of
memory accesses, but not instruction execution.  Note also that it is
only control of the memory access that is needed for code correctness -
moving instruction execution affects timing, but not the results.

mvh.,

David
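
A small sketch of what this means in practice, assuming a hypothetical counter `tick_count` that an ISR updates and a hypothetical function `scaled_ticks()`. The ATOMIC_BLOCK protects the multi-byte read; the division is an ordinary calculation that the compiler may otherwise schedule inside the block, so the empty asm from earlier in the thread is used to tie it to a point after the block. The host stand-ins for ATOMIC_BLOCK and ATOMIC_RESTORESTATE are assumptions that only make the example build off-target.

```c
#ifdef __AVR__
#include <util/atomic.h>     /* real ATOMIC_BLOCK and ATOMIC_RESTORESTATE */
#else
/* Host stand-ins (assumption): run the body exactly once; there are
   no interrupts to mask on the host. */
#define ATOMIC_RESTORESTATE 0
#define ATOMIC_BLOCK(mode) \
    for (int once_ = ((void)(mode), 1); once_; once_ = 0)
#endif

/* From the thread: the compiler cannot assume anything about val
   after this statement. */
#define forgetCompilerKnowledge(val) __asm__ volatile ("" : "+g" (val))

volatile unsigned int tick_count;   /* hypothetical, updated by an ISR */

/* Assumes tick_count is non-zero when called. */
unsigned int scaled_ticks(void)
{
    unsigned int snapshot;

    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        snapshot = tick_count;      /* the memory access is what is atomic */
    }
    forgetCompilerKnowledge(snapshot);  /* division depends on this point */
    return 65535U / snapshot;
}
```

As noted above, moving the division would change only the time spent with interrupts disabled, not the result, so the extra line matters mainly when interrupt latency is a concern.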


>>>>> It has the clear disadvantage of needing extra code in the
>>>>> user's source.  Like memory barriers, it is a way of giving the
>>>>> compiler extra information that cannot be expressed in normal C,
>>>>> and which the compiler cannot (at the moment) figure out for
>>>>> itself.
>>>>>
>>>>> You say that the assembly dependency does not "hit the point".  I
>>>>> think you are correct there - it is treating the symptom, not the
>>>>> disease.  It is not telling the compiler that an operation should
>>>>> not be re-ordered, or that division is a costly operation.  It
>>>>> simply tells the compiler that we need the results of that
>>>>> computation /here/.  But it is a very effective and efficient
>>>>> cure for this sort of problem.  Unless and until there is some
>>>>> /safe/ fix in the compiler to avoid this (and I don't count "put
>>>>> this compiler option in your command line" as safe), I really do
>>>>> think it is the best we have.
>>>>>
>>>>>
>>>>> Note, however, that the "forceDependency" macro only solves half
>>>>> the problem.  Consider :
>>>>>
>>>>> unsigned int test2b(void)
>>>>> {
>>>>>     unsigned int val;
>>>>>
>>>>>     cli();
>>>>>     val = ivar;
>>>>>     sei();
>>>>>     val = 65535 / val;
>>>>>     return val;
>>>>> }
>>>>>
>>>>> In this case, the compiler could move the division backwards
>>>>> above the sei(), giving a similar problem.  (It did not make the
>>>>> move in my brief tests - but it /could/ do.)  I don't know if the
>>>>> -fno-tree-ter flag stops that too, but the forceDependency()
>>>>> macro is not enough.  The forgetCompilerKnowledge macro is the
>>>>> answer:
>>>>>
>>>>> unsigned int test2b(void)
>>>>> {
>>>>>     unsigned int val;
>>>>>
>>>>>     cli();
>>>>>     val = ivar;
>>>>>     sei();
>>>>>     asm volatile ("" : "+g" (val));
>>>>>     val = 65535 / val;
>>>>>     return val;
>>>>> }
>>>>>
>>>>> This tells the compiler that it needs to stabilise the value of
>>>>> "val", and it can't assume anything about "val" after this point
>>>>> in the code, because it /might/ be read and /might/ change in the
>>>>> assembly code. Again, nothing is actually generated in the
>>>>> assembly and we are only forcing an ordering on the code.
>>>>>
>>>>>
>>>>> Nothing would please me better here than to have the
>>>>> compiler understand that users would not want such re-ordering
>>>>> around cli() and sei(), so that the problem simply goes away.
>>>>> But it should not require particular choices of compiler flags,
>>>>> nor should it require disabling useful optimisations and thus
>>>>> generating poorer code elsewhere.
>>>>>
>>>>> It is also worth noting that though this situation occurs
>>>>> because division does not work like a normal function call,
>>>>> despite it using a library call for implementation, there is
>>>>> nothing fundamental to stop the compiler moving a call to foo()
>>>>> back or forth across a cli() or sei() as long as the compiler is
>>>>> sure that no memory is accessed, no volatiles are accessed, and
>>>>> there are no other externally visible effects in foo().  If the
>>>>> definition of foo() is available when compiling the code, then
>>>>> the compiler could well know this to be the case.  If we replace
>>>>> "val = 65535U / val;" with "val = foo(val);", where we have :
>>>>>
>>>>> unsigned int foo(unsigned int v) { return (v * v) - v; }
>>>>>
>>>>> in the same compilation unit, some or all of the calculation from
>>>>> foo() will be inlined and mixed with the cli().  Again,
>>>>> -fno-tree-ter fixes this - at the cost of killing such mixing and
>>>>> optimisation in cases where it is useful.  And again, the inline
>>>>> assembly fixes it at the cost of knowing that you have to add
>>>>> this line of source code.
>>>>>
>>>>> As gcc gets ever smarter with its inlining, function cloning,
>>>>> link-time optimisations, etc., then this will become more and
>>>>> more of an issue.
>>>>>
>>>>>
>>>>>
>>>>> Maybe the answer is that gcc needs an "execution barrier" that
>>>>> is stronger than a memory barrier, because it will also block
>>>>> calculations - it would act as a barrier to all local variables.
>>>>> I cannot think of any way to make such a barrier with inline
>>>>> assembly or the C11 fences - I think it would take a new
>>>>> __builtin for gcc.  Such a feature would have use on all embedded
>>>>> targets, not just AVR.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>
>>>> I totally agree with you - a feature like "execution barrier" would
>>>> be very useful. C11 did a good job standardizing multi-threading
>>>> features, but unfortunately those features do not always fit firmware
>>>> development. Controlling exactly what goes into a critical section is
>>>> a fundamental problem, so I would even go further - why don't you
>>>> propose the "execution barrier" as a new feature for a future C
>>>> language standard?
>>>>
>>>>>
>>>>>>
>>>>>> Johann
>>>>>>
>>>>>>> I don't know whether this group is the right place to post
>>>>>>> this, but I do not know of any better place. I hope someone
>>>>>>> here can trigger a change to the documentation, and I also
>>>>>>> hope to be corrected if I am wrong.
>>>>>>>
>>>>>>> Thanks and regards, Marcin
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>> _______________________________________________
>> AVR-GCC-list mailing list
>> [hidden email]
>> https://lists.nongnu.org/mailman/listinfo/avr-gcc-list


Re: Avr-libc-user-manual: "Problems with reordering code"

Bob Paddock-3
On Thu, Feb 9, 2017 at 1:13 PM, David Brown <[hidden email]> wrote:

> Note also that it is only control
> of the memory access that is needed for code correctness - moving
> instruction execution affects timing, but not the results.

As I'm sure you are aware, even if the code is mathematically correct,
giving the correct answers, failing to meet timing goals can be
catastrophic in Real Time Systems.

So the only solution is to use different parts, such as ARM, with its
Instruction Synchronization Barrier (ISB) [that can waste thousands of
cycles]?


Re: Avr-libc-user-manual: "Problems with reordering code"

David Brown-4


On 09/02/17 19:49, Bob Paddock wrote:
> On Thu, Feb 9, 2017 at 1:13 PM, David Brown <[hidden email]> wrote:
>
>> Note also that it is only control
>> of the memory access that is needed for code correctness - moving
>> instruction execution affects timing, but not the results.
>
> As I'm sure you are aware, even if the code is mathematically correct,
> giving the correct answers, failing to meet timing goals can be
> catastrophic in Real Time Systems.

Yes.

>
> So the only solution is to use different parts, such as ARM, with its
> Instruction Synchronization Barrier (ISB) [that can waste thousands of
> cycles]?
>

No.  That won't help.  The compiler can re-arrange "harmless" execution
and calculations around an inline assembly instruction with the "ISB"
opcode (or any similar opcode), even if the inline assembly is marked
"volatile".  Given "isb(); x = a/b; isb(); return x;" the compiler is
free to order the division before or after either of these isb() calls,
if the compiler knows that the isb() function/macro/inline assembly does
not affect the results of the calculation.

In C, the only things that can be ordered are /visible/ effects.  Those
are volatile memory accesses, file I/O, program start/stop, and calling
external code with unknown effect (since that external code could have
visible effects).  C11 adds some atomic access and synchronisation
functions, and C implementations can add more - gcc adds volatile inline
assembly.  Other things - non-volatile memory accesses, and
calculations, can be shuffled around at will, including back and forth
across volatile accesses.
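
To illustrate the distinction above, here is a minimal sketch, assuming
GCC-style inline assembly. The names payload, ready, memory_barrier,
publish and consume are hypothetical and do not come from the thread. The
empty asm with a "memory" clobber acts as a compiler-level memory barrier:
it should keep the non-volatile store to payload from being moved past the
volatile store to ready, but it would not pin a pure calculation that lives
only in registers.

```c
/* Hypothetical names, for illustration only. */
static unsigned int payload;          /* ordinary (non-volatile) object */
static volatile unsigned char ready;  /* volatile flag, e.g. polled elsewhere */

/* Compiler-level memory barrier: generates no instructions. */
#define memory_barrier() __asm__ __volatile__("" ::: "memory")

void publish(unsigned int value)
{
    payload = value;      /* non-volatile store: could otherwise sink below the flag */
    memory_barrier();     /* the store to payload must be issued before this point */
    ready = 1;            /* volatile store: a visible effect */
}

unsigned int consume(void)
{
    while (!ready) { }    /* volatile read of the flag */
    memory_barrier();     /* payload must be (re)read after the flag was seen */
    return payload;
}
```

Without the barrier, the compiler would be entitled to reorder the access to
payload relative to the access to ready, since only the latter is volatile.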

The only way I know of to force control of the order of execution is to
make "visible" dependencies on the results or the prerequisites of the
calculations.  (And if the calculation does not have any results, it is
not actually needed at all as far as C is concerned.)  You have to use
the techniques I gave in my earlier posts here - they are as convenient,
safe, and efficient as it gets.  But it does mean that there is /no/
"general" execution barrier, in the same way that "asm ("":::"memory")"
is a general memory barrier.

If someone knows differently, I'd be happy to be corrected here.
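
A minimal sketch of such "visible" dependencies, assuming GCC-style inline
assembly. The macro names forceDependency and forgetCompilerKnowledge are
the ones used earlier in the thread; the body of forgetCompilerKnowledge
follows the inline asm quoted there, while the body of forceDependency is an
assumption, as are the function names store_quotient and load_quotient. So
that the sketch builds on any GCC or Clang target, cli() and sei() are
replaced here by stand-in macros; on AVR they would come from
<avr/interrupt.h>.

```c
/* Stand-ins for the AVR cli()/sei() macros (assumption, for portability). */
#define cli() __asm__ __volatile__("" ::: "memory")
#define sei() __asm__ __volatile__("" ::: "memory")

/* Assumed definition: the empty asm "uses" val, so val must be computed first. */
#define forceDependency(val)         __asm__ __volatile__("" :: "g"(val))
/* As quoted in the thread: the empty asm "may modify" val. */
#define forgetCompilerKnowledge(val) __asm__ __volatile__("" : "+g"(val))

static volatile unsigned int ivar;

/* Division first, then the critical section. */
void store_quotient(unsigned int val)
{
    val = 65535U / val;
    forceDependency(val);           /* quotient must exist before cli() */
    cli();
    ivar = val;
    sei();
}

/* Critical section first, then the division. */
unsigned int load_quotient(void)
{
    unsigned int val;

    cli();
    val = ivar;
    sei();
    forgetCompilerKnowledge(val);   /* division cannot start before this point */
    val = 65535U / val;
    return val;
}
```

Neither macro should generate any instructions; each only gives the compiler
a reason to keep the calculation on the intended side of the volatile asm
statements.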



Re: Avr-libc-user-manual: "Problems with reordering code"

Marcin Godlewski
In reply to this post by Marcin Godlewski
David,

Thanks for pointing out the right place to submit the bug report. I have submitted one here:
https://savannah.nongnu.org/bugs/index.php?50270

Best regards,
Marcin Godlewski


Re: Avr-libc-user-manual: "Problems with reordering code"

Stu Bell-2
In reply to this post by David Brown-4
I hate to stick my nose in this after years away from the list, but... David is
right about the problems with atomic functions in C. The good news is that there
/is/ another solution: Use assembly language.  That does not solve the problem
within the domain of C, but it /does/ solve the problem you're having. And, as
I'm sure someone will point out, you can use the intermediate assembly generated
from C to create your custom "atomic" code. Does this mean the atomic.h
functions are misleading? No more or less than someone assuming they know what
an RTOS is; the problem is in the assumption.

This is an old argument about atomic C.  Been discussed to death. Assembly was
the only solution that seemed reasonable at the time, and I've seen nothing to
change my mind on it.  I've been hip-deep in ARM code for the last 4 years and
David's observations are just as true there, by definition. And I've heard the
same howls of anguish from those hit with code moved over an ISB boundary.

If you're dealing with Real Time Systems, you need to be completely familiar
with the tool set you're using, just as much as the hardware. This is a wart, a
product of the C definition. It is what it is. Accept it and move on.

Stu Bell

PS: Yeah, I still write volumes when a sentence (or silence) will do. :-)




Re: Avr-libc-user-manual: "Problems with reordering code"

David Brown-4
In reply to this post by David Brown-4
Hi Stu,

Feel free to stick your nose in here - there is always room for experienced opinions.

Yes, exactly the same thing applies to Arm as to the AVR. And exactly the same macros with empty inline assembly will solve the problem. In particular, you do not need to write any actual assembly here - you just need to work with the compiler to make sure it knows your requirements. It would be nice if there were more user friendly ways to express these things in code, but we work with what we've got. 
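
As a rough sketch of how the same empty-asm macros might bracket a
calculation between two barrier instructions, assuming GCC-style inline
assembly: the macro bodies repeat the assumptions made for the AVR case, the
function name divide_between_barriers is hypothetical, and isb() is a
stand-in with an empty template so that the example builds on any target (on
an Arm target the template string would presumably contain the isb
instruction).

```c
/* Stand-in barrier (assumption): empty template, "memory" clobber. */
#define isb() __asm__ __volatile__("" ::: "memory")

/* Assumed definitions, matching the technique discussed in the thread. */
#define forceDependency(val)         __asm__ __volatile__("" :: "g"(val))
#define forgetCompilerKnowledge(val) __asm__ __volatile__("" : "+g"(val))

unsigned int divide_between_barriers(unsigned int a, unsigned int b)
{
    unsigned int x;

    isb();
    forgetCompilerKnowledge(a);   /* operands are only "known" after the first isb() */
    forgetCompilerKnowledge(b);
    x = a / b;                    /* depends on the asm outputs above */
    forceDependency(x);           /* result must exist before the second isb() */
    isb();
    return x;
}
```

No assembly is written by hand here; the empty asm statements only describe
to the compiler which values are needed at which point.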


Best regards,

David Brown
Westcontrol
(From my mobile)

