push r1, pop r0

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

push r1, pop r0

Szikra Istvan-2
Hi all,

I have this interesting test case for you guys, and girls.


#include <avr/io.h>

unsigned int GetStackPointer()
{
    volatile unsigned int sp = (SPH << 8) | SPL;
    return sp;
}

int main(void)
{
    while(1)
    {
        PORTA.OUT = GetStackPointer();
    }
}

After building it with Atmel Studio 6.2 (default project settings, -mmcu=atxmega128a4u), AVR8/GNU C Compiler/Linker : 4.8.1 the .lss contains this gem:
 
unsigned int GetStackPointer()
{
 21a: cf 93        push r28
 21c: df 93        push r29
 21e: 1f 92        push r1
 220: 1f 92        push r1
 222: cd b7        in r28, 0x3d ; 61
 224: de b7        in r29, 0x3e ; 62
    volatile unsigned int sp = (SPH << 8) | SPL;
 226: 2e b7        in r18, 0x3e ; 62
 228: 8d b7        in r24, 0x3d ; 61
 22a: 90 e0        ldi r25, 0x00 ; 0
 22c: 92 2b        or r25, r18
 22e: 89 83        std Y+1, r24 ; 0x01
 230: 9a 83        std Y+2, r25 ; 0x02
    return sp;
 232: 89 81        ldd r24, Y+1 ; 0x01
 234: 9a 81        ldd r25, Y+2 ; 0x02
}
 236: 0f 90        pop r0
 238: 0f 90        pop r0
 23a: df 91        pop r29
 23c: cf 91        pop r28
 23e: 08 95        ret

(Atmel Studio 7 (Version: 7.0.1417), gcc version 5.4.0 (AVR_8_bit_GNU_Toolchain_3.6.0_1734) does the same.)
(If anyone interested I can attach the whole compressed project.)


I'm mostly interested in the push r1, pop r0 pairs.
Why? What are they doing? Who puts them there? Can someone "fix" the compiler/binutils?
This seems to be a bug for me.

AFAIK r1 is the zero reg, and r0 is temp reg, so this should not cause problems, so it's not a critical bug, only optimization issue. 
(Unnecessary __tmp_reg__ =__zero_reg__ )


I know that a better implementation would be
    unsigned int GetStackPointer()
    {
        unsigned int sp = (SPH << 8) | SPL;
        21a: 2e b7        in r18, 0x3e ; 62
        21c: 8d b7        in r24, 0x3d ; 61
        21e: 90 e0        ldi r25, 0x00 ; 0
        return sp;
    }
    220: 92 2b        or r25, r18
    222: 08 95        ret

(I'm not really sure why (SPH << 8) | SPL; isn't optimized correctly/fully, but it isn't the point here.)

And my optimal solution is
    unsigned int GetStackPointer()
    {
        union { unsigned int w; unsigned char b[2];} w;
        w.b[0] = SPL; w.b[1] = SPH;
        21a: 8d b7        in r24, 0x3d ; 61
        21c: 9e b7        in r25, 0x3e ; 62
        return w.w;
    }
    21e: 08 95        ret

But the weird volatile issue is not the point here. 
I see push r1, pop r0 in other functions, but this was the smallest test case that produced this anomaly.

What do you think?

Üdvözlettel,
Szikra Istvan

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

Matthijs Kooijman
Hi Szikra,

I believe these pushes and pops are just meant to allocate 2 bytes on
the stack.

>  222: cd b7        in r28, 0x3d ; 61
>  224: de b7        in r29, 0x3e ; 62
This loads Y with SP, so the location of the 2 pushed bytes

>  226: 2e b7        in r18, 0x3e ; 62
>  228: 8d b7        in r24, 0x3d ; 61
>  22a: 90 e0        ldi r25, 0x00 ; 0
>  22c: 92 2b        or r25, r18
This is the load SP from your code variable

>  22e: 89 83        std Y+1, r24 ; 0x01
>  230: 9a 83        std Y+2, r25 ; 0x02
This stores SP to your sp variable on the stack

>  232: 89 81        ldd r24, Y+1 ; 0x01
>  234: 9a 81        ldd r25, Y+2 ; 0x02
This loads your sp variable into the return value

I wonder why this even forces the value onto the stack, since that's
really not needed at all. Perhaps the "volatile" on your "sp" variable
causes this, or perhaps you're compiling without optimization?

Gr.

Matthijs

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

Szikra Istvan-2
Hi Matthijs,

I believe you are right, that makes sense.

(I should have realized that, but it was a quite a while ago I read/wrote AVR asm... well you learn something new every day;)

Thank you.

It was compiled with  -O1.
Compiling with -Os does not seem to make a difference.

Best regards,
Szikra Istvan

On Tue, Nov 7, 2017 at 9:48 PM, Matthijs Kooijman <[hidden email]> wrote:
Hi Szikra,

I believe these pushes and pops are just meant to allocate 2 bytes on
the stack.

>  222: cd b7        in r28, 0x3d ; 61
>  224: de b7        in r29, 0x3e ; 62
This loads Y with SP, so the location of the 2 pushed bytes

>  226: 2e b7        in r18, 0x3e ; 62
>  228: 8d b7        in r24, 0x3d ; 61
>  22a: 90 e0        ldi r25, 0x00 ; 0
>  22c: 92 2b        or r25, r18
This is the load SP from your code variable

>  22e: 89 83        std Y+1, r24 ; 0x01
>  230: 9a 83        std Y+2, r25 ; 0x02
This stores SP to your sp variable on the stack

>  232: 89 81        ldd r24, Y+1 ; 0x01
>  234: 9a 81        ldd r25, Y+2 ; 0x02
This loads your sp variable into the return value

I wonder why this even forces the value onto the stack, since that's
really not needed at all. Perhaps the "volatile" on your "sp" variable
causes this, or perhaps you're compiling without optimization?

Gr.

Matthijs


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

David Brown-4
In reply to this post by Szikra Istvan-2
On 07/11/17 20:36, Szikra Istvan wrote:

> Hi all,
>
> I have this interesting test case for you guys, and girls.
>
>
> #include <avr/io.h>
>
> unsigned int GetStackPointer()
> {
>     volatile unsigned int sp = (SPH << 8) | SPL;
>     return sp;
> }
>
> int main(void)
> {
>     while(1)
>     {
>         PORTA.OUT = GetStackPointer();
>     }
> }
>

You have got an explanation for the pushes and pops.  AVR gcc used to
use "rcall ." to make a little stack space, which also surprised people.

It makes no sense to use "volatile" here - you are not using the
variable "sp" in a volatile manner.  That just means wasted time and
space, and means the value returned by your function is not the stack
pointer at the point of the call.  (Also, if you are writing C rather
than C++, your function declaration is not correct.)  The sensible way
to write this is:

static inline uint16_t GetStackPointer(void)
{
        return (SPH << 8) | SPL;
}

There is no need to use unions here - the compiler should be able to
handle this well.  You don't need a temporary variable here - certainly
not a "volatile" one.  And a static inline function means that the stack
pointer you read is the correct value.  It is good practice to use the
<stdint.h> fixed size types when you want a fixed size - "uint16_t" is
more precise and informative than "unsigned int" in cases like this.


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

Sergey A. Borshch
On 08.11.2017 11:03, David Brown wrote:
> (Also, if you are writing C rather
> than C++, your function declaration is not correct.)  The sensible way
> to write this is:
>
> static inline uint16_t GetStackPointer(void)
> {
> return (SPH << 8) | SPL;
> }
It's not correct declaration too. Function name clearly states that it is
returning pointer, so it should return... pointer:
static inline void * GetStackPointer(void)
{
      return (void *)((SPH << 8) | SPL);
}

--
Regards,
   Sergey A. Borshch            mailto: [hidden email]
     SB ELDI ltd. Riga, Latvia

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

David Brown-4
On 08/11/17 10:21, Sergey A. Borshch wrote:

> On 08.11.2017 11:03, David Brown wrote:
>> (Also, if you are writing C rather
>> than C++, your function declaration is not correct.)  The sensible way
>> to write this is:
>>
>> static inline uint16_t GetStackPointer(void)
>> {
>>     return (SPH << 8) | SPL;
>> }
> It's not correct declaration too. Function name clearly states that it
> is returning pointer, so it should return... pointer:
> static inline void * GetStackPointer(void)
> {
>      return (void *)((SPH << 8) | SPL);
> }
>

I viewed it more as meaning "get the contents of the stack pointer
register", rather than "get the pointer to the current top-of-stack".
That is up to the OP, to choose what he wants here.

My point was merely that in C (but not C++), a function taking no
parameters should be declared with a "void" parameter list.  Omitting
that is allowed (and the code generated is identical), but it is an
obsolete feature in C.  If the OP was writing C++, then it is fine.


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

Szikra Istvan-2
Hi all,

I'm writing C++ code, but I made the example a C project. Yeah (void)

I do use stdint.h, stddef.h, stdbool.h I just try to avoid the include hell for examples. ;)
Note: avr/io.h already includes inttypes.h which includes stdint.h, so I probably should have used uint16_t.

I use GetStackPointer to get the value of the stack pointer, to calculate used and free stack space.

inlining the function is better... except when you have to debug your code ;) (and you are unable to place a breakpoint anywhere in AS because everything is inlined >:( )

> "There is no need to use unions here "
I agree that "the compiler should be able to handle this well."
But it doesn't !

Code:

#include <avr/io.h>
#include <avr/pgmspace.h>
#include <stdio.h>

static inline uint16_t GetStackPointer(void) {
    return (SPH << 8) | SPL;
}

int main(void) {
    while(1) {
        printf_P(PSTR("SP=%u\r\n"), GetStackPointer());
    }
}


Result:

int main(void) {
    while(1) {
        printf_P(PSTR("SP=%u\r\n"), GetStackPointer());
 238: cc ef        ldi r28, 0xFC ; 252
 23a: d1 e0        ldi r29, 0x01 ; 1
#include <avr/io.h>
#include <avr/pgmspace.h>
#include <stdio.h>

static inline uint16_t GetStackPointer(void) {
    return (SPH << 8) | SPL;
 23c: 2e b7        in r18, 0x3e ; 62
 23e: 8d b7        in r24, 0x3d ; 61
}

int main(void) {
    while(1) {
        printf_P(PSTR("SP=%u\r\n"), GetStackPointer());
 240: 90 e0        ldi r25, 0x00 ; 0
 242: 92 2b        or r25, r18
 244: 9f 93        push r25
 246: 8f 93        push r24
 248: df 93        push r29
 24a: cf 93        push r28
 24c: 0e 94 2d 01 call 0x25a ; 0x25a <printf_P>
 250: 0f 90        pop r0
 252: 0f 90        pop r0
 254: 0f 90        pop r0
 256: 0f 90        pop r0
 258: f1 cf        rjmp .-30      ; 0x23c <main+0x4>

(This was compiled with -Os
Invoking: AVR/GNU C Compiler : 5.4.0
"C:\Program Files (x86)\Atmel\Studio\7.0\toolchain\avr8\avr8-gnu-toolchain\bin\avr-gcc.exe"  -x c -funsigned-char -funsigned-bitfields -DDEBUG  -I"C:\Program Files (x86)\Atmel\Studio\7.0\Packs\atmel\XMEGAA_DFP\1.1.68\include"  -Os -ffunction-sections -fdata-sections -fpack-struct -fshort-enums -g2 -Wall -mmcu=atxmega128a4u -B "C:\Program Files (x86)\Atmel\Studio\7.0\Packs\atmel\XMEGAA_DFP\1.1.68\gcc\dev\atxmega128a4u" -c -std=gnu99 -MD -MP -MF "main.d" -MT"main.d" -MT"main.o"   -o "main.o" ".././main.c" 
)

So it is 2017, but the compiler still cannot optimize away bitwise or zero... :(

Here's my version:

int main(void) {
    while(1) {
        printf_P(PSTR("SP=%u\r\n"), GetStackPointer());
 238: cc ef        ldi r28, 0xFC ; 252
 23a: d1 e0        ldi r29, 0x01 ; 1
//}

static inline uint16_t GetStackPointer(void)
{
    union { unsigned int w; unsigned char b[2];} w; /// in 2017 you still need this to produce optimal asm code :(
    w.b[0] = SPL; w.b[1] = SPH;
 23c: 8d b7        in r24, 0x3d ; 61
 23e: 9e b7        in r25, 0x3e ; 62
return w.w;
}

int main(void) {
    while(1) {
        printf_P(PSTR("SP=%u\r\n"), GetStackPointer());
 240: 9f 93        push r25
 242: 8f 93        push r24
 244: df 93        push r29
 246: cf 93        push r28
 248: 0e 94 2b 01 call 0x256 ; 0x256 <printf_P>
 24c: 0f 90        pop r0
 24e: 0f 90        pop r0
 250: 0f 90        pop r0
 252: 0f 90        pop r0
 254: f3 cf        rjmp .-26      ; 0x23c <main+0x4>



Thanks for all your help.

Best regards,
Szikra Istvan

On Wed, Nov 8, 2017 at 11:17 AM, David Brown <[hidden email]> wrote:
On 08/11/17 10:21, Sergey A. Borshch wrote:
> On 08.11.2017 11:03, David Brown wrote:
>> (Also, if you are writing C rather
>> than C++, your function declaration is not correct.)  The sensible way
>> to write this is:
>>
>> static inline uint16_t GetStackPointer(void)
>> {
>>     return (SPH << 8) | SPL;
>> }
> It's not correct declaration too. Function name clearly states that it
> is returning pointer, so it should return... pointer:
> static inline void * GetStackPointer(void)
> {
>      return (void *)((SPH << 8) | SPL);
> }
>

I viewed it more as meaning "get the contents of the stack pointer
register", rather than "get the pointer to the current top-of-stack".
That is up to the OP, to choose what he wants here.

My point was merely that in C (but not C++), a function taking no
parameters should be declared with a "void" parameter list.  Omitting
that is allowed (and the code generated is identical), but it is an
obsolete feature in C.  If the OP was writing C++, then it is fine.


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

Matthijs Kooijman
Hi Szikra,

> So it is 2017, but the compiler still cannot optimize away bitwise or
> zero... :(
I think this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41076
(which should be fixed in gcc 7 according to the comments). The fix
contains a peephole optimization that I think catches exactly this case:

         6831 (define_peephole2
         6832   [(set (match_operand:QI 0 "register_operand")
         6833         (const_int 0))
         6834    (set (match_dup 0)
         6835         (ior:QI (match_dup 0)
         6836                 (match_operand:QI 1 "register_operand")))]
         6837   ""
         6838   [(set (match_dup 0)
         6839         (match_dup 1))])

https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/avr/avr.md?r1=242907&r2=242906&pathrev=242907

Gr.

Matthijs

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

Georg-Johann Lay-2
In reply to this post by Szikra Istvan-2
Szikra Istvan schrieb:
> Hi all,
>
> I have this interesting test case for you guys, and girls.
>
> #include <avr/io.h>
>
> unsigned int GetStackPointer()
> {
>     volatile unsigned int sp = (SPH << 8) | SPL;

There is "SP" for you.  Ne need to hamper with bytes.

uint16_t GetStackPointer (void)
{
     return SP;
}

>     return sp;
> }
>
> int main(void)
> {
>     while(1)
>     {
>         PORTA.OUT = GetStackPointer();
>     }
> }
>
> After building it with Atmel Studio 6.2 (default project settings,
> -mmcu=atxmega128a4u), AVR8/GNU C Compiler/Linker : 4.8.1 the .lss contains
> this gem:
>
> unsigned int GetStackPointer()
> {
>  21a: cf 93        push r28
>  21c: df 93        push r29
>  21e: 1f 92        push r1
>  220: 1f 92        push r1
>  222: cd b7        in r28, 0x3d ; 61
>  224: de b7        in r29, 0x3e ; 62
>     volatile unsigned int sp = (SPH << 8) | SPL;
>  226: 2e b7        in r18, 0x3e ; 62
>  228: 8d b7        in r24, 0x3d ; 61
>  22a: 90 e0        ldi r25, 0x00 ; 0
>  22c: 92 2b        or r25, r18
>  22e: 89 83        std Y+1, r24 ; 0x01
>  230: 9a 83        std Y+2, r25 ; 0x02
>     return sp;
>  232: 89 81        ldd r24, Y+1 ; 0x01
>  234: 9a 81        ldd r25, Y+2 ; 0x02
> }
>  236: 0f 90        pop r0
>  238: 0f 90        pop r0
>  23a: df 91        pop r29
>  23c: cf 91        pop r28
>  23e: 08 95        ret
>
> (Atmel Studio 7 (Version: 7.0.1417), gcc version 5.4.0
> (AVR_8_bit_GNU_Toolchain_3.6.0_1734) does the same.)
> (If anyone interested I can attach the whole compressed project.)

A working test case is always preferred over binary clump :-)

> I'm mostly interested in the push r1, pop r0 pairs.
> Why? What are they doing? Who puts them there? Can someone "fix" the
> compiler/binutils?

You used volatile which forces "sp" into the frame of the function.
As avr-gcc has to set up a frame pointer, this needs to push 2 more
bytes.  Y is frame pointer and callee-saved here, this explains total
of 4 bytes of stack usage (not counting return address).

> This seems to be a bug for me.

Not to me.

> Can someone "fix" the compiler/binutils?

The only bug is in your code:

o Forcing sp onto stack by volatile.

o Using -O1 does not set -fomit-frame-pointer, hence even
   without the volatile you get a frame pointer.  To get
   rid of it, use code that doesn't trigger a frame and
   compile with optimization higher than -O1 and / or with
   -fomit-frame-pointer.

o Trying to get free stack by reading SP is a broken design.
   For example, an ISR might consume way more space, which
   will not be detected by your code.  Consider code like
   in "mem-check.c" from here:
 
http://rn-wissen.de/wiki/index.php?title=Speicherverbrauch_bestimmen_mit_avr-gcc#Dynamischer_RAM-Verbrauch

> AFAIK r1 is the zero reg, and r0 is temp reg, so this should not cause
> problems, so it's not a critical bug, only optimization issue.
> (Unnecessary __tmp_reg__ =__zero_reg__ )

As already mentioned, the problem is in your code.

> And my optimal solution is
>     unsigned int GetStackPointer()
>     {
>         union { unsigned int w; unsigned char b[2];} w;
>         w.b[0] = SPL; w.b[1] = SPH;
>         21a: 8d b7        in r24, 0x3d ; 61
>         21c: 9e b7        in r25, 0x3e ; 62
>         return w.w;
>     }
>     21e: 08 95        ret

No need for type punning unions, just use "SP" as explained above.

Johann


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

Szikra Istvan-2
Hi

Thanks for the SP, I missed that. 
And apparently Atmel Studio also cannot find it, and underlines it with red error marker.
It does compile, and I have found it in avr/common.h, it's probably a problem with __AVR_ARCH__ handling by AS...
I guess that's what I get for trusting the IDE:)

I know that just reading SP is not enough, I do also use stack watermarking. It's just an additional diagnostic information.
Note: SP can also be used to re-mark the unused stack to trace stack usage over time...
(or mark stack on platforms without .init* sections.)

Note #999: 
"the problem is in your code."
This isn't actually my code. My code was written for ARM and looked something like this:
unsigned int GetStackPointer() {
    volatile register unsigned int sp asm("r13");
    return sp;
}
that someone ported to AVR, and now I'm fixing it... ;)



Üdvözlettel,
Szikra Istvan

On Wed, Nov 8, 2017 at 7:10 PM, Georg-Johann Lay <[hidden email]> wrote:
Szikra Istvan schrieb:
Hi all,

I have this interesting test case for you guys, and girls.

#include <avr/io.h>

unsigned int GetStackPointer()
{
    volatile unsigned int sp = (SPH << 8) | SPL;

There is "SP" for you.  Ne need to hamper with bytes.

uint16_t GetStackPointer (void)
{
    return SP;

}

    return sp;
}

int main(void)
{
    while(1)
    {
        PORTA.OUT = GetStackPointer();
    }
}

After building it with Atmel Studio 6.2 (default project settings,
-mmcu=atxmega128a4u), AVR8/GNU C Compiler/Linker : 4.8.1 the .lss contains
this gem:

unsigned int GetStackPointer()
{
 21a: cf 93        push r28
 21c: df 93        push r29
 21e: 1f 92        push r1
 220: 1f 92        push r1
 222: cd b7        in r28, 0x3d ; 61
 224: de b7        in r29, 0x3e ; 62
    volatile unsigned int sp = (SPH << 8) | SPL;
 226: 2e b7        in r18, 0x3e ; 62
 228: 8d b7        in r24, 0x3d ; 61
 22a: 90 e0        ldi r25, 0x00 ; 0
 22c: 92 2b        or r25, r18
 22e: 89 83        std Y+1, r24 ; 0x01
 230: 9a 83        std Y+2, r25 ; 0x02
    return sp;
 232: 89 81        ldd r24, Y+1 ; 0x01
 234: 9a 81        ldd r25, Y+2 ; 0x02
}
 236: 0f 90        pop r0
 238: 0f 90        pop r0
 23a: df 91        pop r29
 23c: cf 91        pop r28
 23e: 08 95        ret

(Atmel Studio 7 (Version: 7.0.1417), gcc version 5.4.0
(AVR_8_bit_GNU_Toolchain_3.6.0_1734) does the same.)
(If anyone interested I can attach the whole compressed project.)

A working test case is always preferred over binary clump :-)

I'm mostly interested in the push r1, pop r0 pairs.
Why? What are they doing? Who puts them there? Can someone "fix" the
compiler/binutils?

You used volatile which forces "sp" into the frame of the function.
As avr-gcc has to set up a frame pointer, this needs to push 2 more
bytes.  Y is frame pointer and callee-saved here, this explains total
of 4 bytes of stack usage (not counting return address).

This seems to be a bug for me.

Not to me.

Can someone "fix" the compiler/binutils?

The only bug is in your code:

o Forcing sp onto stack by volatile.

o Using -O1 does not set -fomit-frame-pointer, hence even
  without the volatile you get a frame pointer.  To get
  rid of it, use code that doesn't trigger a frame and
  compile with optimization higher than -O1 and / or with
  -fomit-frame-pointer.

o Trying to get free stack by reading SP is a broken design.
  For example, an ISR might consume way more space, which
  will not be detected by your code.  Consider code like
  in "mem-check.c" from here:

http://rn-wissen.de/wiki/index.php?title=Speicherverbrauch_bestimmen_mit_avr-gcc#Dynamischer_RAM-Verbrauch

AFAIK r1 is the zero reg, and r0 is temp reg, so this should not cause
problems, so it's not a critical bug, only optimization issue.
(Unnecessary __tmp_reg__ =__zero_reg__ )

As already mentioned, the problem is in your code.

And my optimal solution is
    unsigned int GetStackPointer()
    {
        union { unsigned int w; unsigned char b[2];} w;
        w.b[0] = SPL; w.b[1] = SPH;
        21a: 8d b7        in r24, 0x3d ; 61
        21c: 9e b7        in r25, 0x3e ; 62
        return w.w;
    }
    21e: 08 95        ret

No need for type punning unions, just use "SP" as explained above.

Johann



_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

David Brown-4
On 09/11/17 04:57, Szikra Istvan wrote:

> Hi
>
> Thanks for the SP, I missed that.
> And apparently Atmel Studio also cannot find it, and underlines it with
> red error marker.
> It does compile, and I have found it in avr/common.h, it's probably a
> problem with __AVR_ARCH__ handling by AS...
> I guess that's what I get for trusting the IDE:)
>
> I know that just reading SP is not enough, I do also use stack
> watermarking. It's just an additional diagnostic information.
> Note: SP can also be used to re-mark the unused stack to trace stack
> usage over time...
> (or mark stack on platforms without .init* sections.)
>
> Note #999:
> "the problem is in your code."
> This isn't actually my code. My code was written for ARM and looked
> something like this:
> unsigned int GetStackPointer() {
>     volatile register unsigned int sp asm("r13");
>     return sp;
> }
> that someone ported to AVR, and now I'm fixing it... ;)
>

That code is wrong for the ARM too.  You should not fix a variable to an
assembly register that is used specifically by the compiler.

Depending on the ARM in question, you should probably use a CMSIS
function like _get_PSP().  The standard definition for gcc and Cortex-M is:

__attribute__( ( always_inline ) ) __STATIC_INLINE uint32_t __get_PSP(void)
{
  register uint32_t result;

  __ASM volatile ("MRS %0, psp\n"  : "=r" (result) );
  return(result);
}


Just to read the r13 register, the code would be something like this (I
haven't checked the details here) :

static inline uint32_t GetStackPointer(void)
{
        uint32_t sp;
        asm volatile("mov %0, r13" : "=r" (sp));
        return sp;
}


And you can probably just use __builtin_frame_address() - that should,
in theory, work on the AVR and the ARM (I have not tested it on either
target).




_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

Szikra Istvan-2
Hi David, 

What is exactly wrong with my code?

Thanks by the way for __get_PSP and all your help.

This is what it looks like on my end:
__STATIC_INLINE uint32_t __get_PSP(void)
{
  register uint32_t __regProcessStackPointer  __ASM("psp");
  return(__regProcessStackPointer);
}
in
C:\Program Files (x86)\Atmel\Atmel Toolchain\ARM GCC\Native\4.8.1443\CMSIS_Atmel\CMSIS\Include\core_cmFunc.h

Which for me looks a lot like my code except for volatile.

Actually I already had inline asm version, but I prefer to use C if I can:
unsigned int GetStackPointer()
{
//asm volatile ("mov %0, r13" : "=l" (sp));
volatile register unsigned int sp asm("r13");
return sp;
}


So why volatile?
Because it generates smaller code:

    while (1) 
    {
        unsigned int sp=GetStackPointer();
        sum+=sp;
  400270: 9b01      ldr r3, [sp, #4]
  400272: 446b      add r3, sp
  400274: 9301      str r3, [sp, #4]
  400276: e7fb      b.n 400270 <main+0xc>

Without it:

    //volatile 
    register unsigned int sp asm("r13");
    return sp;
  400270: 466a      mov r2, sp
    volatile 
    uint32_t sum=0;
    while (1) 
    {
        unsigned int sp=GetStackPointer();
        sum+=sp;
  400272: 9b01      ldr r3, [sp, #4]
  400274: 4413      add r3, r2
  400276: 9301      str r3, [sp, #4]
  400278: e7fb      b.n 400272 <main+0xe>
the code gets bigger.


__builtin_frame_address(0) returns r7 not sp. Which should be te same, but forces the use of r7:

  400264: b580      push {r7, lr}
  400266: b082      sub sp, #8
  400268: af00      add r7, sp, #0

My version doesn't have this:
  400264: b500      push {lr}
  400266: b083      sub sp, #12


Here's te complete test case with code size statistics, in case someone want's to try it with different compiler versions...


#include "sam.h"

static inline unsigned int GetStackPointer() {
    volatile register unsigned int sp asm("r13");
    return sp;
}
static inline unsigned int GetStackPointerAsm() {
    register uint32_t sp;
    asm volatile ("mov %0, r13" : "=l" (sp));
    return sp;
}
static inline unsigned int GetStackPointerNV() {
    register unsigned int sp asm("r13");
    return sp;
}

__attribute__( ( always_inline ) ) __STATIC_INLINE uint32_t GetStackPointerAsm2(void)
{
    register uint32_t result;
    __ASM volatile ("MRS %0, psp\n"  : "=r" (result) );
    return(result);
}
static inline uint32_t GetStackPointerAsm3(void)
{
    uint32_t sp;
    asm volatile("mov %0, r13" : "=r" (sp));
    return sp;
}

int main(void)
{
    SystemInit();

    volatile uint32_t sum=0;
    while (1) {
        ///                                  Program Memory Usage  
        sum+=GetStackPointer();          /// 2080 bytes
        //sum+=GetStackPointerAsm();       /// 2084 bytes
        //sum+=GetStackPointerAsm2();      /// 2084 bytes
        //sum+=GetStackPointerAsm3();      /// 2084 bytes
        //sum+=GetStackPointerNV();        /// 2084 bytes 
        //sum+=__get_PSP();                /// 2084 bytes
        //sum+=__builtin_frame_address(0); /// 2084 bytes
    }
}

I agree that the use of volatile is a hack, but that's the best solution I had found back when I wrote it.


We might consider moving the discussion off the list, since it no longer AVR related.


Best regards,
Szikra Istvan

On Thu, Nov 9, 2017 at 10:26 AM, David Brown <[hidden email]> wrote:
On 09/11/17 04:57, Szikra Istvan wrote:
> Hi
>
> Thanks for the SP, I missed that.
> And apparently Atmel Studio also cannot find it, and underlines it with
> red error marker.
> It does compile, and I have found it in avr/common.h, it's probably a
> problem with __AVR_ARCH__ handling by AS...
> I guess that's what I get for trusting the IDE:)
>
> I know that just reading SP is not enough, I do also use stack
> watermarking. It's just an additional diagnostic information.
> Note: SP can also be used to re-mark the unused stack to trace stack
> usage over time...
> (or mark stack on platforms without .init* sections.)
>
> Note #999:
> "the problem is in your code."
> This isn't actually my code. My code was written for ARM and looked
> something like this:
> unsigned int GetStackPointer() {
>     volatile register unsigned int sp asm("r13");
>     return sp;
> }
> that someone ported to AVR, and now I'm fixing it... ;)
>

That code is wrong for the ARM too.  You should not fix a variable to an
assembly register that is used specifically by the compiler.

Depending on the ARM in question, you should probably use a CMSIS
function like _get_PSP().  The standard definition for gcc and Cortex-M is:

__attribute__( ( always_inline ) ) __STATIC_INLINE uint32_t __get_PSP(void)
{
  register uint32_t result;

  __ASM volatile ("MRS %0, psp\n"  : "=r" (result) );
  return(result);
}


Just to read the r13 register, the code would be something like this (I
haven't checked the details here) :

static inline uint32_t GetStackPointer(void)
{
        uint32_t sp;
        asm volatile("mov %0, r13" : "=r" (sp));
        return sp;
}


And you can probably just use __builtin_frame_address() - that should,
in theory, work on the AVR and the ARM (I have not tested it on either
target).





_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

Georg-Johann Lay-2
On 09.11.2017 17:49, Szikra Istvan wrote:

> Hi David,
>
> What is exactly wrong with my code?
>
> Thanks by the way for __get_PSP and all your help.
>
> This is what it looks like on my end:
> __STATIC_INLINE uint32_t __get_PSP(void)
> {
>    register uint32_t __regProcessStackPointer  __ASM("psp");

Assuming that __ASM obfuscation is just __asm, ITWTJBA:
if that works than just by accident.  Local register variables
are just guaranteed to live in the indicated register only when
used as operand to inline asm, and only during respective piece of
inline asm.

So you want to read the documentation again, in particular the
"The only supported use for this feature.." part of

https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html

>    return(__regProcessStackPointer);
> }
> in
> C:\Program Files (x86)\Atmel\Atmel Toolchain\ARM
> GCC\Native\4.8.1443\CMSIS_Atmel\CMSIS\Include\core_cmFunc.h
>
> Which for me looks a lot like my code except for volatile.
> Actually I already had inline asm version, but I prefer to use C if I can:
> unsigned int GetStackPointer()
> {
> //asm volatile ("mov %0, r13" : "=l" (sp));
> volatile register unsigned int sp asm("r13");

Notice that GCC's (global or local) register variables cannot be
volatile.  volatile makes only sense for stuff in memory.

> return sp;
> }
>
>
> So why volatile?

volatile is void here (with respect to register).  If it has an
effect, then it is forcing the var into the frame.

> Because it generates smaller code:
>
>      while (1)
>      {
>          unsigned int sp=GetStackPointer();
>          sum+=sp;
>    400270: 9b01      ldr r3, [sp, #4]
>    400272: 446b      add r3, sp
>    400274: 9301      str r3, [sp, #4]
>    400276: e7fb      b.n 400270 <main+0xc>
>
> Without it:
>
>      //volatile
>      register unsigned int sp asm("r13");
>      return sp;
>    400270: 466a      mov r2, sp
>      volatile
>      uint32_t sum=0;
>      while (1)
>      {
>          unsigned int sp=GetStackPointer();
>          sum+=sp;
>    400272: 9b01      ldr r3, [sp, #4]
>    400274: 4413      add r3, r2
>    400276: 9301      str r3, [sp, #4]
>    400278: e7fb      b.n 400272 <main+0xe>
> the code gets bigger.
>
>
> __builtin_frame_address(0) returns r7 not sp. Which should be te same, but
> forces the use of r7:
>
>    400264: b580      push {r7, lr}
>    400266: b082      sub sp, #8
>    400268: af00      add r7, sp, #0
>
> My version doesn't have this:
>    400264: b500      push {lr}
>    400266: b083      sub sp, #12
>
>
> Here's te complete test case with code size statistics, in case someone
> want's to try it with different compiler versions...
>
>
> #include "sam.h"
>
> static inline unsigned int GetStackPointer() {
>      volatile register unsigned int sp asm("r13");
>      return sp;
> }
> static inline unsigned int GetStackPointerAsm() {
>      register uint32_t sp;
>      asm volatile ("mov %0, r13" : "=l" (sp));
>      return sp;
> }
> static inline unsigned int GetStackPointerNV() {
>      register unsigned int sp asm("r13");
>      return sp;
> }
>
> __attribute__( ( always_inline ) ) __STATIC_INLINE uint32_t
> GetStackPointerAsm2(void)
> {
>      register uint32_t result;
>      __ASM volatile ("MRS %0, psp\n"  : "=r" (result) );
>      return(result);
> }
> static inline uint32_t GetStackPointerAsm3(void)
> {
>      uint32_t sp;
>      asm volatile("mov %0, r13" : "=r" (sp));
>      return sp;
> }
>
> int main(void)
> {
>      SystemInit();
>
>      volatile uint32_t sum=0;
>      while (1) {
>          ///                                  Program Memory Usage
>          sum+=GetStackPointer();          /// 2080 bytes
>          //sum+=GetStackPointerAsm();       /// 2084 bytes
>          //sum+=GetStackPointerAsm2();      /// 2084 bytes
>          //sum+=GetStackPointerAsm3();      /// 2084 bytes
>          //sum+=GetStackPointerNV();        /// 2084 bytes
>          //sum+=__get_PSP();                /// 2084 bytes
>          //sum+=__builtin_frame_address(0); /// 2084 bytes
>      }
> }
>
> I agree that the use of volatile is a hack, but that's the best solution I
> had found back when I wrote it.
>
>
> We might consider moving the discussion off the list, since it no longer
> AVR related.
>
>
> Best regards,
> Szikra Istvan

[snippet tons of TOFU quotes]

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: push r1, pop r0

David Brown-4
In reply to this post by Szikra Istvan-2
On 09/11/17 17:49, Szikra Istvan wrote:
> Hi David,
>
> What is exactly wrong with my code?
>
> Thanks by the way for __get_PSP and all your help.

This is getting into ARM stuff, and is therefore off-topic for the
avr-gcc list (as you noted below).  But the principles are fairly
cross-target.

The "register __ASM" version of __get_PSP is for ARM's compiler (from
Keil), not for gcc.  With gcc you should not use register asm constructs
for registers that are in use otherwise by the compiler - there are only
a few registers that are safe to use for asm variables.  The code might
well work correctly - but it is not a reliable way to handle it.

In particular, trying to use sp directly as a register variable may go
badly wrong if the code needs to change or use sp in some way.  The
single instruction you saved with this construct is totally negligible,
and would probably disappear entirely in any real use of reading SP.
Dangerous hacks and micro-optimisations like this are not worth it.

mvh.,

David


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list