RFC: Speeding up small ISRs: PR20296

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

RFC: Speeding up small ISRs: PR20296

Georg-Johann Lay-2
https://gcc.gnu.org/PR20296

is about speeding up "small" ISRs, and is open for 12 years now...

Anyone familiar with avr-gcc knows that a fix would be high effort and
risk, and that's the major reason for why PR20296 is still open (and
even classified "suspended").

In some forum discussion (again!) on that issue, there was the following
proposal to approach that PR:

1) Let GCC emit directives / pseudo-instructions in non-naked ISR
prologue / epilogue

2) Let GAS scan the code and replace the directives with code as needed.

Currently,

#include <avr/io.h>
#include <avr/interrupt.h>

ISR (INT0_vect)
{
     __asm ("; Code");
}

emit something like:


__vector_1:
        push r1
        push r0
        in r0,__SREG__
        push r0
        clr __zero_reg__
.L__stack_usage = 3

        ; Code

        pop r0
        out __SREG__,r0
        pop r0
        pop r1
        reti


which would change to:


__vector_1:
        .maybe_isr_prologue 123
        ;; Rest of prologue

        ; Code

        ;; Rest of epilogue
        .maybe_isr_epilogue 123
        reti

GAS would then scan the code associated to the function and replace the
.maybe by appropriate sequence to safe / init / restore tmp-reg,
zero-reg and SREG.  Other registers like R24 are handled by GCC as
usual.  For example, if the scan reveals that tmp-reg is not needed but
zero-reg is (which will imply SREG due to the CLR) the replacement code
would be:


__vector_1:
        push r1
        in r1,__SREG__
        push r1
        clr __zero_reg__

        ; Code

        pop r1
        out __SREG__,r1
        pop r1
        reti


Maybe someone is interested in implementing the GAS part, and if that is
the case and the proposal is feasible, I would take care of the GCC part.

Caveats:

a) .L__stack_usage can no more be computed by GCC

b) It's hard to find the end of the relevant code.  We might have
interleaved sections (like with dispatch tables), there might be code
that is emit after the epilogue, there might be more than 1 epilogue,
dunno if GAS can infer whether JMP is local or non-local.

We could add a new GCC pass that filters out situations that are
pointless to scan like code with dispatch tables or function calls, and
fall back to classical prologue / epilogue in such cases.

The .maybe gets function-unique identifiers (123 in the example) so that
GAS knows which epilogue belongs to which .prologue provided that's helpful.

I am not familiar with Binutils / GAS though and don't know if it's easy
to add the 2 new passes: One to scan and one to replace the .maybe with
appropriate code.  IIUC GAS only works on sections, and the scan would
be on BFD internal representation (like relaxing) after the parser read
in the asm sources?

The GCC change would add a new option, configure test whether GAS
supports this, let ISR prologue and epilogue emit new unspec_volatile
pseudo insns and add a scan pass to detect situations that are pointless
should fall back to old code, like when dispatch tables, calls or
non-local goto is seen.


Johann

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Speeding up small ISRs: PR20296

Erik Christiansen-2
Reply-To: [hidden email]

On 15.06.17 14:43, Georg-Johann Lay wrote:

> https://gcc.gnu.org/PR20296
>
> is about speeding up "small" ISRs, and is open for 12 years now...
>
> Anyone familiar with avr-gcc knows that a fix would be high effort and risk,
> and that's the major reason for why PR20296 is still open (and even
> classified "suspended").
>
> In some forum discussion (again!) on that issue, there was the following
> proposal to approach that PR:

Reading the PR causes me to infer that moving that code generation out
of gcc into gas is the proposed fix for unmanageable optimisation
complexity in gcc in the use case.

> 1) Let GCC emit directives / pseudo-instructions in non-naked ISR prologue /
> epilogue

If only existing directives and macro invocations are emitted, then the
need to modify gas code is obviated. I.e. if:

  .maybe_isr_prologue 123       ; were instead:
   maybe_isr_prologue 123

then a gas macro could generate the required code without unnecessary
complexity. If the desired code to be generated from the parameters
supplied can be described, then I'll write the macro(s). After some
iterations, we should have some good results for some useful use cases.

> 2) Let GAS scan the code and replace the directives with code as needed.

That is what gas does with directives and macros, but avoiding
modification of gas to add new directives is a very worthwhile design
goal, not least to avoid being told on binutils to do simple things with
the directives already provided. (And code generation is simple, if a
macro invocation with parameter(s) is supplied.)

> Currently,
>
> #include <avr/io.h>
> #include <avr/interrupt.h>
>
> ISR (INT0_vect)
> {
>     __asm ("; Code");
> }
>
> emit something like:
>
>
> __vector_1:
> push r1
> push r0
> in r0,__SREG__
> push r0
> clr __zero_reg__
> .L__stack_usage = 3
>
> ; Code
>
> pop r0
> out __SREG__,r0
> pop r0
> pop r1
> reti
>
>
> which would change to:
>
>
> __vector_1:
> .maybe_isr_prologue 123
> ;; Rest of prologue
>
> ; Code
>
> ;; Rest of epilogue
> .maybe_isr_epilogue 123
> reti
>
> GAS would then scan the code associated to the function and replace the
> .maybe by appropriate sequence to safe / init / restore tmp-reg, zero-reg
> and SREG.

That sets things up handily for finishing by simple macro. Let us say
that gcc emits "_maybe_isr_prologue 1 2 3", then 1 could be the switch
for save, 2 for init, and 3 for restore, if desired. Gas macros readily
handle omission of the last parameter (with it then taking an internally
defined default value), which can be useful if gas knows the default,
and gcc doesn't. Lumping it all into a single parameter would lead to 8
parameter values, just to cover 3 binary switches, IIUC the use case.

> Other registers like R24 are handled by GCC as usual.  For example, if
> the scan reveals that tmp-reg is not needed but zero-reg is (which
> will imply SREG due to the CLR) the replacement code would be:

>
> __vector_1:
> push r1
> in r1,__SREG__
> push r1
> clr __zero_reg__
>
> ; Code
>
> pop r1
> out __SREG__,r1
> pop r1
> reti

An epilogue macro can be made to know whether its matching prologue
saved tmp-reg, even if that is stretching assembler macros slightly.
That would not require any additional code scan. So long as nesting ISRs
is illegal, then it would not even clutter the gas symbol table
perceptibly.

> Maybe someone is interested in implementing the GAS part, and if that is the
> case and the proposal is feasible, I would take care of the GCC part.

I propose that we minimise toolchain modification by choosing an elegant
implementation, based on existing gas capabilities, if feasible. Thus
far, I have not seen any proposed code generation which ought not be
achievable that way.

> Caveats:
>
> a) .L__stack_usage can no more be computed by GCC

It is no effort for gas to implement lines like: .L__stack_usage = 3
As we know exactly how many bytes we are adding to the stack frame, we
can effortlessly dimension and emit that line - and yes, it is gas which
converts that 'L' into a unique integer. Whatever code later uses all the
.nnnn__stack_usage sizes should continue to work as before.

> b) It's hard to find the end of the relevant code.  We might have
> interleaved sections (like with dispatch tables), there might be code that
> is emit after the epilogue, there might be more than 1 epilogue,

If gcc doesn't know what it is doing, then gas can't fix that part. ;-)
If gcc emits 15 "_maybe_maybe" macros, then gas will make 15 expansions.
If gcc does know what it is doing, then perhaps 14 of them will expand
to no code, or part of a prologue/epilogue, in a useful sequence. Gas
will not know what gcc was up to when it executed, in the recent past,
except by what it passes in its output.

> dunno if GAS can infer whether JMP is local or non-local.

If it's one of its recognised local labels. Macro-local symbols may be
defined by a "LOCAL" directive, and local scope symbols by use of '@' or
concatenating a parameter suffix to part-label. There are also the
numbered local labels. It knows and uses local symbols beginning with
'L', and omits them from the symbol table, IIRC.

If the JMP destination is external, then ld will handle the linking.
That's well outside the remit of gas. If any relaxation is hoped for,
then that will be also provided by ld, if available.

> We could add a new GCC pass that filters out situations that are pointless
> to scan like code with dispatch tables or function calls, and fall back to
> classical prologue / epilogue in such cases.

Is this alluding to some sort of demultiplexer in an ISR?
It seems odd that such specificity would be suited to general treatment
by the toolchain. It sounds intriguing.

> The .maybe gets function-unique identifiers (123 in the example) so that GAS
> knows which epilogue belongs to which .prologue provided that's helpful.

That greatly simplifies the work to be done in gas. Now linking prologue
with epilogue is effortless. But as an ISR will be contained in one
compile unit, and no other ISR will sanely be nested, is that required?
(I may have some iuse-case catching up to do here.)

> I am not familiar with Binutils / GAS though and don't know if it's easy to
> add the 2 new passes: One to scan and one to replace the .maybe with
> appropriate code.

Neither appear necessary. Gas makes the substitutions as is.

> IIUC GAS only works on sections,

Sorry, no, it only works on compile units, i.e a source file and its
includes. It knows little else. Its relationship to sections is that
understands .section directives to the extent that it will put code into
whichever section is named in one. It even has a section stack, so that
it can pop back to the prior section after an excursion into another.

> and the scan would be on BFD internal representation (like relaxing)
> after the parser read in the asm sources?

Gas takes assembler source code text as input, and generates an ELF
relocatable object file. It is only ld which can perform relaxation.

> The GCC change would add a new option, configure test whether GAS supports
> this, let ISR prologue and epilogue emit new unspec_volatile pseudo insns

Hmmm, "unspec_volatile" doesn't appear in the described use case. If it
is desired to communicate some state or value to gas, it is only
necessary to issue:

   _maybe_isr_prologue 123 _here_be_elephants 42

and the _maybe_isr_prologue macro can:

   .set have_elephants_123 1
   .set elephant_count_123 42

and make decisions based on that for the remainder of the compile unit.
The only caveat is that here_be_elephants will have to have a defined
value, most easily provided in an always-included header, perhaps.

> and add a scan pass to detect situations that are pointless should fall back
> to old code, like when dispatch tables, calls or non-local goto is seen.

The examples provided are within existing gas capability. Let's go with
that to begin with. Complexity is like Murphy, and arrives soon enough,
without being invited.

Erik


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Speeding up small ISRs: PR20296

Georg-Johann Lay-2
Erik Christiansen schrieb:

> Reply-To: [hidden email]
>
> On 15.06.17 14:43, Georg-Johann Lay wrote:
>> https://gcc.gnu.org/PR20296
>>
>> is about speeding up "small" ISRs, and is open for 12 years now...
>>
>> Anyone familiar with avr-gcc knows that a fix would be high effort and risk,
>> and that's the major reason for why PR20296 is still open (and even
>> classified "suspended").
>>
>> In some forum discussion (again!) on that issue, there was the following
>> proposal to approach that PR:
>
> Reading the PR causes me to infer that moving that code generation out
> of gcc into gas is the proposed fix for unmanageable optimisation
> complexity in gcc in the use case.

Allow me not to go into GCC details how...

>> 1) Let GCC emit directives / pseudo-instructions in non-naked ISR prologue /
>> epilogue
>
> If only existing directives and macro invocations are emitted, then the
> need to modify gas code is obviated. I.e. if:
>
>   .maybe_isr_prologue 123       ; were instead:
>    maybe_isr_prologue 123

What I meant is a pseudo instruction.  Something that's handled like an
instruction but not available in hardware, indicated by a leading ".".
If you don't like that "." just drop it, it's just sugar.

> then a gas macro could generate the required code without unnecessary
> complexity. If the desired code to be generated from the parameters

Can you make this explicit?  How can a macro know whether the following
code, hundreds of insns, clobbers tmp_reg or SREG?

Macro expansion runs prior to instruction parse.

> supplied can be described, then I'll write the macro(s). After some
> iterations, we should have some good results for some useful use cases.

Maybe I don't get your point.  Can you provide such a macro?

>> 2) Let GAS scan the code and replace the directives with code as needed.
>> Currently,
>>
>> #include <avr/io.h>
>> #include <avr/interrupt.h>
>>
>> ISR (INT0_vect)
>> {
>>     __asm ("; Code");
>> }
>>
>> emit something like:
>>
>>
>> __vector_1:
>> push r1
>> push r0
>> in r0,__SREG__
>> push r0
>> clr __zero_reg__
>> .L__stack_usage = 3
>>
>> ; Code
>>
>> pop r0
>> out __SREG__,r0
>> pop r0
>> pop r1
>> reti
>>
>>
>> which would change to:
>>
>>
>> __vector_1:
>> .maybe_isr_prologue 123
>> ;; Rest of prologue
>>
>> ; Code
>>
>> ;; Rest of epilogue
>> .maybe_isr_epilogue 123
>> reti
>>
>> GAS would then scan the code associated to the function and replace the
>> .maybe by appropriate sequence to safe / init / restore tmp-reg, zero-reg
>> and SREG.
>
> That sets things up handily for finishing by simple macro. Let us say
> that gcc emits "_maybe_isr_prologue 1 2 3", then 1 could be the switch
> for save, 2 for init, and 3 for restore, if desired. Gas macros readily
> handle omission of the last parameter (with it then taking an internally
> defined default value), which can be useful if gas knows the default,
> and gcc doesn't. Lumping it all into a single parameter would lead to 8
> parameter values, just to cover 3 binary switches, IIUC the use case.

The "123" was just intended as tag (in the case that's helpful for GAS).
    To be more precise, consider

;; Start of ISR1
__vector_1:
        .maybe_isr_prologue 123
        ;; Rest of prologue
        ; CodeA
        ;; Rest of epilogue
        .maybe_isr_epilogue 123
        reti
        ; CodeB
        ;; Rest of epilogue
        .maybe_isr_epilogue 123
        reti
        ; CodeC
;; End of ISR1

If CodeA+CodeB+CodeC clobber SREG, Rzero, Rtmp then .maybe_isr_prologue
shall expand to

        push r1
        push r0
        in r0,__SREG__
        push r0
        clr __zero_reg__

If CodeA+CodeB+CodeC clobber SREG, Rzero but not Rtmp then
.maybe_isr_prologue shall expand to

        push r1
        in r1,__SREG__
        push r1
        clr __zero_reg__

If CodeA+CodeB+CodeC clobber SREG but neither Rzero nor Rtmp then
.maybe_isr_prologue shall expand to (this might be optimized further)

        push r0
        in r0,__SREG__
        push r0

If CodeA+CodeB+CodeC clobber Rtmp but neither Rzero nor SREG then
.maybe_isr_prologue shall expand to

        push r0

If CodeA+CodeB+CodeC clobber neither Rtmp nor Rzero nor SREG then
.maybe_isr_prologue shall expand to

        ;; Empty

etc. and with epilogues (2 in this case) accordingly.


>> Other registers like R24 are handled by GCC as usual.  For example, if
>> the scan reveals that tmp-reg is not needed but zero-reg is (which
>> will imply SREG due to the CLR) the replacement code would be:
>
>> __vector_1:
>> push r1
>> in r1,__SREG__
>> push r1
>> clr __zero_reg__
>>
>> ; Code
>>
>> pop r1
>> out __SREG__,r1
>> pop r1
>> reti
>
> An epilogue macro can be made to know whether its matching prologue
> saved tmp-reg, even if that is stretching assembler macros slightly.
> That would not require any additional code scan. So long as nesting ISRs

Can you give an example?  In particular, how the prologue macro gains
knowledge about whether tmp-reg needs to be handled or not, i.e. how it
draws that information from "Code"?

> is illegal, then it would not even clutter the gas symbol table
> perceptibly.

ISR nesting is no issue.  ISR nesting occurs dynamically, not during
statically.

>> Maybe someone is interested in implementing the GAS part, and if that is the
>> case and the proposal is feasible, I would take care of the GCC part.
>
> I propose that we minimise toolchain modification by choosing an elegant
> implementation, based on existing gas capabilities, if feasible. Thus
> far, I have not seen any proposed code generation which ought not be
> achievable that way.

I've seen such examples, e.g. working around specific kind of silicon
bugs might be considerably less effort in GAS compared to GCC.

>> Caveats:
>>
>> a) .L__stack_usage can no more be computed by GCC
>
> It is no effort for gas to implement lines like: .L__stack_usage = 3
> As we know exactly how many bytes we are adding to the stack frame, we
> can effortlessly dimension and emit that line - and yes, it is gas which
> converts that 'L' into a unique integer. Whatever code later uses all the
> .nnnn__stack_usage sizes should continue to work as before.
>
>> b) It's hard to find the end of the relevant code.  We might have
>> interleaved sections (like with dispatch tables), there might be code that
>> is emit after the epilogue, there might be more than 1 epilogue,
>
> If gcc doesn't know what it is doing, then gas can't fix that part. ;-)

GAS can fix it in principle, and it would be orders of magnitude less
work than in GCC.  That's the point of the proposal.

>> dunno if GAS can infer whether JMP is local or non-local.
>
> If it's one of its recognised local labels. Macro-local symbols may be
> defined by a "LOCAL" directive, and local scope symbols by use of '@' or
> concatenating a parameter suffix to part-label. There are also the

I am not talking about assembly input. I am talking about GAS internals.

> numbered local labels. It knows and uses local symbols beginning with
> 'L', and omits them from the symbol table, IIRC.
>
> If the JMP destination is external, then ld will handle the linking.
> That's well outside the remit of gas. If any relaxation is hoped for,
> then that will be also provided by ld, if available.

GAS might also relax, and any pseudo-instruction expansion must not
occur after GAS relaxing.  IIUC any GAS relaxing is forwarded to LD
since --mlink-relax, hence that option would have to be forced to be
always on provided pseudo expansion would have to be run after GAS relaxing.

>> The .maybe gets function-unique identifiers (123 in the example) so that GAS
>> knows which epilogue belongs to which .prologue provided that's helpful.
>
> That greatly simplifies the work to be done in gas. Now linking prologue
> with epilogue is effortless. But as an ISR will be contained in one
> compile unit, and no other ISR will sanely be nested, is that required?

As said, ISR nesting doesn't occur statically.

>> I am not familiar with Binutils / GAS though and don't know if it's easy to
>> add the 2 new passes: One to scan and one to replace the .maybe with
>> appropriate code.
>
> Neither appear necessary. Gas makes the substitutions as is.

hä? Would you make that explicit?  Like a macro that catches all 5
situations I outlined above, for any CodeA+CodeB+CodeC ?

>
>> IIUC GAS only works on sections,
>
> Sorry, no, it only works on compile units, i.e a source file and its
> includes. It knows little else. Its relationship to sections is that
> understands .section directives to the extent that it will put code into
> whichever section is named in one. It even has a section stack, so that
> it can pop back to the prior section after an excursion into another.

Ya, I know.  My questions however are not re. the GAS front, but re.
internal working and representation, passes etc.

>> and the scan would be on BFD internal representation (like relaxing)
>> after the parser read in the asm sources?
>
> Gas takes assembler source code text as input, and generates an ELF
> relocatable object file. It is only ld which can perform relaxation.

That's GAS as a black box.  Internally, they are using BFD, no? (Except
macro expansion and parse input which are strings of course).

>> The GCC change would add a new option, configure test whether GAS supports
>> this, let ISR prologue and epilogue emit new unspec_volatile pseudo insns
>
> Hmmm, "unspec_volatile" doesn't appear in the described use case. If it

unspec_volatile is a GCC internal device.  You'll never see it except
with internal dumps like with -fdump-rtl-all.

>> and add a scan pass to detect situations that are pointless should fall back
>> to old code, like when dispatch tables, calls or non-local goto is seen.
>
> The examples provided are within existing gas capability. Let's go with

Okay. I missed that.  I just don't see how to do it.

Johann


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Speeding up small ISRs: PR20296

Erik Christiansen-2
TL;DR: The meat (one lower-effort solution) follows some resynchronisation
       of understanding of the problem, and what gcc can do.

On 17.06.17 15:41, Georg-Johann Lay wrote:

> Erik Christiansen schrieb:
> > Reply-To: [hidden email]
> > > 1) Let GCC emit directives / pseudo-instructions in non-naked ISR prologue /
> > > epilogue
> >
> > If only existing directives and macro invocations are emitted, then the
> > need to modify gas code is obviated. I.e. if:
> >
> >   .maybe_isr_prologue 123       ; were instead:
> >    maybe_isr_prologue 123
...
> > then a gas macro could generate the required code without unnecessary
> > complexity. If the desired code to be generated from the parameters
>
> Can you make this explicit?  How can a macro know whether the following
> code, hundreds of insns, clobbers tmp_reg or SREG?

There's been some mutual misunderstanding: As mentioned above, a macro
can generate code, guided by parameters and information squirrelled into
the symbol table. I didn't find mention of gas look-ahead in the OP, and
so had only discussed forward information flow. That's fixed below.

> Macro expansion runs prior to instruction parse.

Yes, I have written a two-pass assembler, but the only look-ahead there
was for computing forward jump offsets. I've not delved into gas
internals (merely making heavy use of its capabilities, especially
macros), but adding clobber-list collation for multiple ISRs could
perhaps be a feasible addition to gas. (An easier, less intrusive method
also exists, however.)

...

> To be more precise, consider

> ;; Start of ISR1
> __vector_1:
> .maybe_isr_prologue 123
> ;; Rest of prologue
> ; CodeA
> ;; Rest of epilogue
> .maybe_isr_epilogue 123
> reti
> ; CodeB
> ;; Rest of epilogue
> .maybe_isr_epilogue 123
> reti
> ; CodeC
> ;; End of ISR1
>
> If CodeA+CodeB+CodeC clobber SREG, Rzero, Rtmp then .maybe_isr_prologue
> shall expand to
>
> push r1
> push r0
> in r0,__SREG__
> push r0
> clr __zero_reg__
...

Thank you for the expansion, though the variability of save requirements
is not lost on this coder who writes all his ISRs in assembler. ;-)
(What I had missed was that gcc has clobber lists, but not for this
use case - despite all its passes.)

Lets pause to consider another two-part approach; one which avoids gas
hacking, and avoids the need for look-ahead in either gas or gcc, but
achieves that in the toolchain as a whole:

1) Let GCC emit tagged macro invocations in non-naked ISR prologue /
                ------------------------
   epilogue. (I.e. Proposal basically unchanged so far, but no gas
   rewrite needed if it can still do the work.)

2) Let gcc also write ".set _save_r1_123 1" to a temporary header file
   if the ISR tagged 123 clobbers r1. Ditto for other registers.
   (Opening a second temporary file for write is no effort for gcc,
   which is already writing its output to a temporary .o file for gas to
   process.) Gcc also prefixes ".include temporary_header_file" to the
   .o file. (The .o files are also usually temporary, retained only with
   -save-temps or -S.)

3) Let the _maybe_isr_prologue macro be designed so that
   "maybe_isr_prologue 123" will generate a "push r1" if _save_r1_123 is
   set, and "maybe_isr_epilogue 123" will then in turn generate a "pop r1".
   Ditto for the other registers. (Very simple macros.)

   So long as there is a "maybe_isr_prologue 123" before each reti, then
   it does not matter how many of them there are. Alternatively, gcc
   could just emit "oe_reti 123" (i.e. optimised-epilogue reti) for less
   clutter in the gcc output. (A neater "look", but insufficiently
   explicit for a casual reader?)

   To see what gas is doing there, assembly with -alcmns will include
   macro expansions in the listing, cropping untrue conditionals.

4) Let the two _maybe_isr_prologue/epilogue macro definitions be added
   to an always implicitly included header. (That's less work than
   having gcc copy the text to the extra temporary header.)

What we have thereby done is synthesise a code generation loop without
hacking gas at all, and without having to solve the look-ahead dilemma
within gcc, or do a great deal more work there than before. This "whole
toolchain" solution is minimally intrusive code-wise.

Including at the head of the standard gcc output file, information which
gcc only knows at the end of one of its passes, gives gas free look-ahead
without having to hack extra passes into its structure. I am uncertain
that that would be possible within the avr-gas modules, and shudder at
the prospect of suggesting a core redesign to binutils maintainers.

There may at some future date be advantages in retaining all clobber
detection wholly within gcc, rather than letting it fragment throughout
the toolchain. If any tweaking were ever necessary, then it could all be
done in one place - and keeping it coherent also remains simpler.

In any event, with look-ahead handled, the old ambiguity problem is
gone, AIUI.

Erik


_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Speeding up small ISRs: PR20296

Erik Christiansen-2
In reply to this post by Georg-Johann Lay-2
On 15.06.17 14:43, Georg-Johann Lay wrote:
> https://gcc.gnu.org/PR20296
>
> is about speeding up "small" ISRs, and is open for 12 years now...
>
> Anyone familiar with avr-gcc knows that a fix would be high effort and risk,
> and that's the major reason for why PR20296 is still open (and even
> classified "suspended").
...

Johann, I'll be off the net for about a week from late tonight (GMT+10),
so unfortunately not able to respond immediately on any complications
arising. (They're always there, in anything which has remained unsolved
for 12 years.)

A low-effort solution seems needed here, because it is so easy to avoid
all the complexity by just replacing the long-winded ugly and convoluted
'C' in PR20296:

void SIG_PIN_CHANGE0 (void) __attribute__ ((signal)); void SIG_PIN_CHANGE0 (void)
{
  (*(volatile unsigned char *)((0x12) + 0x20)) |= 1;
}

with a simple and elegant assembler file:

>>
   .func SIG_PIN_CHANGE0
   .global SIG_PIN_CHANGE0

   sbi 0x12, 0
   reti
<<

put it through gas, and just link its .o file in with all those produced
from 'C'. All finished and done - without any hassles or toolchain
modification.

Call me pragmatic, but I see two classes of user: those who don't
understand the two lines of assembler, and won't know or fuss about the
inefficient gcc overprotection; and those who do know enough to fuss,
and can therefore type two lines of assembler, a global declaration, and
add to their makefile a:

%.o: %.s
   $(AS) -I$(INC_DIR) $(ASFLAGS) -o $(OBJDIR)/$@ $<

To produce the ISR .o file.

If their linking makefile target uses *.o, then the new one is linked
automatically. I tend to list object files explicitly, as I omit some
during development, e.g.:

target: init.o encoder.o lcd.o os.o keyboard.o ui.o mathlib.o
      ( cd $(OBJDIR) ; $(CC) $(LDFLAGS) $(CFLAGS) -o $@.elf $^ > map )
      ...

Either way, there's not much to it, and there is never any unexpected
code padding. That's a big part of the reason why it seems that only a
minimally intrusive (and labour intensive) toolchain effort is
warranted.

Erik

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Speeding up small ISRs: PR20296

Thomas D. Dean
On 06/19/2017 10:48 PM, Erik Christiansen wrote:

> On 15.06.17 14:43, Georg-Johann Lay wrote:
>> https://gcc.gnu.org/PR20296
>>
>> is about speeding up "small" ISRs, and is open for 12 years now...
>>
>> Anyone familiar with avr-gcc knows that a fix would be high effort and risk,
>> and that's the major reason for why PR20296 is still open (and even
>> classified "suspended").
> ...
>
> Johann, I'll be off the net for about a week from late tonight (GMT+10),
> so unfortunately not able to respond immediately on any complications
> arising. (They're always there, in anything which has remained unsolved
> for 12 years.)
>
> A low-effort solution seems needed here, because it is so easy to avoid
> all the complexity by just replacing the long-winded ugly and convoluted
> 'C' in PR20296:
>
> void SIG_PIN_CHANGE0 (void) __attribute__ ((signal)); void SIG_PIN_CHANGE0 (void)
> {
>    (*(volatile unsigned char *)((0x12) + 0x20)) |= 1;
> }
>
> with a simple and elegant assembler file:
>
>>>
>     .func SIG_PIN_CHANGE0
>     .global SIG_PIN_CHANGE0
>
>     sbi 0x12, 0
>     reti
> <<
>
> put it through gas, and just link its .o file in with all those produced
> from 'C'. All finished and done - without any hassles or toolchain
> modification.
>
> Call me pragmatic, but I see two classes of user: those who don't
> understand the two lines of assembler, and won't know or fuss about the
> inefficient gcc overprotection; and those who do know enough to fuss,
> and can therefore type two lines of assembler, a global declaration, and
> add to their makefile a:
>
> %.o: %.s
>     $(AS) -I$(INC_DIR) $(ASFLAGS) -o $(OBJDIR)/$@ $<
>
> To produce the ISR .o file.
>
> If their linking makefile target uses *.o, then the new one is linked
> automatically. I tend to list object files explicitly, as I omit some
> during development, e.g.:
>
> target: init.o encoder.o lcd.o os.o keyboard.o ui.o mathlib.o
>        ( cd $(OBJDIR) ; $(CC) $(LDFLAGS) $(CFLAGS) -o $@.elf $^ > map )
>        ...
>
> Either way, there's not much to it, and there is never any unexpected
> code padding. That's a big part of the reason why it seems that only a
> minimally intrusive (and labour intensive) toolchain effort is
> warranted.
>
> Erik

 > cat avr-sig.c
#include <io.h>
#include <interrupt.h>

ISR(PCINT0_vect,ISR_NAKED ){
   (*(volatile unsigned char *)((0x12) + 0x20)) |= 1;
   reti();
}


 > avr-gcc -O3 -mmcu=atmega165 -S avr-sig.c  -I /usr/lib/avr/include/avr/

 > cat avr-sig.s
         .file   "avr-sig.c"
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__tmp_reg__ = 0
__zero_reg__ = 1
         .text
.global __vector_2
         .type   __vector_2, @function
__vector_2:
/* prologue: naked */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
         sbi 0x12,0
/* #APP */
  ;  6 "avr-sig.c" 1
         reti
  ;  0 "" 2
/* epilogue start */
/* #NOAPP */
         .size   __vector_2, .-__vector_2
         .ident  "GCC: (GNU) 4.9.2"

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Speeding up small ISRs: PR20296

Georg-Johann Lay-2
In reply to this post by Georg-Johann Lay-2
On 15.06.2017 14:43, Georg-Johann Lay wrote:

> https://gcc.gnu.org/PR20296
>
> is about speeding up "small" ISRs, and is open for 12 years now...
>
> Anyone familiar with avr-gcc knows that a fix would be high effort and
> risk, and that's the major reason for why PR20296 is still open (and
> even classified "suspended").
>
> In some forum discussion (again!) on that issue, there was the following
> proposal to approach that PR:
>
> 1) Let GCC emit directives / pseudo-instructions in non-naked ISR
> prologue / epilogue
>
> 2) Let GAS scan the code and replace the directives with code as needed.
>
> Currently,
>
> #include <avr/io.h>
> #include <avr/interrupt.h>
>
> ISR (INT0_vect)
> {
>      __asm ("; Code");
> }
>
> emit something like:
>
>
> __vector_1:
>      push r1
>      push r0
>      in r0,__SREG__
>      push r0
>      clr __zero_reg__
> .L__stack_usage = 3
>
>      ; Code
>
>      pop r0
>      out __SREG__,r0
>      pop r0
>      pop r1
>      reti
>
>
> which would change to:
>
>
> __vector_1:
>      .maybe_isr_prologue 123
>      ;; Rest of prologue
>
>      ; Code
>
>      ;; Rest of epilogue
>      .maybe_isr_epilogue 123
>      reti
>
> GAS would then scan the code associated to the function and replace the
> .maybe by appropriate sequence to safe / init / restore tmp-reg,
> zero-reg and SREG.  Other registers like R24 are handled by GCC as
> usual.  For example, if the scan reveals that tmp-reg is not needed but
> zero-reg is (which will imply SREG due to the CLR) the replacement code
> would be:
>
>
> __vector_1:
>      push r1
>      in r1,__SREG__
>      push r1
>      clr __zero_reg__
>
>      ; Code
>
>      pop r1
>      out __SREG__,r1
>      pop r1
>      reti
>
>
> Maybe someone is interested in implementing the GAS part, and if that is
> the case and the proposal is feasible, I would take care of the GCC part.
>
> Caveats:
>
> a) .L__stack_usage can no more be computed by GCC
>
> b) It's hard to find the end of the relevant code.  We might have
> interleaved sections (like with dispatch tables), there might be code
> that is emit after the epilogue, there might be more than 1 epilogue,
> dunno if GAS can infer whether JMP is local or non-local.
>
> We could add a new GCC pass that filters out situations that are
> pointless to scan like code with dispatch tables or function calls, and
> fall back to classical prologue / epilogue in such cases.
>
> The .maybe gets function-unique identifiers (123 in the example) so that
> GAS knows which epilogue belongs to which .prologue provided that's
> helpful.
>
> I am not familiar with Binutils / GAS though and don't know if it's easy
> to add the 2 new passes: One to scan and one to replace the .maybe with
> appropriate code.  IIUC GAS only works on sections, and the scan would
> be on BFD internal representation (like relaxing) after the parser read
> in the asm sources?

FYI, I just went ahead any typed down these lines for GAS.


If someone wants to give it a try, it's here:

https://sourceware.org/bugzilla/show_bug.cgi?id=21683#c2

Binutils manual didn't catch up with master yet.  If so, you should
see a new "AVR Pseudo Instructions" entry in

https://sourceware.org/binutils/docs/as/AVR_002dDependent.html

that adds some documentation / specification.

Johann


> The GCC change would add a new option, configure test whether GAS
> supports this, let ISR prologue and epilogue emit new unspec_volatile
> pseudo insns and add a scan pass to detect situations that are pointless
> should fall back to old code, like when dispatch tables, calls or
> non-local goto is seen.
>
> Johann

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Speeding up small ISRs: PR20296

Georg-Johann Lay-2
Georg-Johann Lay schrieb:

> On 15.06.2017 14:43, Georg-Johann Lay wrote:
>> https://gcc.gnu.org/PR20296
>>
>> is about speeding up "small" ISRs, and is open for 12 years now...
>>
>> Anyone familiar with avr-gcc knows that a fix would be high effort and
>> risk, and that's the major reason for why PR20296 is still open (and
>> even classified "suspended").
>>
>> In some forum discussion (again!) on that issue, there was the
>> following proposal to approach that PR:
>>
>> 1) Let GCC emit directives / pseudo-instructions in non-naked ISR
>> prologue / epilogue
>>
>> 2) Let GAS scan the code and replace the directives with code as needed.
>>
>> Currently,
>>
>> #include <avr/io.h>
>> #include <avr/interrupt.h>
>>
>> ISR (INT0_vect)
>> {
>>      __asm ("; Code");
>> }
>>
>> emit something like:
>>
>>
>> __vector_1:
>>      push r1
>>      push r0
>>      in r0,__SREG__
>>      push r0
>>      clr __zero_reg__
>> .L__stack_usage = 3
>>
>>      ; Code
>>
>>      pop r0
>>      out __SREG__,r0
>>      pop r0
>>      pop r1
>>      reti
>>
>>
>> which would change to:
>>
>>
>> __vector_1:
>>      .maybe_isr_prologue 123
>>      ;; Rest of prologue
>>
>>      ; Code
>>
>>      ;; Rest of epilogue
>>      .maybe_isr_epilogue 123
>>      reti
>>
>> GAS would then scan the code associated to the function and replace
>> the .maybe by appropriate sequence to safe / init / restore tmp-reg,
>> zero-reg and SREG.  Other registers like R24 are handled by GCC as
>> usual.  For example, if the scan reveals that tmp-reg is not needed
>> but zero-reg is (which will imply SREG due to the CLR) the replacement
>> code would be:
>>
>>
>> __vector_1:
>>      push r1
>>      in r1,__SREG__
>>      push r1
>>      clr __zero_reg__
>>
>>      ; Code
>>
>>      pop r1
>>      out __SREG__,r1
>>      pop r1
>>      reti
>>
>>
>> Maybe someone is interested in implementing the GAS part, and if that
>> is the case and the proposal is feasible, I would take care of the GCC
>> part.
>>
>> Caveats:
>>
>> a) .L__stack_usage can no more be computed by GCC
>>
>> b) It's hard to find the end of the relevant code.  We might have
>> interleaved sections (like with dispatch tables), there might be code
>> that is emit after the epilogue, there might be more than 1 epilogue,
>> dunno if GAS can infer whether JMP is local or non-local.
>>
>> We could add a new GCC pass that filters out situations that are
>> pointless to scan like code with dispatch tables or function calls,
>> and fall back to classical prologue / epilogue in such cases.
>>
>> The .maybe gets function-unique identifiers (123 in the example) so
>> that GAS knows which epilogue belongs to which .prologue provided
>> that's helpful.
>>
>> I am not familiar with Binutils / GAS though and don't know if it's
>> easy to add the 2 new passes: One to scan and one to replace the
>> .maybe with appropriate code.  IIUC GAS only works on sections, and
>> the scan would be on BFD internal representation (like relaxing) after
>> the parser read in the asm sources?
>
> FYI, I just went ahead any typed down these lines for GAS.
>
>
> If someone wants to give it a try, it's here:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=21683#c2
>
>> The GCC change would add a new option, configure test whether GAS
>> supports this, let ISR prologue and epilogue emit new unspec_volatile
>> pseudo insns and add a scan pass to detect situations that are
>> pointless should fall back to old code, like when dispatch tables,
>> calls or non-local goto is seen.

The according avr-gcc feature is also upstream now.  It adds a new
option -m[no-]gas-isr-prologues and a new function attribute to disable
__gcc_isr generation for individual ISRs:

http://gcc.gnu.org/onlinedocs/gcc/AVR-Function-Attributes.html

If you want to play around with it, you need Binutils that implement
PR20296 (e.g. Binutils GIT master or upcoming 2.29) and avr-gcc that
implements PR81268 (GCC SVN trunk r250093 or newer).

The feature is enabled per default for all optimization levels except
for -O0 and -Og.

Johann



_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Speeding up small ISRs: PR20296

Erik Christiansen-2
On 14.07.17 19:18, Georg-Johann Lay wrote:

> The according avr-gcc feature is also upstream now.  It adds a new option
> -m[no-]gas-isr-prologues and a new function attribute to disable __gcc_isr
> generation for individual ISRs:
>
> http://gcc.gnu.org/onlinedocs/gcc/AVR-Function-Attributes.html
>
> If you want to play around with it, you need Binutils that implement PR20296
> (e.g. Binutils GIT master or upcoming 2.29) and avr-gcc that implements
> PR81268 (GCC SVN trunk r250093 or newer).
>
> The feature is enabled per default for all optimization levels except for
> -O0 and -Og.

Thank you, Johann, for all the work I've seen coming in on binutils and
gcc-patches. The stuff I have in hand still uses assembler ISRs, easily
linkable with C code. After 30 years of it, I still find it easier to
write an ISR in assembler than C - and I know where each part microsecond
is expended, so latency is known in advance.

It is very encouraging, though, to see continuing development of the AVR
toolchain.

Erik

_______________________________________________
AVR-GCC-list mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list