Some optimizations for the hardware divider (#1033)

* Remove unnecessary wait in pico_divider.

There is no need to wait if there are more than 8 cycles between setup and result readout.
Dividend/divisor readout should be correct without delay. Update comment to reflect that.

* Optimize hw_divider_save_state/hw_divider_restore_state.

Doing multiple pushes to avoid stack usage is faster.
The wait loop in hw_divider_save_state had an incorrect branch.
This didn't matter since the wait wasn't necessary to begin with.

* Remove pointless aligns in hardware_divider.

The regular_func_with_section macro starts a new section, so if alignment
is desired it should be placed inside the macro, after the section start.

* Save a few bytes in hardware_divider.

Signed and unsigned code can use the same exit code.
Branching to the common code is free since we need the 8 cycle
delay anyway.
Peter Pettersson
2022-10-17 00:40:22 +02:00
committed by GitHub
parent 2d4e3baa82
commit 3bd7a829db
3 changed files with 35 additions and 61 deletions

@@ -19,11 +19,10 @@ need to change SHIFT above
#endif
// SIO_BASE ptr in r2; pushes r4-r7, lr to stack
// requires that division started at least 2 cycles prior to the start of the macro
.macro save_div_state_and_lr
// originally we did this, however a) it uses r3, and b) the push takes 6 cycles, b)
// any IRQ which uses the divider will necessarily put the data back, which will
// immediately make it ready
// originally we did this, however a) it uses r3, and b) the push and dividend/divisor
// readout takes 8 cycles, c) any IRQ which uses the divider will necessarily put the
// data back, which will immediately make it ready
//
// // ldr r3, [r2, #SIO_DIV_CSR_OFFSET]
// // // wait for results as we can't save signed-ness of operation
@@ -31,7 +30,7 @@ need to change SHIFT above
// // lsrs r3, #SIO_DIV_CSR_READY_SHIFT_FOR_CARRY
// // bcc 1b
// 6 cycles
// 6 cycle push + 2 ldr ensures the 8 cycle delay before remainder and quotient are ready
push {r4, r5, r6, r7, lr}
// note we must read quotient last, and since it isn't the last reg, we'll not use ldmia!
ldr r4, [r2, #SIO_DIV_UDIVIDEND_OFFSET]