### Sqrt

Thu, 05/29/2003 - 00:33

The ARM compiler does not support the sqrt function. Is there any way to achieve it? When I wrote my own sqrt function, the ARM compiler would not accept comparisons of longs to floats and other such things.

Has anybody achieved it?

Dips

Thu, 05/29/2003 - 06:49

Do you recommend not using floats because they are slow/inefficient on ARM or because they will cause a crash in some situations?

Thank you,

-Aaron

Thu, 05/29/2003 - 07:29

Floating point is not supported by these processors. You can declare a float or double variable, but you can't perform any calculations with it, such as +, -, / or *. You can simulate it, though: take a look at the helper functions in the BREW API documentation. There are functions for +, -, * and / on doubles.

[]´s

Marcel

Thu, 05/29/2003 - 08:01

As Marcel pointed out, you can use BREW helper functions such as FADD to do floating-point computation.

But I would recommend not using floating point at all. The ARM processor has no native support for floating point. Work out the fixed-point precision your requirements call for and implement a fixed-point class of your own. If you are developing a performance-oriented application, floating point will cost you performance. If possible, avoid division as well, since the ARM processor has no native support for division either. Every time you perform a division, it costs you several thousand cycles. Divisions can also be performed with lookup tables.

regards

ruben
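
For anyone wondering what such a fixed-point type might look like, here is a minimal sketch in plain C rather than a class. Everything in it is an assumption for illustration: the Q16.16 layout, the `fix16` names, and the small reciprocal table used to replace division by a multiply. It is not taken from the BREW SDK or from ruben's own code.

```c
#include <stdint.h>

/* Hypothetical Q16.16 type: 16 integer bits, 16 fraction bits. */
typedef int32_t fix16;

#define FIX16_ONE 65536   /* 1.0 in Q16.16 */

/* Convert between plain integers and Q16.16. */
fix16 fix16_from_int(int32_t v) { return (fix16)(v * FIX16_ONE); }
int32_t fix16_to_int(fix16 v)   { return v / FIX16_ONE; }  /* truncates toward zero */

/* Addition needs no adjustment because both operands share the same scale. */
fix16 fix16_add(fix16 a, fix16 b) { return a + b; }

/* The raw product carries 32 fraction bits, so scale it back down by 2^16.
   Dividing by a power-of-two constant lets the compiler use shifts. */
fix16 fix16_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) / FIX16_ONE);
}

/* Reciprocals 1/d in Q16.16 for d = 1..8, rounded down (index 0 is unused). */
static const fix16 fix16_recip[9] = {
    0, 65536, 32768, 21845, 16384, 13107, 10922, 9362, 8192
};

/* Divide by a small integer d (1..8) with a table lookup and a multiply
   instead of a real division. The caller must keep d in range. */
fix16 fix16_div_small(fix16 a, int d)
{
    return fix16_mul(a, fix16_recip[d]);
}
```

With these definitions, `fix16_mul(fix16_from_int(3), fix16_from_int(4))` equals `fix16_from_int(12)`. Because the table entries are rounded down, quotients such as 10/3 come out very slightly low, and the number of fraction bits should be chosen to match the precision the application actually needs.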

Thu, 05/29/2003 - 08:27

"several thousand cycles" is an overstatement. According to the Arm Application Note 34: Writing Efficient C for ARM (http://www.arm.com/support/567GCF/$File/DAI0034A_efficient-c.pdf), integer division can take 20-140 cycles, depending on the numerator and denominator. Still not a good idea for an interactive game, but it really depends on the application.

I couldn't find a cycle count estimate for floating point math on the ARM web site.

Thu, 05/29/2003 - 09:01

Sorry, that was a typo. I meant to type "several cycles" but mistyped it as "several thousand cycles". As a side note, here is some information that should give you an idea of how expensive divisions are.

A 132 MHz ARM is theoretically capable of performing 132 million instructions per second (or 264 million if 50% of them are shifts). But you max out the CPU at about 70,000 divisions per second. If your game runs at 70 frames per second, that means you are using the processor to its maximum if you perform just 1,000 divisions for every frame you draw.

regards

ruben
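
To put the numbers in this thread side by side, the budget can be worked out with a couple of one-line helpers. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes one instruction per clock cycle, as the estimate above does.

```c
#include <stdint.h>

/* How many divisions fit into one second at a given cost per division. */
uint32_t divisions_per_second(uint32_t clock_hz, uint32_t cycles_per_division)
{
    return clock_hz / cycles_per_division;
}

/* How many cycles each division must cost for a given rate to use up
   the whole clock. */
uint32_t implied_cycles_per_division(uint32_t clock_hz, uint32_t divisions_per_sec)
{
    return clock_hz / divisions_per_sec;
}
```

At 132 MHz, the 20-140 cycle range quoted from the application note gives between 6,600,000 and about 942,857 divisions per second, or roughly 94,000 down to 13,000 per frame at 70 frames per second. Going the other way, a ceiling of 70,000 divisions per second implies about 1,885 cycles per division. The thread does not say where the 70,000 figure comes from, so the two estimates may not be measuring the same thing.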

Thu, 05/29/2003 - 23:21

If you want it to be really fast, here's an assembly implementation that comes from a Game Boy Advance mailing list. Since the GBA also uses an ARM processor, the routine should work on BREW phones just as well. Because it is written for the GBA, the assembler syntax may require some tweaking, though. I have not tested this code myself, but I thought I'd share it here so that others can give it a shot.

Keep in mind however, that even with an optimized assembly implementation such as this, sqrt is a very slow process.

sqrt32 takes about 220-230 CPU clocks

sqrt16 takes about 130-140 CPU clocks

==========================================

//calculates the square root of number (0..0xffffffff)

u32 sqrt32(u32 number);

//calculates the square root of number (0..0xffff)

u32 sqrt16(u32 number);

==========================================

=========== sqrt32.s =====================

.ARM

.GLOBL sqrt32

.TYPE sqrt32, function

@ fast 32 bit square root

@

@ (c) 2003, Vivid^Brainwave team

@

@ In:

@ R0 - argument (0..0xffffffff)

@ Output:

@ R0 - square root of the argument

@

@ Used registers: R0 - R3

sqrt32:

cmp R0, #0x00

bxeq LR

mov R3, #0x10

.sqrt32_0:

cmp R0, #0x40000000

movcc R0, R0, lsl #0x02

subcc R3, R3, #1

bcc .sqrt32_0

movs R1, R0, lsl #0x10

mov R0, R0, lsr #0x10

mov R2, #0x00

beq .sqrt32_2

.sqrt32_1:

add R2, R2, #0x4000

subs R0, R0, R2

addcc R0, R0, R2

sub R2, R2, #0x4000

add R2, R2, R2

addcs R2, R2, #0x10000

mov R0, R0, lsl #0x02

add R0, R0, R1, lsr #0x1e

movs R1, R1, lsl #0x02

beq .sqrt32_3

subs R3, R3, #0x01

bne .sqrt32_1

mov R0, R2, lsr #0x10

bx LR

.sqrt32_2:

add R2, R2, #0x4000

subs R0, R0, R2

addcc R0, R0, R2

sub R2, R2, #0x4000

add R2, R2, R2

addcs R2, R2, #0x10000

mov R0, R0, lsl #0x02

.sqrt32_3:

subs R3, R3, #0x01

bne .sqrt32_2

mov R0, R2, lsr #0x10

bx LR

.LTORG

==========================================

============= sqrt16.s ===================

.ARM

.GLOBL sqrt16

.TYPE sqrt16, function

@ fast 16 bit square root

@

@ (c) 2003, Vivid^Brainwave team

@

@ In:

@ R0 - number (0..0xffff)

@ Output:

@ R0 - square root of the number

@

@ Used registers: R0 - R2

sqrt16:

cmp R0, #0x00

bxeq LR

mov R2, #8

.sqrt16_0:

cmp R0, #0x4000

movcc R0, R0, lsl #0x02

subcc R2, R2, #1

bcc .sqrt16_0

mov R1, #0x00

.sqrt16_1:

add R1, R1, #0x4000

subs R0, R0, R1

addcc R0, R0, R1

sub R1, R1, #0x4000

add R1, R1, R1

addcs R1, R1, #0x10000

mov R0, R0, lsl #0x02

subs R2, R2, #0x01

bne .sqrt16_1

mov R0, R1, lsr #0x10

bx LR

.LTORG

==========================================
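
For comparison, here is a portable C sketch of an integer square root for anyone who would rather not deal with assembler syntax. It uses the common bit-by-bit (digit-by-digit) method and is not a line-for-line translation of the routines above. The name `isqrt32` is made up for illustration, and no cycle counts are claimed for it.

```c
#include <stdint.h>

/* Returns floor(sqrt(n)) using only shifts, adds, subtracts and compares. */
uint32_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = (uint32_t)1 << 30;   /* highest power of four in 32 bits */

    /* Start at the highest power of four that does not exceed n. */
    while (bit > n) {
        bit >>= 2;
    }

    /* Decide one result bit per iteration, from the top down. */
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;            /* this bit belongs in the root */
            root = (root >> 1) + bit;
        } else {
            root >>= 1;                 /* this bit stays clear */
        }
        bit >>= 2;
    }
    return root;
}
```

The result is rounded down, so `isqrt32(15)` returns 3 and `isqrt32(16)` returns 4. There are no divisions or multiplications in the loop, which fits the advice earlier in the thread.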

Fri, 05/30/2003 - 17:24

Does anyone know if the ARM Brew Builder optimizes out divides that can be replaced with shifts? Or should we be doing that manually?

Fri, 05/30/2003 - 17:53

The ARM compiler is definitely smart enough to optimize divides and multiplies by powers of 2. I know that it is also smart about optimizing multiplies by other constants (for example, x*5 will be turned into x*4+x). For these cases, I've looked at the code output from the compiler to verify this. As for division by other constants, the compiler will link in a special divide routine for dividing by 10. Other than that, I am not sure what it does, but from looking at the online ARM documentation I referenced above, my educated guess is that it does not optimize them (see section 3.4, Division by a Constant).

-Aaron

Fri, 05/30/2003 - 18:03

I ran a test using the -S option for armcc (which outputs an assembly file instead of an object file, so you can see what your code gets compiled into), and it does in fact only optimize a divide by a constant power of 2 or by a constant 10.

-Aaron
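
For the cases the compiler does not handle, the same rewrites can be done by hand. The sketch below shows what they might look like for unsigned values. The function names are made up for illustration, and the divide-by-3 version uses the usual multiply-by-reciprocal trick, which assumes a 32x32 to 64-bit multiply is cheaper than a division on the target.

```c
#include <stdint.h>

/* Unsigned division by 8 is a right shift by 3. */
uint32_t div_by_8(uint32_t x)
{
    return x >> 3;
}

/* x*5 rewritten as x*4 + x, the form mentioned above. */
uint32_t mul_by_5(uint32_t x)
{
    return (x << 2) + x;
}

/* Division by 3 without a divide: multiply by (2^33 + 1) / 3 = 0xAAAAAAAB
   and keep the bits above bit 32. Exact for every 32-bit unsigned x. */
uint32_t div_by_3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```

These are only safe as written for unsigned values. For signed values, a right shift of a negative number does not round the same way as `/`, so the rewrite needs extra care there.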

Fri, 05/30/2003 - 18:06

Great. I had a hunch this was going on, because I had a bunch of divides by powers of 2 and the like in some fairly frequently executed loops and the performance was still good. I was just assuming any modern compiler would optimize this stuff out.

Sun, 07/24/2005 - 18:08

I've found an algorithm on the net for calculating sqrt:

float SquareRoot( float Number,float Accuracy)

{

float MaximumValue ;

float MinimumValue;

float CurrentGuess;

float CurrentGuessSquared ;

float CurrentError;

bool CloseEnough = FALSE;

if (Number > 1.0)

{

MaximumValue = Number;

MinimumValue = 0.0;

}

else

{

MaximumValue = 1.0;

MinimumValue = Number;

}

while (!CloseEnough)

{

CurrentGuess = (MaximumValue + MinimumValue) / 2.0;

CurrentGuessSquared = CurrentGuess * CurrentGuess;

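CurrentError = Number - CurrentGuessSquared;

if (CurrentError < 0) CurrentError = -CurrentError; /* abs() is the integer version and would truncate the float */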

if (CurrentError <= Accuracy)

CloseEnough = TRUE;

else

{

if (CurrentGuessSquared >= Number)

MaximumValue = CurrentGuess;

else

MinimumValue = CurrentGuess;

}

}

return CurrentGuess;

}

For the accuracy you can use, for example, 0.001.

Germ
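
One thing to watch with a bisection loop like this is termination: a negative input, or an accuracy smaller than what a float can resolve, can keep it running forever. Below is a hedged variant with an iteration cap. The name `sqrt_bisect` and the `max_iterations` parameter are made up for illustration, and it assumes native float arithmetic is available; on a BREW device the operators may need to be replaced by the helper functions discussed earlier in the thread.

```c
/* Bisection square root with an upper bound on the number of iterations.
   Returns 0 for zero or negative input, since there is no real root to find. */
float sqrt_bisect(float x, float tolerance, int max_iterations)
{
    float lo, hi, mid = 0.0f;
    int i;

    if (x <= 0.0f) {
        return 0.0f;
    }

    /* The root lies between 1 and x, whichever of the two is larger. */
    if (x > 1.0f) {
        lo = 1.0f;
        hi = x;
    } else {
        lo = x;
        hi = 1.0f;
    }

    for (i = 0; i < max_iterations; i++) {
        float square, error;

        mid = 0.5f * (lo + hi);          /* multiply instead of divide */
        square = mid * mid;
        error = (square > x) ? (square - x) : (x - square);

        if (error <= tolerance) {
            break;                       /* close enough */
        }
        if (square > x) {
            hi = mid;                    /* guess too high */
        } else {
            lo = mid;                    /* guess too low */
        }
    }
    return mid;
}
```

A call such as `sqrt_bisect(16.0f, 0.001f, 64)` returns a value very close to 4. Note that the tolerance applies to the squared guess, not to the root itself.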

Mon, 01/19/2009 - 09:19

Hi guys, you can use the following macro:

double FSQRT(double x)

Description

This function computes the square root of the floating point number x.

Prototypes

double FSQRT(double x)

Parameters

x: The number whose square root needs to be computed

Return Value

Returns the square root of x.

Comments

None

Side Effects

None

Version

Introduced BREW Client 2.1

Regards,

Thu, 02/26/2009 - 18:44

FADD works.

Thu, 05/29/2003 - 05:18

The unavailability of the regular square root function, comparison of long to float, and so on is not an ARM compiler limitation. The reason is that the C runtime library requires complete initialization of static data before you can use those functions, but BREW does not support static data.

For ARM devices I would recommend that you avoid dealing with floats. Implement your own fixed-point data type class. For square root calculation, use a lookup-table-based implementation. I use one and it works.

regards

ruben
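
As a rough idea of what a lookup-table square root could look like, here is a minimal sketch. It is not ruben's implementation. The names and the 256-entry table size are assumptions, and the table is filled into a caller-supplied buffer (for example a member of the applet structure) so that no static data is needed.

```c
#include <stdint.h>

#define SQRT_TABLE_SIZE 256

/* Fill table[i] with floor(sqrt(i)) for i = 0..255, using only additions. */
void sqrt_table_init(uint8_t table[SQRT_TABLE_SIZE])
{
    uint32_t i;
    uint32_t root = 0;
    uint32_t next_square = 1;            /* always (root + 1)^2 */

    for (i = 0; i < SQRT_TABLE_SIZE; i++) {
        if (i == next_square) {
            root++;
            next_square += 2 * root + 1; /* (r + 1)^2 = r^2 + 2r + 1 */
        }
        table[i] = (uint8_t)root;
    }
}

/* floor(sqrt(n)) for a 16-bit n: scale n down into table range, look up an
   underestimate, scale it back up, then step up to the exact answer. */
uint32_t sqrt_lookup16(const uint8_t table[SQRT_TABLE_SIZE], uint16_t n)
{
    uint32_t scaled = n;
    uint32_t shift = 0;
    uint32_t root;

    /* Dropping two bits of n corresponds to dropping one bit of the root. */
    while (scaled >= SQRT_TABLE_SIZE) {
        scaled >>= 2;
        shift++;
    }

    root = (uint32_t)table[scaled] << shift;   /* never above the true root */

    /* Correction: at most a handful of steps for 16-bit input. */
    while ((root + 1) * (root + 1) <= n) {
        root++;
    }
    return root;
}
```

After `sqrt_table_init(table)`, a call such as `sqrt_lookup16(table, 10000)` returns 100. The correction loop can be dropped if a coarse estimate is good enough, and a larger table shortens it at the cost of memory.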
