Forums | developer.brewmp.com

The ARM compiler does not support the sqrt function. Is there any way to achieve it? When I wrote my own sqrt function, the ARM compiler would not handle comparisons of longs to floats and similar things.
Has anybody achieved it?

Dips

The non-availability of the regular square root function, comparison of long to float, etc. is not an ARM compiler limitation. The reason is that the C runtime library requires complete initialization of static data before you can use those functions, but BREW does not support static data.
For ARM devices I would recommend that you avoid dealing with floats. Implement your own fixed-point data type class. For square root calculation, use a lookup-table-based implementation. I use one and it works.
regards
ruben
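
For anyone wondering what a lookup-table square root might look like, here is a minimal sketch in C. It is not ruben's implementation; all names (SqrtTable, SqrtTable_Init, SqrtTable_Lookup) are hypothetical. It assumes the table can live in memory the applet allocates itself, which sidesteps the static data restriction, and it only covers inputs from 0 to 0xffff.

```c
/* Hypothetical names throughout; not part of BREW or the original post. */
typedef unsigned char u8;
typedef unsigned int  u32;   /* assumed to be 32 bits wide */

/* The table lives in memory the applet owns (for example inside the applet
   structure), so no writable static data is needed. */
typedef struct {
    u8 root[256];            /* root[i] = floor(sqrt(i * 256)) */
} SqrtTable;

/* Fill the table once at startup. Uses multiplies and compares, no division. */
void SqrtTable_Init(SqrtTable *t)
{
    u32 i;
    u32 r = 0;
    for (i = 0; i < 256; i++) {
        /* Advance r while (r + 1)^2 still fits under i * 256. */
        while ((r + 1) * (r + 1) <= i * 256)
            r++;
        t->root[i] = (u8)r;
    }
}

/* Approximate floor(sqrt(x)) for x in 0..0xffff. */
u32 SqrtTable_Lookup(const SqrtTable *t, u32 x)
{
    u32 shift = 0;
    if (x == 0)
        return 0;
    if (x > 0xffff)
        x = 0xffff;          /* clamp: the table only covers 16-bit inputs */
    /* Scale x up by 4 until it reaches 0x4000..0xffff. Each step doubles
       the root, so the result is halved the same number of times below. */
    while (x < 0x4000) {
        x <<= 2;
        shift++;
    }
    return (u32)t->root[x >> 8] >> shift;
}
```

Because the low 8 bits of the scaled input are dropped, the result can come out one too low for some inputs (10000 gives 99, for example). A larger table or one refinement step would tighten that if it matters.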

Do you recommend not using floats because they are slow/inefficient on ARM or because they will cause a crash in some situations?
Thank you,
-Aaron

Floating point is not supported by these processors. You can declare a float or double variable, but you can't perform any calculations on it with +, -, / or *. You can simulate them, though: take a look at the helper functions in the BREW API documentation, which include functions to add, subtract, multiply, and divide doubles.
[]´s
Marcel
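
A small sketch of what calling the helpers might look like. FADD is named later in this thread; FMUL and FDIV are assumed here to follow the same two-argument pattern, so check the BREW API documentation for the exact names. The #ifndef fallbacks are stand-ins so the snippet also compiles on a desktop, and CelsiusToFahrenheit is a hypothetical example function.

```c
/* On a device build these helpers come from the BREW headers. The fallbacks
   below are desktop stand-ins only, not the real implementations. */
#ifndef FADD
#define FADD(a, b) ((a) + (b))
#endif
#ifndef FMUL
#define FMUL(a, b) ((a) * (b))
#endif
#ifndef FDIV
#define FDIV(a, b) ((a) / (b))
#endif

/* Hypothetical example: F = C * 9 / 5 + 32, written without using the
   +, * and / operators on doubles directly. */
double CelsiusToFahrenheit(double celsius)
{
    double scaled = FDIV(FMUL(celsius, 9.0), 5.0);
    return FADD(scaled, 32.0);
}
```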

As Marcel pointed out, you can use BREW helper functions like FADD to do floating point computation.
But I would recommend not using floating point at all. The ARM processor does not have native support for floating point. Identify the fixed-point precision your requirements call for and implement a fixed-point class of your own. If you are developing any performance-oriented application, floating point will cost you a performance hit. If possible, even avoid division, as the ARM processor does not have native support for division. Every time you perform a division, it costs you several thousand cycles. Divisions can also be performed with lookup tables.
regards
ruben
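
To make the fixed-point suggestion concrete, here is a rough sketch of a 16.16 format in C with a table-based divide. None of these names come from ruben's code. It assumes int is 32 bits, that the compiler supports long long for the intermediate product, and that a read-only const table is acceptable on BREW (only writable static data is the problem, as far as I know).

```c
/* Hypothetical Q16.16 fixed-point type: 16 integer bits, 16 fraction bits. */
typedef int fixed_t;                 /* assumed to be 32 bits wide */
#define FIXED_ONE 0x10000

fixed_t Fixed_FromInt(int n)
{
    return n * FIXED_ONE;
}

/* Assumes an arithmetic right shift, so negative values round down. */
int Fixed_ToInt(fixed_t a)
{
    return a >> 16;
}

/* Multiply in 64 bits, then drop the extra 16 fraction bits. */
fixed_t Fixed_Mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((long long)a * b) >> 16);
}

/* Reciprocals 1/n in Q16.16 for n = 1..8, rounded to nearest.
   Index 0 is unused. Read-only data, never written at run time. */
static const fixed_t kReciprocal[9] = {
    0, 65536, 32768, 21845, 16384, 13107, 10923, 9362, 8192
};

/* Divide by a small integer using a multiply and a table lookup.
   Returns 0 for divisors outside 1..8. */
fixed_t Fixed_DivSmall(fixed_t a, int n)
{
    if (n < 1 || n > 8)
        return 0;
    return Fixed_Mul(a, kReciprocal[n]);
}
```

The table divide is exact for powers of 2 and slightly off otherwise, because the reciprocals are rounded. Dividing 9.0 by 3 this way lands a few units of 1/65536 below 3.0.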

"several thousand cycles" is an overstatement. According to the Arm Application Note 34: Writing Efficient C for ARM (http://www.arm.com/support/567GCF/$File/DAI0034A_efficient-c.pdf), integer division can take 20-140 cycles, depending on the numerator and denominator. Still not a good idea for an interactive game, but it really depends on the application.
I couldn't find a cycle count estimate for floating point math on the ARM web site.

Sorry, that was my typo. Instead of typing "several cycles" I mistyped "several thousand cycles". As a side note, I would like to provide the following information, which will give you some idea of how expensive divisions are.
A 132 MHz ARM is theoretically capable of performing 132 million instructions per second (or 264 million if 50% of them are shifts). But you max the CPU out at about 70,000 divisions per second. If your game runs at 70 frames per second, that means you are using the processor to its maximum if you perform just 1,000 divisions for every frame you draw.
regards
ruben
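
The budget arithmetic in the post above can be written out as a small C sketch. The function names are hypothetical, and the figures are simply the ones quoted in the post. Assuming one instruction per cycle, those figures imply roughly 1,885 cycles per division, which is well above the 20-140 cycle range cited earlier in the thread, so the 70,000 figure is worth measuring on your own target rather than taking as given.

```c
/* How many divisions fit in one frame if the CPU did nothing else. */
unsigned long DivisionsPerFrame(unsigned long divisions_per_second,
                                unsigned long frames_per_second)
{
    return divisions_per_second / frames_per_second;
}

/* Cycles per division implied by a clock rate and a measured division rate,
   assuming one instruction per cycle. */
unsigned long ImpliedCyclesPerDivision(unsigned long clock_hz,
                                       unsigned long divisions_per_second)
{
    return clock_hz / divisions_per_second;
}
```

With the numbers from the post, DivisionsPerFrame(70000, 70) gives 1000 and ImpliedCyclesPerDivision(132000000, 70000) gives 1885.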

If you want it to be really fast, here's an assembly implementation that comes from a Game Boy Advance mailing list. Since the GBA also uses an ARM processor, the routine works on BREW phones just as well. Because it is written for the GBA, the assembler syntax may require some tweaking, though. I have not tested this code myself, but I thought I'd share it here so that others can give it a shot.
Keep in mind, however, that even with an optimized assembly implementation such as this, sqrt is a very slow process.
sqrt32 takes about 220-230 CPU clocks
sqrt16 takes about 130-140 CPU clocks
==========================================
//calculates the square root of number (0..0xffffffff)
u32 sqrt32(u32 number);
//calculates the square root of number (0..0xffff)
u32 sqrt16(u32 number);
==========================================
=========== sqrt32.s =====================
   .ARM
   .GLOBL sqrt32
   .TYPE sqrt32, function
@ fast 32 bit square root
@
@ (c) 2003, Vivid^Brainwave team
@
@ In:
@ R0 - argument (0..0xffffffff)
@ Output:
@ R0 - square root of
@
@ Used registers: R0 - R3
sqrt32:
   cmp R0, #0x00
   bxeq LR
   mov R3, #0x10
.sqrt32_0:
   cmp R0, #0x40000000
   movcc R0, R0, lsl #0x02
   subcc R3, R3, #1
   bcc .sqrt32_0
   movs R1, R0, lsl #0x10
   mov R0, R0, lsr #0x10
   mov R2, #0x00
   beq .sqrt32_2
.sqrt32_1:
   add R2, R2, #0x4000
   subs R0, R0, R2
   addcc R0, R0, R2
   sub R2, R2, #0x4000
   add R2, R2, R2
   addcs R2, R2, #0x10000
   mov R0, R0, lsl #0x02
   add R0, R0, R1, lsr #0x1e
   movs R1, R1, lsl #0x02
   beq .sqrt32_3
   subs R3, R3, #0x01
   bne .sqrt32_1
   mov R0, R2, lsr #0x10
   bx LR
.sqrt32_2:
   add R2, R2, #0x4000
   subs R0, R0, R2
   addcc R0, R0, R2
   sub R2, R2, #0x4000
   add R2, R2, R2
   addcs R2, R2, #0x10000
   mov R0, R0, lsl #0x02
.sqrt32_3:
   subs R3, R3, #0x01
   bne .sqrt32_2
   mov R0, R2, lsr #0x10
   bx LR
   .LTORG
==========================================
============= sqrt16.s ===================
   .ARM
   .GLOBL sqrt16
   .TYPE sqrt16, function
@ fast 16 bit square root
@
@ (c) 2003, Vivid^Brainwave team
@
@ In:
@ R0 - number (0..0xffff)
@ Output:
@ R0 - square root of
@
@ Used registers: R0 - R2
sqrt16:
   cmp R0, #0x00
   bxeq LR
   mov R2, #8
.sqrt16_0:
   cmp R0, #0x4000
   movcc R0, R0, lsl #0x02
   subcc R2, R2, #1
   bcc .sqrt16_0
   mov R1, #0x00
.sqrt16_1:
   add R1, R1, #0x4000
   subs R0, R0, R1
   addcc R0, R0, R1
   sub R1, R1, #0x4000
   add R1, R1, R1
   addcs R1, R1, #0x10000
   mov R0, R0, lsl #0x02
   subs R2, R2, #0x01
   bne .sqrt16_1
   mov R0, R1, lsr #0x10
   bx LR
   .LTORG
==========================================
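
Since the assembly above is untested, a plain C reference is handy for comparing results on a desktop before trusting it on a device. The following is a sketch of the standard bit-by-bit integer square root, not a translation of the assembly. sqrt32_ref is a hypothetical name, and u32 is assumed to be a 32-bit unsigned type as in the prototypes above.

```c
typedef unsigned int u32;   /* assumed to be 32 bits wide */

/* Returns floor(sqrt(number)) for number in 0..0xffffffff.
   Uses only shifts, adds, subtracts and compares. */
u32 sqrt32_ref(u32 number)
{
    u32 root = 0;
    u32 bit = 1u << 30;      /* highest power of 4 that fits in 32 bits */

    /* Start at the highest power of 4 not above the input. */
    while (bit > number)
        bit >>= 2;

    while (bit != 0) {
        if (number >= root + bit) {
            /* This result bit is set: subtract and record it. */
            number -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}
```

Running both routines over a range of inputs and comparing the results would show quickly whether the assembly needs tweaking after porting.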

Does anyone know if the ARM Brew Builder optimizes out divides that can be replaced with shifts? Or should we be doing that manually?

The ARM compiler is definitely smart enough to optimize divides and multiplies by powers of 2. I know that it is also smart about optimizing multiplies by other constants (for example, x*5 will be turned into x*4+x). For these cases, I've looked at code output from the compiler to verify this. As for division by other constants, the compiler will link in a special divide routine for dividing by 10. Other than that, I am not sure what it does, but from looking at the online ARM documentation I referenced above, I would take an educated guess that it does not optimize them (see section 3.4, Division by a Constant).
-Aaron

I ran a test using the -S option for armcc (which outputs an assembly file instead of an object file, so you can see what your code gets compiled into), and it does in fact only optimize a divide by a constant power of 2 or a constant 10.
-Aaron
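
For divisors the compiler does not handle, the rewrite can be done by hand. Here is a sketch of the three cases discussed in this thread: a multiply as shift-and-add, a divide by a power of 2 as a shift, and a divide by another constant as a multiply by a scaled reciprocal. The function names are hypothetical, and the divide-by-3 version is only valid for inputs from 0 to 0xffff.

```c
typedef unsigned int u32;   /* assumed to be 32 bits wide */

/* x * 5 as a shift and an add, i.e. x * 4 + x. */
u32 MulBy5(u32 x)
{
    return (x << 2) + x;
}

/* Unsigned divide by 4 as a shift. For negative signed values a plain
   arithmetic shift rounds down instead of toward zero, so it does not
   match the / operator there. */
u32 DivBy4(u32 x)
{
    return x >> 2;
}

/* Divide by 3 for x in 0..0xffff: multiply by ceil(2^17 / 3) = 43691 and
   shift right by 17. The product stays below 2^32 for this input range. */
u32 DivBy3_U16(u32 x)
{
    return (x * 43691u) >> 17;
}
```

Compiling with -S, as described above, is a quick way to check whether a hand-written version actually produces better code than what the compiler emits.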

Great. I had a hunch this was going on, because I had a bunch of divides by multiples of 2 and the like in some fairly frequently executed loops and the performance was still good. I was just assuming any modern compiler would optimize this stuff out.

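I've found an algorithm on the net for calculating sqrt:
/* Square root by bisection. Accuracy bounds the error in the square
   (Number - guess * guess), not the error in the root itself. */
float SquareRoot(float Number, float Accuracy)
{
    float MaximumValue;
    float MinimumValue;
    float CurrentGuess = 0.0f;
    float CurrentGuessSquared;
    float CurrentError;
    int Iteration;

    /* The root lies in [0, Number] when Number > 1, otherwise in [Number, 1]. */
    if (Number > 1.0f)
    {
        MaximumValue = Number;
        MinimumValue = 0.0f;
    }
    else
    {
        MaximumValue = 1.0f;
        MinimumValue = Number;
    }
    /* Cap the iterations so a too-small Accuracy cannot loop forever. */
    for (Iteration = 0; Iteration < 64; Iteration++)
    {
        CurrentGuess = (MaximumValue + MinimumValue) / 2.0f;
        CurrentGuessSquared = CurrentGuess * CurrentGuess;
        /* abs() is the integer version and would truncate the error,
           so take the absolute value by hand. */
        CurrentError = Number - CurrentGuessSquared;
        if (CurrentError < 0.0f)
            CurrentError = -CurrentError;
        if (CurrentError <= Accuracy)
            break;
        if (CurrentGuessSquared >= Number)
            MaximumValue = CurrentGuess;
        else
            MinimumValue = CurrentGuess;
    }
    return CurrentGuess;
}
The accuracy is, for example, 0.001.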
Germ

Hi guys, you can use the following macro:
double FSQRT(double x)
Description
This function computes the square root of the floating point number x.
Prototypes
double FSQRT(double x)
Parameters
x: The number whose square root needs to be computed
Return Value
Returns the square root of x.
Comments
None
Side Effects
None
Version
Introduced BREW Client 2.1
Regards,
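
A short usage sketch for FSQRT, combined with FADD and FMUL so that no floating point operator is applied to doubles directly. FSQRT and FADD are named in this thread; FMUL is assumed to exist with the same calling pattern. VectorLength and StandInSqrt are hypothetical names, and the #ifndef fallbacks are desktop stand-ins only, so the snippet compiles without the BREW headers.

```c
/* Desktop stand-ins. On a device build the BREW headers supply the real
   helpers and these fallbacks are skipped. */
#ifndef FADD
#define FADD(a, b) ((a) + (b))
#endif
#ifndef FMUL
#define FMUL(a, b) ((a) * (b))
#endif
#ifndef FSQRT
/* Newton iteration; only used when no real FSQRT is available. */
static double StandInSqrt(double x)
{
    double r = (x > 1.0) ? x : 1.0;
    int i;
    if (x <= 0.0)
        return 0.0;
    for (i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}
#define FSQRT(x) StandInSqrt(x)
#endif

/* Length of the 2D vector (x, y): sqrt(x * x + y * y). */
double VectorLength(double x, double y)
{
    return FSQRT(FADD(FMUL(x, x), FMUL(y, y)));
}
```

Note the version line above: FSQRT was introduced in BREW Client 2.1, so older handsets would need one of the integer routines from earlier in the thread instead.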

FADD works.