the arm compiler can lift long->vlong casts on multiplication and convert a 64x64->64 multiplication into a 32x32->64 one, with an optional 64-bit accumulate.
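
a minimal sketch of the source patterns this is aimed at, written in standard C so it builds anywhere. the function names are hypothetical, and int32_t/int64_t stand in for long/vlong on the assumption that long is 32 bits and vlong is 64 bits on arm. both operands are 32-bit values widened by a cast, so the full product fits in 64 bits and a widening multiply (SMULL on arm, or SMLAL for the accumulating form) gives the same result as the full 64x64 multiply. which instructions are actually emitted depends on the compiler.

```c
#include <stdint.h>

/* Hypothetical example: both operands are 32-bit values widened by a cast,
 * so the 64x64->64 multiply can be done as a single 32x32->64 multiply. */
int64_t mul_widen(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Hypothetical example: the same product added to a 64-bit accumulator,
 * which matches the multiply-with-64-bit-accumulate form. */
int64_t mul_widen_acc(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * (int64_t)b;
}
```

if either operand were a genuine 64-bit value rather than a widened 32-bit one, the narrowing would not be valid and the full 64x64 multiply would have to stay.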