How can I truncate a floating-point number's significant bits to an arbitrary precision in Java? [duplicate]

Suppose x is the number whose precision you want to reduce, and bits is the number of significant bits you want to retain.

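When bits is sufficiently large and the order of magnitude of x is sufficiently close to 0, x * (1L << (bits - Math.getExponent(x))) scales x so that the bits to be removed fall in the fractional part (after the radix point), while the bits to be retained fall in the integer part (before the radix point). You can then round this to drop the fractional part, and divide the rounded number by (1L << (bits - Math.getExponent(x))) to restore the order of magnitude of x, i.e.: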

public static double reducePrecision(double x, int bits) {
    int exponent = bits - Math.getExponent(x);
    // Cast to double first; dividing a long by a long would truncate the result.
    return (double) Math.round(x * (1L << exponent)) / (1L << exponent);
}
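To make the scaling step concrete, here is a small sketch that traces the intermediate values for one input. The class name ScalingSketch and the method name reduceWithShift are made up for this illustration; the method is the shift-based version written with an explicit cast to double so that the final division is a floating-point division.

```java
final class ScalingSketch {
    // Shift-based variant; only meaningful while 0 <= exponent <= 62.
    static double reduceWithShift(double x, int bits) {
        int exponent = bits - Math.getExponent(x);
        long scale = 1L << exponent;
        return (double) Math.round(x * scale) / scale;
    }

    public static void main(String[] args) {
        double x = 73.0967787376657;
        int bits = 23;
        // 2^6 <= x < 2^7, so Math.getExponent(x) is 6 and exponent is 17.
        int exponent = bits - Math.getExponent(x);
        long scale = 1L << exponent;
        double scaled = x * scale;         // bits to keep are now in the integer part
        long rounded = Math.round(scaled); // fractional part dropped
        double result = (double) rounded / scale;

        System.out.println("exponent = " + exponent);
        System.out.println("scaled   = " + scaled);
        System.out.println("rounded  = " + rounded);
        System.out.println("result   = " + result);
        System.out.println("as float = " + (double) (float) x);
    }
}
```

For this input the rounded value is not a tie, so the result should agree with a plain cast to float.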

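However, (1L << exponent) breaks down when Math.getExponent(x) > bits || Math.getExponent(x) < bits - 62. It does not throw an exception; it silently produces a wrong scale factor. The solution is to use Math.pow(2, exponent) (or the fast pow2(exponent) implementation from this answer) to compute fractional or very large powers of 2, i.e.: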

public static double reducePrecision(double x, int bits) {
    int exponent = bits - Math.getExponent(x);
    return Math.round(x * Math.pow(2, exponent)) * Math.pow(2, -exponent);
}
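The reason the shift misbehaves outside the range 0 to 62 is that Java only uses the low six bits of the shift distance for a long, and that bit 63 is the sign bit. The following sketch (the class and method names are invented for the demonstration) compares the two ways of building the scale factor:

```java
final class ShiftLimits {
    // Scale factor as computed by the shift-based version.
    static long shiftScale(int exponent) {
        return 1L << exponent;
    }

    // Scale factor as computed by the Math.pow-based version.
    static double powScale(int exponent) {
        return Math.pow(2, exponent);
    }

    public static void main(String[] args) {
        // A negative distance wraps around: -1 & 63 == 63.
        System.out.println(shiftScale(-1) == Long.MIN_VALUE);
        // A distance of 63 sets the sign bit, giving a negative scale.
        System.out.println(shiftScale(63) < 0);
        // A distance of 64 wraps to 0, so the scale is 1 instead of 2^64.
        System.out.println(shiftScale(64) == 1L);
        // Math.pow handles the same exponents as expected.
        System.out.println(powScale(-1) == 0.5);
        System.out.println(powScale(64) == 0x1p64);
    }
}
```

Each of the five lines should print true.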

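However, Math.pow(2, exponent) breaks down as exponent approaches -1074 or 1023, because the power of 2 itself underflows or overflows. The solution is to use Math.scalb(x, exponent) so that the power of 2 never has to be computed explicitly, i.e.: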

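public static double reducePrecision(double x, int bits) {
    int exponent = bits - Math.getExponent(x);
    // Cast to double so that scalb(double, int) is chosen; a long argument would select scalb(float, int).
    return Math.scalb((double) Math.round(Math.scalb(x, exponent)), -exponent);
}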
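A rough way to see the difference is to feed both variants a very small normal number, for which the required scale factor exceeds what a double can hold. In the sketch below, PowVersusScalb, reduceWithPow and reduceWithScalb are names chosen for this illustration only; the scalb variant casts the rounded long to double so that the scalb(double, int) overload is used.

```java
final class PowVersusScalb {
    // Variant that computes the power of 2 explicitly.
    static double reduceWithPow(double x, int bits) {
        int exponent = bits - Math.getExponent(x);
        return Math.round(x * Math.pow(2, exponent)) * Math.pow(2, -exponent);
    }

    // Variant that lets Math.scalb adjust the exponent directly.
    static double reduceWithScalb(double x, int bits) {
        int exponent = bits - Math.getExponent(x);
        return Math.scalb((double) Math.round(Math.scalb(x, exponent)), -exponent);
    }

    public static void main(String[] args) {
        // 1.5 * 2^-1010 has only two significant bits, so keeping 23 bits
        // should leave it unchanged.
        double x = 0x1.8p-1010;
        int exponent = 23 - Math.getExponent(x); // 23 - (-1010) = 1033

        // 2^1033 is not representable as a double.
        System.out.println(Double.isInfinite(Math.pow(2, exponent)));
        System.out.println(reduceWithPow(x, 23) == x);
        System.out.println(reduceWithScalb(x, 23) == x);
    }
}
```

Here the pow-based variant multiplies by Infinity and so cannot return x, while the scalb-based variant should return x unchanged.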

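However, Math.round(y) returns a long, so it does not preserve Infinity, NaN, or cases where Math.abs(x) > Long.MAX_VALUE / Math.pow(2, exponent). Furthermore, Math.round(y) always rounds ties toward positive infinity (e.g. Math.round(0.5) == 1 && Math.round(1.5) == 2). The solution is to use Math.rint(y), which returns a double and keeps the unbiased IEEE 754 round-to-nearest, ties-to-even rule (e.g. Math.rint(0.5) == 0.0 && Math.rint(1.5) == 2.0), i.e.: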

public static double reducePrecision(double x, int bits) {
    int exponent = bits - Math.getExponent(x);
    return Math.scalb(Math.rint(Math.scalb(x, exponent)), -exponent);
}
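The difference between the two rounding functions can be checked directly. This sketch (the class name RoundVersusRint is arbitrary) prints how each treats ties and special values, and then confirms that the rint-based reducePrecision passes NaN and Infinity through:

```java
final class RoundVersusRint {
    static double reducePrecision(double x, int bits) {
        int exponent = bits - Math.getExponent(x);
        return Math.scalb(Math.rint(Math.scalb(x, exponent)), -exponent);
    }

    public static void main(String[] args) {
        double[] ties = {0.5, 1.5, 2.5, -0.5};
        for (double y : ties) {
            // Math.round breaks ties toward positive infinity,
            // Math.rint breaks ties toward the even neighbour.
            System.out.println(y + ": round=" + Math.round(y) + " rint=" + Math.rint(y));
        }

        // Math.round collapses special values into long values.
        System.out.println(Math.round(Double.NaN) == 0L);
        System.out.println(Math.round(Double.POSITIVE_INFINITY) == Long.MAX_VALUE);

        // Math.rint, and therefore reducePrecision, preserves them.
        System.out.println(Double.isNaN(reducePrecision(Double.NaN, 23)));
        System.out.println(reducePrecision(Double.POSITIVE_INFINITY, 23) == Double.POSITIVE_INFINITY);
    }
}
```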

Finally, here is a unit test confirming that the function behaves as expected:

public static String decompose(double d) {
    int SIGN_WIDTH = 1;
    int EXP_WIDTH = 11;
    int SIGNIFICAND_WIDTH = 53;
    String s = String.format("%64s", Long.toBinaryString(Double.doubleToRawLongBits(d))).replace(' ', '0');
    return s.substring(0, 0 + SIGN_WIDTH) + " "
            + s.substring(0 + SIGN_WIDTH, 0 + SIGN_WIDTH + EXP_WIDTH) + " "
            + s.substring(0 + SIGN_WIDTH + EXP_WIDTH, 0 + SIGN_WIDTH + EXP_WIDTH + SIGNIFICAND_WIDTH - 1);
}

// Assert here is assumed to be org.junit.Assert.
public static void test() {
    // Use a fixed seed so the generated numbers are reproducible.
    java.util.Random r = new java.util.Random(0);

    // Generate a floating point number that makes use of its full 52 bits of significand precision.
    double a = r.nextDouble() * 100;
    System.out.println(decompose(a) + " " + a);
    Assert.assertFalse(decompose(a).split(" ")[2].substring(23).equals(String.format("%0" + (52 - 23) + "d", 0)));

    // Cast the double to a float to produce a "ground truth" of precision loss to compare against.
    double b = (float) a;
    System.out.println(decompose(b) + " " + b);
    Assert.assertTrue(decompose(b).split(" ")[2].substring(23).equals(String.format("%0" + (52 - 23) + "d", 0)));

    // 32-bit float has a 23 bit significand, so c's bit pattern should be identical to b's bit pattern.
    double c = reducePrecision(a, 23);
    System.out.println(decompose(c) + " " + c);
    Assert.assertTrue(b == c);

    // 23rd-most significant bit in c is 1, so rounding it to the 22nd-most significant bit requires breaking a tie.
    // Since 22nd-most significant bit in c is 0, d will be rounded down so that its 22nd-most significant bit remains 0.
    double d = reducePrecision(c, 22);
    System.out.println(decompose(d) + " " + d);
    Assert.assertTrue(decompose(d).split(" ")[2].substring(22).equals(String.format("%0" + (52 - 22) + "d", 0)));
    Assert.assertTrue(decompose(c).split(" ")[2].charAt(22) == '1' && decompose(c).split(" ")[2].charAt(21) == '0');
    Assert.assertTrue(decompose(d).split(" ")[2].charAt(21) == '0');

    // 21st-most significant bit in d is 1, so rounding it to the 20th-most significant bit requires breaking a tie.
    // Since 20th-most significant bit in d is 1, e will be rounded up so that its 20th-most significant bit becomes 0.
    double e = reducePrecision(d, 20);
    System.out.println(decompose(e) + " " + e);
    Assert.assertTrue(decompose(e).split(" ")[2].substring(20).equals(String.format("%0" + (52 - 20) + "d", 0)));
    Assert.assertTrue(decompose(d).split(" ")[2].charAt(20) == '1' && decompose(d).split(" ")[2].charAt(19) == '1');
    Assert.assertTrue(decompose(e).split(" ")[2].charAt(19) == '0');

    // Reduce the precision of a number close to the largest normal number.
    double f = reducePrecision(a * 0x1p+1017, 23);
    System.out.println(decompose(f) + " " + f);

    // Reduce the precision of a number close to the smallest normal number.
    double g = reducePrecision(a * 0x1p-1028, 23);
    System.out.println(decompose(g) + " " + g);

    // Reduce the precision of a number close to the smallest subnormal number.
    double h = reducePrecision(a * 0x1p-1051, 23);
    System.out.println(decompose(h) + " " + h);
}

And its output:

0 10000000101 0010010001100011000110011111011100100100111000111011 73.0967787376657
0 10000000101 0010010001100011000110100000000000000000000000000000 73.0967788696289
0 10000000101 0010010001100011000110000000000000000000000000000000 73.09677124023438
0 10000000101 0010010001100011001000000000000000000000000000000000 73.0968017578125
0 11111111110 0010010001100011000110100000000000000000000000000000 1.0266060746443803E308
0 00000000001 0010010001100011000110100000000000000000000000000000 2.541339559435826E-308
0 00000000000 0000000000000000000000100000000000000000000000000000 2.652494739E-315
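As a complementary check that does not need JUnit, the comparison against a float cast can be repeated over many random inputs. This is only a sketch: FloatCastCheck and countMismatches are hypothetical names, the sample range of (-100, 100) is an arbitrary choice that stays well inside the normal range of float, and reducePrecision is a copy of the Math.rint-based version.

```java
final class FloatCastCheck {
    static double reducePrecision(double x, int bits) {
        int exponent = bits - Math.getExponent(x);
        return Math.scalb(Math.rint(Math.scalb(x, exponent)), -exponent);
    }

    // Counts the samples for which keeping 23 bits differs from casting to float.
    static int countMismatches(long seed, int samples) {
        java.util.Random r = new java.util.Random(seed);
        int mismatches = 0;
        for (int i = 0; i < samples; i++) {
            double a = (r.nextDouble() - 0.5) * 200; // roughly uniform in (-100, 100)
            double viaCast = (float) a;
            if (reducePrecision(a, 23) != viaCast) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(0L, 100000));
    }
}
```

Both paths round to nearest with ties to even at the same bit position, so no mismatches are expected for inputs in this range. Values outside the normal range of float would differ, since the cast overflows or loses bits there while reducePrecision does not.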