• Posts
  • RSS
  • ◂◂RSS
  • Contact

  • Diving into a floating point bug

    March 2nd, 2017
    tech  [html]
    Yesterday I refactored some python code that effectively changed:
       floor(n * (100 / 101))
    
    into:
       floor(n * Decimal(100/101))
    
    For most integers this doesn't change anything, but for multiples of 101 it gives you different answers:
      floor(101 * (100 / 101))       -> 100
      floor(101 * Decimal(100/101))  ->  99
    
    In python2, however, both give 100:
      floor(101 * (100.0 / 101))       -> 100
      floor(101 * Decimal(100.0/101))  -> 100
    
    What's going on? First, what's the difference between python2 and python3 here? One of the changes not listed in What's New In Python 3.0 is that floor now returns an int instead of a float:
    • 2.x: Return the floor of x as a float, the largest integer value less than or equal to x.
    • 3.x: Return the floor of x, the largest integer less than or equal to x. If x is not a float, delegates to x.__floor__(), which should return an Integral value.
    Since these are positive numbers, we can use int() instead of floor() and now python2 and python3 give the same result:
      2.x: int(101 * Decimal(100.0/101)) -> 99
      3.x: int(101 * Decimal(100/101))   -> 99
    
    The root of the problem is that 100/101 is a repeating decimal, 0.99009900..., and so can't be represented exactly as a Decimal or a float. To store it as a Decimal python defaults to 28 decimal digits and rounds it:
    > D(100)/D(101)
    Decimal('0.9900990099009900990099009901')
    
    Storing it as a float is more complicated. Python uses the platform's hardware support, which is IEEE 754 floating point bascially everywhere, and it gets converted to a binary fraction with 53 bits for the numerator and a power of two for the denominator:
      1114752383012499 / 2**50
    
    Python is aware that there are many numbers for which this is the closest possible floating point approximation, and so to be friendly it selectes the shortest decimal representation when printing it:
    > 100/101
    0.9900990099009901
    

    Now we have all the pieces to see why 101*Decimal(100/101) is less than 101*(100/101), and, critically, why one is just under 100 while the other is exactly 100.

    In the Decimal case python first divides 100 by 101 and gets 1114752383012499 / 2**50. Then it takes the closest representable Decimal to that number, which is Decimal('0.99009900990099009021605525049380958080291748046875'). That's 50 bits of precision because Decimal tries to track and preseve precision in its calculations, and 53 bits of precision is 50 decimal digits of precision. [Update 2017-03-02: this is wrong: 53 bits of precision much less than 50 decimal digits. I'm not sure why Decimal is doing this.] All of the digits starting with 02160... are just noise, but Decimal has no way of knowing that. Then when we multiply by 101 we get Decimal('99.999...58030'), because our Decimal version of the float was slightly less than true 100/101.

    On the other hand, when python gets 101 * (100/101) it stays in IEEE 754 land. It multiplies 101 by 1114752383012499 / 2**50, and the closest float to that is exactly 100.

    I ran into this bug because I had been trying to break a large change into two behavior-preserving changes, first switching some code to use Decimal and then switching the rest. To fix the bug, I combined the two changes into one, so that we switched from using float and int to using Decimal throughout this module.

    Comment via: google plus, facebook, hacker news

    Recent posts on blogs I like:

    Collections: Clothing, How Did They Make It? Part I: High Fiber

    This week we are starting the first of a four (?) part look at pre-modern textile production. As with our series on farming and iron, we are going to follow the sequence of production from the growing of fibers all the way to the finished object, with a f…

    via A Collection of Unmitigated Pedantry March 5, 2021

    Austerity is Inefficient

    Working on an emergency timetable for regional rail has made it clear how an environment of austerity requires tradeoffs that reduce efficiency. I already talked about how the Swiss electronics before concrete slogan is not about not spending money but ab…

    via Pedestrian Observations February 27, 2021

    The Troubling Ethics of Writing (A Speech from Ancient Sumer)

    (Translated from a transcript of an ancient Sumerian speech by Uruk's most well-respected Scriptological Ethicist) Writing is a profoundly dangerous technology: Access to writing was initially, and still remains, uneven. What's worse, the rich are m…

    via BLOG - Cullen O'Keefe February 15, 2021

    more     (via openring)


  • Posts
  • RSS
  • ◂◂RSS
  • Contact