<!doctype html>

<html>
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Floating Point Representation &#8212; Imath Documentation</title>
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <link rel="stylesheet" href="_static/bizstyle.css" type="text/css" />
    
    <script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
    <script src="_static/jquery.js"></script>
    <script src="_static/underscore.js"></script>
    <script src="_static/doctools.js"></script>
    <script async="async" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>
    <script src="_static/bizstyle.js"></script>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="Box" href="classes/Box.html" />
    <link rel="prev" title="half-float Conversion Configuration Options" href="half_conversion.html" />
    <!--[if lt IE 9]>
    <script src="_static/css3-mediaqueries.js"></script>
    <![endif]-->
  </head><body>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="classes/Box.html" title="Box"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="half_conversion.html" title="half-float Conversion Configuration Options"
             accesskey="P">previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="index.html">Imath</a> &#187;</li>
        <li class="nav-item nav-item-this"><a href="">Floating Point Representation</a></li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="floating-point-representation">
<h1>Floating Point Representation<a class="headerlink" href="#floating-point-representation" title="Permalink to this headline">¶</a></h1>
<p><strong>Representation of a 32-bit float:</strong></p>
<p>We assume that a float, f, is an IEEE 754 single-precision floating point number, whose bits are arranged as follows: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="mi">31</span> <span class="p">(</span><span class="n">msb</span><span class="p">)</span>
<span class="o">|</span>
<span class="o">|</span> <span class="mi">30</span>     <span class="mi">23</span>
<span class="o">|</span> <span class="o">|</span>      <span class="o">|</span>
<span class="o">|</span> <span class="o">|</span>      <span class="o">|</span> <span class="mi">22</span>                    <span class="mi">0</span> <span class="p">(</span><span class="n">lsb</span><span class="p">)</span>
<span class="o">|</span> <span class="o">|</span>      <span class="o">|</span> <span class="o">|</span>                     <span class="o">|</span>
<span class="n">X</span> <span class="n">XXXXXXXX</span> <span class="n">XXXXXXXXXXXXXXXXXXXXXXX</span>

<span class="n">s</span> <span class="n">e</span>        <span class="n">m</span>
</pre></div>
</div>
 s is the sign bit, e is the exponent, and m is the significand.</p>
<p>If e is between 1 and 254, f is a normalized number: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>        <span class="n">s</span>    <span class="n">e</span><span class="o">-</span><span class="mi">127</span>
<span class="n">f</span> <span class="o">=</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>  <span class="o">*</span> <span class="mi">2</span>      <span class="o">*</span> <span class="mf">1.</span><span class="n">m</span>
</pre></div>
</div>
 If e is 0, and m is not zero, f is a denormalized number: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>        <span class="n">s</span>    <span class="o">-</span><span class="mi">126</span>
<span class="n">f</span> <span class="o">=</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>  <span class="o">*</span> <span class="mi">2</span>      <span class="o">*</span> <span class="mf">0.</span><span class="n">m</span>
</pre></div>
</div>
 If e and m are both zero, f is zero: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">f</span> <span class="o">=</span> <span class="mf">0.0</span>
</pre></div>
</div>
 If e is 255, f is an “infinity” or “not a number” (NAN), depending on whether m is zero or not.</p>
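<p>The rules above can be checked with a short program. The following is a minimal sketch, not part of Imath: the names <code class="docutils literal notranslate"><span class="pre">FloatFields</span></code>, <code class="docutils literal notranslate"><span class="pre">splitFloat</span></code>, and <code class="docutils literal notranslate"><span class="pre">fieldsToValue</span></code> are hypothetical, and the code assumes that the platform's <code class="docutils literal notranslate"><span class="pre">float</span></code> is a 32-bit IEEE 754 value.</p>

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper type: the three fields of a 32-bit float.
struct FloatFields
{
    std::uint32_t s; // sign bit (bit 31)
    std::uint32_t e; // biased exponent (bits 30..23)
    std::uint32_t m; // significand bits (bits 22..0)
};

// Split a float into s, e, and m by copying its bit pattern into an integer.
inline FloatFields splitFloat(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits); // well-defined way to read the bits
    return FloatFields{bits >> 31, (bits >> 23) & 0xffu, bits & 0x007fffffu};
}

// Rebuild the numeric value from the fields, one branch per rule above.
inline double fieldsToValue(const FloatFields& x)
{
    const double sign = x.s ? -1.0 : 1.0;
    if (x.e == 255) // infinity if m is zero, NAN otherwise
        return x.m ? std::numeric_limits<double>::quiet_NaN()
                   : sign * std::numeric_limits<double>::infinity();
    if (x.e == 0) // zero or denormalized: 2^-126 * 0.m
        return sign * std::ldexp(double(x.m), -126 - 23);
    // normalized: 2^(e-127) * 1.m, with the hidden leading 1 made explicit
    return sign * std::ldexp(double(x.m | 0x00800000u), int(x.e) - 127 - 23);
}
```

<p>The extra shift by 23 in the <code class="docutils literal notranslate"><span class="pre">ldexp</span></code> calls accounts for m being held as an integer rather than as a binary fraction.</p>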
<p>Examples: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="mi">0</span> <span class="mi">00000000</span> <span class="mi">00000000000000000000000</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="mi">0</span> <span class="mi">01111110</span> <span class="mi">00000000000000000000000</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="mi">0</span> <span class="mi">01111111</span> <span class="mi">00000000000000000000000</span> <span class="o">=</span> <span class="mf">1.0</span>
<span class="mi">0</span> <span class="mi">10000000</span> <span class="mi">00000000000000000000000</span> <span class="o">=</span> <span class="mf">2.0</span>
<span class="mi">0</span> <span class="mi">10000000</span> <span class="mi">10000000000000000000000</span> <span class="o">=</span> <span class="mf">3.0</span>
<span class="mi">1</span> <span class="mi">10000101</span> <span class="mi">11110000010000000000000</span> <span class="o">=</span> <span class="o">-</span><span class="mf">124.0625</span>
<span class="mi">0</span> <span class="mi">11111111</span> <span class="mi">00000000000000000000000</span> <span class="o">=</span> <span class="o">+</span><span class="n">infinity</span>
<span class="mi">1</span> <span class="mi">11111111</span> <span class="mi">00000000000000000000000</span> <span class="o">=</span> <span class="o">-</span><span class="n">infinity</span>
<span class="mi">0</span> <span class="mi">11111111</span> <span class="mi">10000000000000000000000</span> <span class="o">=</span> <span class="n">NAN</span>
<span class="mi">1</span> <span class="mi">11111111</span> <span class="mi">11111111111111111111111</span> <span class="o">=</span> <span class="n">NAN</span>
</pre></div>
</div>
 <strong>Representation of a 16-bit half:</strong></p>
<p>Here is the bit-layout for a half number, h: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="mi">15</span> <span class="p">(</span><span class="n">msb</span><span class="p">)</span>
<span class="o">|</span>
<span class="o">|</span> <span class="mi">14</span>  <span class="mi">10</span>
<span class="o">|</span> <span class="o">|</span>   <span class="o">|</span>
<span class="o">|</span> <span class="o">|</span>   <span class="o">|</span> <span class="mi">9</span>        <span class="mi">0</span> <span class="p">(</span><span class="n">lsb</span><span class="p">)</span>
<span class="o">|</span> <span class="o">|</span>   <span class="o">|</span> <span class="o">|</span>        <span class="o">|</span>
<span class="n">X</span> <span class="n">XXXXX</span> <span class="n">XXXXXXXXXX</span>

<span class="n">s</span> <span class="n">e</span>     <span class="n">m</span>
</pre></div>
</div>
 s is the sign bit, e is the exponent, and m is the significand.</p>
<p>If e is between 1 and 30, h is a normalized number: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>        <span class="n">s</span>    <span class="n">e</span><span class="o">-</span><span class="mi">15</span>
<span class="n">h</span> <span class="o">=</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>  <span class="o">*</span> <span class="mi">2</span>     <span class="o">*</span> <span class="mf">1.</span><span class="n">m</span>
</pre></div>
</div>
 If e is 0, and m is not zero, h is a denormalized number: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>        <span class="n">s</span>    <span class="o">-</span><span class="mi">14</span>
<span class="n">h</span> <span class="o">=</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>  <span class="o">*</span> <span class="mi">2</span>     <span class="o">*</span> <span class="mf">0.</span><span class="n">m</span>
</pre></div>
</div>
 If e and m are both zero, h is zero: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">h</span> <span class="o">=</span> <span class="mf">0.0</span>
</pre></div>
</div>
 If e is 31, h is an “infinity” or “not a number” (NAN), depending on whether m is zero or not.</p>
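<p>The same rules can be applied directly to a 16-bit pattern. The sketch below is illustrative only and does not use Imath's <code class="docutils literal notranslate"><span class="pre">half</span></code> class; the function name <code class="docutils literal notranslate"><span class="pre">halfBitsToValue</span></code> is hypothetical. Every half value is exactly representable as a float, so the result is exact.</p>

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode a raw 16-bit half pattern into a float, one branch per rule above.
inline float halfBitsToValue(std::uint16_t h)
{
    const unsigned s = h >> 15;           // sign bit (bit 15)
    const unsigned e = (h >> 10) & 0x1fu; // biased exponent (bits 14..10)
    const unsigned m = h & 0x3ffu;        // significand bits (bits 9..0)
    const float sign = s ? -1.0f : 1.0f;

    if (e == 31) // infinity if m is zero, NAN otherwise
        return m ? std::numeric_limits<float>::quiet_NaN()
                 : sign * std::numeric_limits<float>::infinity();
    if (e == 0) // zero or denormalized: 2^-14 * 0.m
        return sign * std::ldexp(float(m), -14 - 10);
    // normalized: 2^(e-15) * 1.m, with the hidden leading 1 made explicit
    return sign * std::ldexp(float(m | 0x400u), int(e) - 15 - 10);
}
```

<p>For instance, the pattern <code class="docutils literal notranslate"><span class="pre">0xD7C1</span></code> (sign 1, exponent 10101, significand 1111000001) decodes to -124.0625.</p>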
<p>Examples: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="mi">0</span> <span class="mi">00000</span> <span class="mi">0000000000</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="mi">0</span> <span class="mi">01110</span> <span class="mi">0000000000</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="mi">0</span> <span class="mi">01111</span> <span class="mi">0000000000</span> <span class="o">=</span> <span class="mf">1.0</span>
<span class="mi">0</span> <span class="mi">10000</span> <span class="mi">0000000000</span> <span class="o">=</span> <span class="mf">2.0</span>
<span class="mi">0</span> <span class="mi">10000</span> <span class="mi">1000000000</span> <span class="o">=</span> <span class="mf">3.0</span>
<span class="mi">1</span> <span class="mi">10101</span> <span class="mi">1111000001</span> <span class="o">=</span> <span class="o">-</span><span class="mf">124.0625</span>
<span class="mi">0</span> <span class="mi">11111</span> <span class="mi">0000000000</span> <span class="o">=</span> <span class="o">+</span><span class="n">infinity</span>
<span class="mi">1</span> <span class="mi">11111</span> <span class="mi">0000000000</span> <span class="o">=</span> <span class="o">-</span><span class="n">infinity</span>
<span class="mi">0</span> <span class="mi">11111</span> <span class="mi">1000000000</span> <span class="o">=</span> <span class="n">NAN</span>
<span class="mi">1</span> <span class="mi">11111</span> <span class="mi">1111111111</span> <span class="o">=</span> <span class="n">NAN</span>
</pre></div>
</div>
 <strong>Conversion via Lookup Table:</strong></p>
<p>Converting from half to float is performed by default using a lookup table. There are only 65,536 different half numbers; each of these numbers has been converted and stored in a table pointed to by the <code class="docutils literal notranslate"><span class="pre">imath_half_to_float_table</span></code> pointer.</p>
<p>Prior to Imath v3.1, conversion from float to half was accomplished with the help of an exponent lookup table, but this has since been replaced by explicit bit shifting.</p>
<p><strong>Conversion via Hardware:</strong></p>
<p>As of Imath v3.1, the conversion routines use F16C SSE instructions whenever they are present and enabled by compiler flags.</p>
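<p>For reference, the F16C instructions can be reached from C++ through the standard x86 intrinsics. The sketch below is not taken from Imath; the wrapper names are hypothetical, and it assumes a GCC- or Clang-style compiler that defines <code class="docutils literal notranslate"><span class="pre">__F16C__</span></code> when F16C is enabled (for example with <code class="docutils literal notranslate"><span class="pre">-mf16c</span></code>). Without that flag the wrappers are simply not compiled.</p>

```cpp
#include <cstdint>

#if defined(__F16C__)
#include <immintrin.h>

// Hypothetical wrapper: half bits -> float using VCVTPH2PS.
// The instruction converts four halfs at once; only lane 0 is used here.
inline float halfToFloatF16C(std::uint16_t h)
{
    return _mm_cvtss_f32(_mm_cvtph_ps(_mm_cvtsi32_si128(h)));
}

// Hypothetical wrapper: float -> half bits using VCVTPS2PH,
// rounding to the nearest representable half.
inline std::uint16_t floatToHalfF16C(float f)
{
    const __m128i packed = _mm_cvtps_ph(_mm_set_ss(f), _MM_FROUND_TO_NEAREST_INT);
    return std::uint16_t(_mm_cvtsi128_si32(packed));
}
#endif
```

<p>Because the instructions operate on four or eight values per call, batch conversion of arrays can use the packed forms directly instead of these scalar wrappers.</p>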
<p><strong>Conversion via Bit-Shifting:</strong></p>
<p>If F16C SSE instructions are not available, conversion can be accomplished by a bit-shifting algorithm. For half-to-float conversion, this is generally slower than the lookup table, but it may be preferable when memory limits preclude storing the 65,536-entry lookup table.</p>
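<p>As an illustration of the bit-shifting approach in the float-to-half direction, here is a minimal sketch. It is not Imath's routine: the name <code class="docutils literal notranslate"><span class="pre">floatToHalfBits</span></code> is hypothetical, round-to-nearest-even is an assumption made for this example, a 32-bit IEEE 754 <code class="docutils literal notranslate"><span class="pre">float</span></code> is assumed, and no floating point exceptions are raised.</p>

```cpp
#include <cstdint>
#include <cstring>

// Convert a float to a half bit pattern using only integer operations.
inline std::uint16_t floatToHalfBits(float f)
{
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t e = (x >> 23) & 0xffu;
    std::uint32_t m = x & 0x007fffffu;

    if (e == 255) // infinity or NAN
    {
        if (m == 0) return std::uint16_t(sign | 0x7c00u);
        m >>= 13; // keep the top significand bits of the NAN
        return std::uint16_t(sign | 0x7c00u | m | (m == 0 ? 1u : 0u));
    }

    const int he = int(e) - 127 + 15; // exponent rebiased for half
    if (he >= 31) return std::uint16_t(sign | 0x7c00u); // too large: infinity

    if (he <= 0) // result is a denormalized half or zero
    {
        if (he < -10) return std::uint16_t(sign); // below half of the smallest denormal
        m |= 0x00800000u;                         // make the hidden bit explicit
        const int shift = 14 - he;                // 14..24
        std::uint32_t r = m >> shift;
        const std::uint32_t rem = m & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (r & 1u))) ++r; // nearest even
        return std::uint16_t(sign | r);
    }

    // Normalized: drop the low 13 significand bits and round to nearest even.
    // A carry out of the significand correctly bumps the exponent.
    std::uint32_t r = (std::uint32_t(he) << 10) | (m >> 13);
    const std::uint32_t rem = m & 0x1fffu;
    if (rem > 0x1000u || (rem == 0x1000u && (r & 1u))) ++r;
    return std::uint16_t(sign | r);
}
```

<p>Only shifts, masks, and integer comparisons are involved, so no table needs to be stored.</p>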
<p>The lookup table symbol is included in the compilation even if <code class="docutils literal notranslate"><span class="pre">IMATH_HALF_USE_LOOKUP_TABLE</span></code> is false, because application code using the exported <code class="docutils literal notranslate"><span class="pre">half.h</span></code> may choose to enable the use of the table.</p>
<p>An implementation can eliminate the table from compilation by defining the <code class="docutils literal notranslate"><span class="pre">IMATH_HALF_NO_LOOKUP_TABLE</span></code> preprocessor symbol. Simply add: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1">#define IMATH_HALF_NO_LOOKUP_TABLE</span>
</pre></div>
</div>
 before including <code class="docutils literal notranslate"><span class="pre">half.h</span></code>, or define the symbol on the compile command line.</p>
<p>Furthermore, an implementation wishing to receive <code class="docutils literal notranslate"><span class="pre">FE_OVERFLOW</span></code> and <code class="docutils literal notranslate"><span class="pre">FE_UNDERFLOW</span></code> floating point exceptions when converting float to half by the bit-shift algorithm can define the preprocessor symbol <code class="docutils literal notranslate"><span class="pre">IMATH_HALF_ENABLE_FP_EXCEPTIONS</span></code> prior to including <code class="docutils literal notranslate"><span class="pre">half.h</span></code>: <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1">#define IMATH_HALF_ENABLE_FP_EXCEPTIONS</span>
</pre></div>
</div>
 <strong>Conversion Performance Comparison:</strong></p>
<p>Testing on a Core i9, the timings are approximately:</p>
<p>half-to-float:<ul class="simple">
<li><p>table: 0.71 ns / call</p></li>
<li><p>no table: 1.06 ns / call</p></li>
<li><p>f16c: 0.45 ns / call</p></li>
</ul>
</p>
<p>float-to-half:<ul class="simple">
<li><p>original: 5.2 ns / call</p></li>
<li><p>no exp table + opt: 1.27 ns / call</p></li>
<li><p>f16c: 0.45 ns / call</p></li>
</ul>
</p>
<p><strong>Note:</strong> the timings above depend on the distribution of the floats in question. </p>
</div>


            <div class="clearer"></div>
          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
            <p class="logo"><a href="index.html">
              <img class="logo" src="_static/imath-logo-blue.png" alt="Logo"/>
            </a></p>
  <h4>Previous topic</h4>
  <p class="topless"><a href="half_conversion.html"
                        title="previous chapter">half-float Conversion Configuration Options</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="classes/Box.html"
                        title="next chapter">Box</a></p>
  <div role="note" aria-label="source link">
    <h3>This Page</h3>
    <ul class="this-page-menu">
      <li><a href="_sources/float.rst.txt"
            rel="nofollow">Show Source</a></li>
    </ul>
   </div>
<div id="searchbox" style="display: none" role="search">
  <h3 id="searchlabel">Quick search</h3>
    <div class="searchformwrapper">
    <form class="search" action="search.html" method="get">
      <input type="text" name="q" aria-labelledby="searchlabel" />
      <input type="submit" value="Go" />
    </form>
    </div>
</div>
<script>$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             >index</a></li>
        <li class="right" >
          <a href="classes/Box.html" title="Box"
             >next</a> |</li>
        <li class="right" >
          <a href="half_conversion.html" title="half-float Conversion Configuration Options"
             >previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="index.html">Imath</a> &#187;</li>
        <li class="nav-item nav-item-this"><a href="">Floating Point Representation</a></li> 
      </ul>
    </div>
    <div class="footer" role="contentinfo">
        &#169; Copyright 2021, Contributors to the OpenEXR Project.
      Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.4.3.
    </div>
  </body>
</html>
