# ULP Accuracy Measurement Tool

Standalone tool for measuring the accuracy of Eigen's vectorized math functions
in units of ULP (Unit in the Last Place). Compares Eigen's SIMD implementations
against either MPFR (128-bit high-precision reference) or the standard C++ math
library.

## Building

From the Eigen build directory:

```bash
cd build
cmake ..
cmake --build . --target ulp_accuracy
```

If MPFR and GMP are installed, the build automatically enables MPFR support
(`EIGEN_HAS_MPFR`). Without them, only `--ref=std` is available.

### Installing MPFR (Debian/Ubuntu)

```bash
sudo apt install libmpfr-dev libgmp-dev
```

## Usage

```
./test/ulp_accuracy [options]

Options:
  --func=NAME    Function to test (required unless --list)
  --lo=VAL       Start of range (default: -inf)
  --hi=VAL       End of range (default: +inf)
  --double       Test double precision (default: float)
  --step=EPS     Sampling step: advance by (1+EPS)*nextafter(x)
                 (default: 0 = exhaustive; useful for double, e.g. 1e-6)
  --threads=N    Number of threads (default: all cores)
  --batch=N      Batch size for Eigen eval (default: 4096)
  --ref=MODE     Reference: 'std' (default) or 'mpfr'
  --hist_width=N Histogram half-width in ULPs (default: 10)
  --list         List available functions
```

## Examples

List all supported functions:
```bash
./test/ulp_accuracy --list
```

Exhaustive float test of sin against std (tests all ~4.28 billion finite floats):
```bash
./test/ulp_accuracy --func=sin
```

Float test against MPFR (more accurate reference, but slower):
```bash
./test/ulp_accuracy --func=sin --ref=mpfr
```

Double precision test with geometric sampling (exhaustive is impractical for double):
```bash
./test/ulp_accuracy --func=exp --double --step=1e-6
```

Test a specific range:
```bash
./test/ulp_accuracy --func=sin --lo=0 --hi=6.2832
```

## Output

The tool prints:

- **Test configuration**: function, range, reference mode, thread count
- **Max |ULP error|**: worst-case absolute ULP error with the offending input value
- **Mean |ULP error|**: average absolute ULP error across all tested values
- **Signed ULP histogram**: distribution of signed errors showing bias direction

Example output:
```
Function: sin (float)
Range: [-inf, inf]
Representable values in range: 4278190082
Reference: MPFR (128-bit)
Threads: 32
Batch size: 4096

Results:
  Values tested: 4278190081
  Time: 529.04 seconds (8.1 Mvalues/s)
  Max |ULP error|: 2
    at x = -1.5413464e+38 (Eigen=-0.482218683, ref=-0.482218742)
  Mean |ULP error|: 0.0874

Signed ULP error histogram [-10, +10]:
  -2   :        51988 (  0.001%)
  -1   :    186805349 (  4.366%)
  0    :   3904475407 ( 91.265%)
  1    :    186805349 (  4.366%)
  2    :        51988 (  0.001%)
```

## How it works

1. **Range splitting**: The input range is divided evenly across threads by
   splitting the linear ULP space.

2. **Batched evaluation**: Each thread fills batches of input values, evaluates
   them through Eigen's vectorized path (using `Eigen::Array` operations), and
   computes reference values one at a time.

3. **ULP computation**: IEEE 754 bit patterns are mapped to a linear integer
   scale where adjacent representable values are adjacent integers. The signed
   ULP error is the difference between Eigen's result and the reference on this
   scale. Special cases (NaN, infinity mismatches) report infinite error.

4. **Result reduction**: Per-thread statistics (max error, mean error, histogram)
   are merged after all threads complete.

## Supported functions

| Category | Functions |
|----------|-----------|
| Trigonometric | sin, cos, tan, asin, acos, atan |
| Hyperbolic | sinh, cosh, tanh, asinh, acosh, atanh |
| Exponential/Log | exp, exp2, expm1, log, log1p, log10, log2 |
| Error/Gamma | erf, erfc, lgamma |
| Other | logistic, sqrt, cbrt, rsqrt |

## File organization

- `ulp_accuracy.cpp` — Main tool: ULP computation, worker threads, CLI, result printing
- `mpfr_reference.h` — MPFR reference function wrappers and scalar conversion helpers

## Performance tips

- Float exhaustive sweeps test ~4.28 billion values. With `--ref=std` this takes
  ~50 seconds per function; with `--ref=mpfr` it takes ~500 seconds (10x slower).
- For double precision, exhaustive testing is impractical. Use `--step=1e-6` to
  sample ~2.88 billion values geometrically.
- Thread count defaults to all available cores. MPFR is the bottleneck (single
  MPFR call per value per thread), so more cores help significantly.