Optimizing a Potential Model to Acceleration Data

This tutorial covers how to fit gravitational potential models to pulsar acceleration data using peebee’s optimization framework. We’ll explore different optimization algorithms, interpret results, and handle real observational challenges.

Overview

Parameter inference in peebee follows a standard workflow:

  1. Set up the model: Define the gravitational potential to fit

  2. Load the data: Provide pulsar positions, accelerations, and uncertainties

  3. Configure optimization: Set parameter bounds, choose algorithms, add noise models

  4. Run fitting: Execute the optimization and obtain best-fit parameters

  5. Interpret results: Analyze goodness-of-fit, uncertainties, and systematic effects

The optimize module provides the Fitter class that coordinates this process.

Basic Optimization Workflow

Let’s start with a complete example using synthetic data.

Setting Up the Fitter

import numpy as np
from peebee import optimize, models, sampling
from peebee.models import NFW, MiyamotoNagaiDisk
from peebee.noise import GaussianNoise

# Create test model and generate synthetic data
true_halo = NFW(m_vir=1.2e12, r_s=18.0)
true_disk = MiyamotoNagaiDisk(m_tot=5e10, a=3.0, b=0.3)
true_model = true_halo + true_disk

# Generate mock observations
np.random.seed(42)
bounds = np.array([[4.0, 12.0], [-4.0, 4.0], [-1.0, 1.0]])
x_obs, y_obs, z_obs, alos_true = sampling.sample_alos_sources_uniform(
    true_model, 40, bounds
)

# Add realistic noise
alos_err = 0.1 * np.abs(alos_true)  # 10% uncertainties
alos_obs = alos_true + np.random.normal(0, alos_err)

print(f"Generated {len(alos_obs)} mock observations")
print(f"Acceleration range: [{np.min(alos_obs):.2f}, {np.max(alos_obs):.2f}] mm/s/yr")
print(f"Mean uncertainty: {np.mean(alos_err):.3f} mm/s/yr")

The Basic Fitter Workflow

# Step 1: Create fitter instance
fitter = optimize.Fitter()

# Step 2: Set up the model to fit (use same structure as true model)
fit_model = NFW(m_vir=8e11, r_s=15.0) + MiyamotoNagaiDisk(m_tot=4e10, a=2.5, b=0.25)
fitter.set_model(fit_model)

# Step 3: Load observational data
fitter.set_data(x_obs, y_obs, z_obs, alos_obs, alos_err, frame='cart')

# Step 4: Configure noise model
fitter.set_noise_model(GaussianNoise(sigma=1.0))

# Step 5: Set parameter bounds
fitter.configure_params({
    'NFW.m_vir': (5e11, 2e12),           # virial mass range
    'NFW.r_s': (5.0, 40.0),              # scale radius range
    'Miyamoto-Nagai Disk.m_tot': (1e10, 1e11),  # disk mass range
    'Miyamoto-Nagai Disk.a': (1.0, 5.0), # disk scale length
    'Miyamoto-Nagai Disk.b': (0.1, 1.0), # disk scale height
    'noise.sigma': (0.1, 10.0)           # noise parameter
})

print("Configured parameters:")
for param, bounds in fitter.param_bounds.items():
    current = fit_model.params.get(param, fitter.noise_model.params.get(param.split('.')[-1], 'N/A'))
    print(f"  {param}: [{bounds[0]:.2e}, {bounds[1]:.2e}] (current: {current:.2e})")

Running the Optimization

# Run optimization with gradient descent
print("Running optimization...")
results = fitter.optimize(method='gradient_descent')

print(f"\nOptimization completed!")
print(f"Reduced χ²: {results.reduced_chi2:.3f}")
print(f"AIC: {results.aic:.1f}")

print("\nBest-fit parameters:")
for param, value in results.best_fit_params.items():
    if param in fitter.param_bounds:  # only show fitted parameters
        true_val = true_model.params.get(param, 'N/A')
        if isinstance(true_val, (int, float)):
            recovery = (value - true_val) / true_val * 100
            print(f"  {param}: {value:.2e} (true: {true_val:.2e}, recovery: {recovery:+5.1f}%)")
        else:
            print(f"  {param}: {value:.2e}")

Different Optimization Algorithms

Peebee supports several optimization methods for different use cases.

Gradient Descent

Fast local optimization, good when you have reasonable starting values:

# Gradient-based optimization (scipy's minimize)
results_gd = fitter.optimize(method='gradient_descent')
print(f"Gradient descent: χ² = {results_gd.reduced_chi2:.3f}, "
      f"time = {results_gd.optimization_time:.2f}s")

Differential Evolution

Global optimization, robust to initial conditions but slower:

# Global optimization with differential evolution
results_de = fitter.optimize(method='differential_evolution')
print(f"Differential evolution: χ² = {results_de.reduced_chi2:.3f}, "
      f"time = {results_de.optimization_time:.2f}s")

Least Squares with Loss Functions

Robust fitting for data with outliers:

# Least squares with Huber loss (robust to outliers)
results_huber = fitter.optimize(method='least_squares', loss='huber')
print(f"Huber robust fit: χ² = {results_huber.reduced_chi2:.3f}")

# Compare parameter differences
print("\nParameter comparison:")
print("Parameter                   Gradient   DiffEvol   Huber")
print("-" * 55)
for param in ['NFW.m_vir', 'NFW.r_s', 'Miyamoto-Nagai Disk.m_tot']:
    gd_val = results_gd.best_fit_params[param]
    de_val = results_de.best_fit_params[param]
    huber_val = results_huber.best_fit_params[param]
    print(f"{param:25s} {gd_val:9.2e} {de_val:9.2e} {huber_val:9.2e}")

Working with Real Data

Real pulsar data presents additional challenges.

Loading Real Data

import pandas as pd

# Example: Create realistic mock data that mimics real observations
# In practice, you would load this from a file
real_data = {
    'pulsar': ['J1713+0747', 'J1909-3744', 'J0437-4715', 'J1744-1134', 'J2145-0750'],
    'l': [16.33, 359.93, 253.40, 10.87, 74.87],      # Galactic longitude
    'b': [24.98, -29.87, -42.28, 9.19, -32.38],     # Galactic latitude
    'd': [1.15, 1.14, 0.156, 0.36, 0.50],           # Distance (kpc)
    'alos_obs': [1.23, -0.87, 2.45, -1.56, 0.89],   # Observed acceleration (mm/s/yr)
    'alos_err': [0.15, 0.12, 0.18, 0.24, 0.21],     # Uncertainty (mm/s/yr)
    'timing_years': [15, 12, 20, 8, 10]             # Years of timing data
}

df = pd.DataFrame(real_data)

print("Real pulsar dataset:")
print(df[['pulsar', 'l', 'b', 'd', 'alos_obs', 'alos_err']].round(2))

Data Quality Assessment

# Analyze data quality before fitting

# Signal-to-noise ratios
snr = np.abs(df['alos_obs']) / df['alos_err']
print(f"\nData quality assessment:")
print(f"  Number of pulsars: {len(df)}")
print(f"  Distance range: [{df['d'].min():.2f}, {df['d'].max():.2f}] kpc")
print(f"  S/N range: [{snr.min():.1f}, {snr.max():.1f}]")
print(f"  Detections (>2σ): {np.sum(snr > 2)}/{len(df)}")

# Check for outliers
median_acc = np.median(np.abs(df['alos_obs']))
outliers = np.abs(df['alos_obs']) > 3 * median_acc
print(f"  Potential outliers: {np.sum(outliers)} pulsars")

if np.any(outliers):
    print("  Outlier pulsars:")
    for i in np.where(outliers)[0]:
        print(f"    {df.iloc[i]['pulsar']}: {df.iloc[i]['alos_obs']:+.2f} ± {df.iloc[i]['alos_err']:.2f}")

Fitting to Real Data

# Set up fitter for real data
real_fitter = optimize.Fitter()

# Use a more complex model for real data
real_model = NFW(m_vir=1e12, r_s=20.0) + MiyamotoNagaiDisk(m_tot=5e10, a=3.0, b=0.3)
real_fitter.set_model(real_model)

# Load real data
real_fitter.set_data(
    df['l'], df['b'], df['d'],
    df['alos_obs'], df['alos_err'],
    frame='gal'
)

# Use robust noise model for potential outliers
from peebee.noise import LorentzNoise
real_fitter.set_noise_model(LorentzNoise(gamma=1.0))

# Configure parameters with physically motivated priors
real_fitter.configure_params({
    'NFW.m_vir': (5e11, 2e12),
    'NFW.r_s': (10.0, 30.0),
    'Miyamoto-Nagai Disk.m_tot': (2e10, 1e11),
    'Miyamoto-Nagai Disk.a': (2.0, 4.0),
    'Miyamoto-Nagai Disk.b': (0.1, 0.5),
    'noise.gamma': (0.1, 5.0)
})

# Run optimization
real_results = real_fitter.optimize(method='differential_evolution')

print(f"\nReal data fit results:")
print(f"Reduced χ²: {real_results.reduced_chi2:.3f}")
print(f"AIC: {real_results.aic:.1f}")

Parameter Constraints and Uncertainties

Understanding parameter uncertainties is crucial for scientific interpretation.

Hessian-Based Uncertainties

# Extract uncertainties from optimization
print("\nParameter uncertainties (from Hessian):")
print("Parameter                    Value         ±1σ Error    Relative")
print("-" * 70)

for param in real_results.best_fit_params:
    if param in real_fitter.param_bounds:
        value = real_results.best_fit_params[param]
        uncertainty = real_results.uncertainties.get(param, 0)
        if uncertainty > 0:
            relative = uncertainty / value * 100
            print(f"{param:25s} {value:12.2e} ± {uncertainty:9.2e} ({relative:5.1f}%)")
        else:
            print(f"{param:25s} {value:12.2e} (no uncertainty)")

Model Comparison and Selection

Compare different model structures to find the best representation.

Information Criteria

# Compare models with different complexity

models_to_test = [
    ("Halo only", NFW(m_vir=1e12, r_s=20.0)),
    ("Disk only", MiyamotoNagaiDisk(m_tot=5e10, a=3.0, b=0.3)),
    ("Halo + Disk", NFW(m_vir=1e12, r_s=20.0) + MiyamotoNagaiDisk(m_tot=5e10, a=3.0, b=0.3))
]

model_results = []

for name, model in models_to_test:
    # Set up fitter
    test_fitter = optimize.Fitter()
    test_fitter.set_model(model)
    test_fitter.set_data(df['l'], df['b'], df['d'], df['alos_obs'], df['alos_err'])
    test_fitter.set_noise_model(GaussianNoise(sigma=1.0))

    # Configure parameters based on model type
    bounds = {'noise.sigma': (0.1, 5.0)}
    if 'NFW' in str(model):
        bounds.update({'NFW.m_vir': (5e11, 2e12), 'NFW.r_s': (10.0, 30.0)})
    if 'Miyamoto' in str(model):
        bounds.update({
            'Miyamoto-Nagai Disk.m_tot': (1e10, 1e11),
            'Miyamoto-Nagai Disk.a': (1.0, 5.0),
            'Miyamoto-Nagai Disk.b': (0.1, 1.0)
        })

    test_fitter.configure_params(bounds)

    # Fit model
    result = test_fitter.optimize(method='gradient_descent')

    model_results.append({
        'name': name,
        'nparams': model.nparams + 1,  # +1 for noise
        'chi2': result.reduced_chi2 * (len(df) - model.nparams - 1),
        'reduced_chi2': result.reduced_chi2,
        'aic': result.aic,
        'bic': result.aic + (model.nparams + 1) * np.log(len(df)) - 2 * (model.nparams + 1)
    })

# Display comparison
print("\nModel comparison:")
print("Model          Nparams    χ²     χ²_red    AIC     BIC")
print("-" * 55)
for result in model_results:
    print(f"{result['name']:12s} {result['nparams']:7d} {result['chi2']:7.1f} "
          f"{result['reduced_chi2']:7.3f} {result['aic']:7.1f} {result['bic']:7.1f}")

Residual Analysis

Analyze fit residuals to identify systematic issues.

# Calculate residuals for best model
best_model = real_fitter.model
best_model.set_params(real_results.best_fit_params)

alos_predicted = best_model.alos(df['l'], df['b'], df['d'])
residuals = df['alos_obs'] - alos_predicted
normalized_residuals = residuals / df['alos_err']

print("Residual analysis:")
print("Pulsar        Observed  Predicted  Residual   Norm.Res  Significance")
print("-" * 70)
for i, (_, row) in enumerate(df.iterrows()):
    significance = abs(normalized_residuals.iloc[i])
    flag = "***" if significance > 3 else ""
    print(f"{row['pulsar']:12s} {row['alos_obs']:8.2f} {alos_predicted[i]:8.2f} "
          f"{residuals.iloc[i]:8.2f} {normalized_residuals.iloc[i]:8.2f} "
          f"{significance:8.2f}σ {flag}")

# Statistical tests
print(f"\nResidual statistics:")
print(f"  RMS residual: {np.std(residuals):.3f} mm/s/yr")
print(f"  Mean |residual|: {np.mean(np.abs(residuals)):.3f} mm/s/yr")
print(f"  Normalized residuals mean: {np.mean(normalized_residuals):+.2f}")
print(f"  Normalized residuals std: {np.std(normalized_residuals):.2f}")
print(f"  Outliers (>3σ): {np.sum(np.abs(normalized_residuals) > 3)} / {len(residuals)}")

Best Practices

For reliable parameter inference:

  1. Start with simple models: Test individual components before fitting complex composites

  2. Use physical priors: Set parameter bounds based on observational constraints

  3. Test multiple algorithms: Compare local and global optimization results

  4. Check convergence: Verify results don’t depend on starting values

  5. Analyze residuals: Look for systematic patterns that suggest missing physics

  6. Consider model selection: Use information criteria to choose appropriate complexity

  7. Validate with synthetic data: Test your analysis pipeline on mock datasets

Troubleshooting Common Issues

Poor convergence:
  • Try different starting values or use global optimization

  • Check that parameter bounds are reasonable

  • Ensure data quality (remove outliers, check uncertainties)

Unphysical parameters:
  • Tighten parameter bounds based on observational constraints

  • Check for parameter degeneracies

  • Consider whether model structure is appropriate

Large residuals:
  • Examine data for outliers or systematic errors

  • Consider more complex model structures

  • Use robust noise models (Lorentz, PowerLaw)

Unstable uncertainties:
  • Check that Hessian is positive definite

  • Use profile likelihood for better uncertainty estimates

  • Bootstrap resampling for non-Gaussian uncertainties

Next Steps

With fitted models in hand, you can:

  • Constructing Noise Models: Explore robust fitting techniques for difficult datasets

  • Compare your results with literature values and physical expectations

  • Use fitted models to predict accelerations for new pulsar discoveries

  • Extend analysis to include time-dependent effects or non-standard dark matter models