Constructing Noise Models¶
This tutorial covers peebee’s noise model framework for handling realistic observational uncertainties and outliers in pulsar acceleration data. We’ll explore different noise distributions and show how to configure robust fitting procedures.
Overview¶
Real astronomical data rarely follows perfect Gaussian statistics. Systematic errors, calibration issues, and unexpected astrophysical effects can create outliers and non-Gaussian noise distributions. Peebee’s noise models provide robust alternatives to standard χ² fitting.
Available noise models:
GaussianNoise: Standard Gaussian likelihood (traditional χ² fitting)
LorentzNoise: Cauchy/Lorentzian distribution (heavy-tailed, robust to outliers)
PowerLawNoise: Power-law likelihood for extreme outlier scenarios
Understanding Noise Models¶
Each noise model implements a different likelihood function for the residuals.
Gaussian Noise Model¶
The standard approach assumes Gaussian-distributed measurement errors:
import numpy as np
from peebee.noise import GaussianNoise
from peebee import optimize, models
from peebee.models import NFW, MiyamotoNagaiDisk
# Create synthetic data with Gaussian noise
np.random.seed(42)
# True model
true_model = NFW(m_vir=1e12, r_s=20.0) + MiyamotoNagaiDisk(m_tot=5e10, a=3.0, b=0.3)
# Generate clean accelerations
l_test = np.random.uniform(0, 360, 30)
b_test = np.random.uniform(-30, 30, 30)
d_test = np.random.uniform(0.5, 2.5, 30)
alos_clean = true_model.alos(l_test, b_test, d_test)
# Add Gaussian noise
uncertainties = 0.15 * np.ones_like(alos_clean) # 0.15 mm/s/yr uncertainty
alos_noisy = alos_clean + np.random.normal(0, uncertainties)
print("Gaussian noise example:")
print(f"Data points: {len(alos_noisy)}")
print(f"RMS residual: {np.std(alos_noisy - alos_clean):.3f} mm/s/yr")
print(f"Expected RMS: {np.mean(uncertainties):.3f} mm/s/yr")
Create and configure a Gaussian noise model:
# Initialize Gaussian noise model
gaussian_noise = GaussianNoise(sigma=1.0)
print(f"Gaussian noise model parameters: {gaussian_noise.params}")
print(f"Parameter names: {gaussian_noise.param_names}")
# Test likelihood calculation
test_residuals = np.array([0.0, 1.0, -1.0, 2.0])
likelihood = gaussian_noise.likelihood(test_residuals)
print(f"Negative log-likelihood: {likelihood:.3f}")
Lorentzian Noise Model¶
The Lorentzian (Cauchy) distribution has heavy tails, making it robust to outliers:
from peebee.noise import LorentzNoise
# Create data with outliers
alos_outliers = alos_clean + np.random.normal(0, uncertainties)
# Add some extreme outliers (5% of data)
n_outliers = max(1, len(alos_outliers) // 20)
outlier_indices = np.random.choice(len(alos_outliers), n_outliers, replace=False)
alos_outliers[outlier_indices] += np.random.normal(0, 10*uncertainties[outlier_indices])
print(f"\nAdded {n_outliers} extreme outliers")
print(f"Outlier amplitudes: {alos_outliers[outlier_indices] - alos_clean[outlier_indices]}")
print(f"RMS with outliers: {np.std(alos_outliers - alos_clean):.3f} mm/s/yr")
# Initialize Lorentzian noise model
lorentz_noise = LorentzNoise(gamma=1.0)
print(f"Lorentzian noise model parameters: {lorentz_noise.params}")
Power-Law Noise Model¶
For extreme cases with very heavy tails:
from peebee.noise import PowerLawNoise
# Initialize power-law noise model
power_law_noise = PowerLawNoise(zeta=1.5)
print(f"Power-law noise parameter ζ = {power_law_noise.params['zeta']}")
# Compare likelihood behavior for different noise models
test_residual = 5.0 # Large outlier
gauss_ll = gaussian_noise.likelihood(np.array([test_residual]))
lorentz_ll = lorentz_noise.likelihood(np.array([test_residual]))
power_ll = power_law_noise.likelihood(np.array([test_residual]))
print(f"\nLikelihood for residual = {test_residual}:")
print(f" Gaussian: {gauss_ll:.3f}")
print(f" Lorentzian: {lorentz_ll:.3f}")
print(f" Power-law: {power_ll:.3f}")
print(" (Lower = more likely; Lorentzian/Power-law are more tolerant)")
Configuring Noise Models in Fitting¶
Let’s compare how different noise models handle the same dataset.
Setting Up the Test¶
# Create a model to fit
fit_model = NFW(m_vir=8e11, r_s=15.0) + MiyamotoNagaiDisk(m_tot=4e10, a=2.5, b=0.25)
# Common fitter configuration
def setup_fitter(noise_model, data_alos):
fitter = optimize.Fitter()
fitter.set_model(fit_model)
fitter.set_data(l_test, b_test, d_test, data_alos, uncertainties)
fitter.set_noise_model(noise_model)
# Configure parameters
fitter.configure_params({
'NFW.m_vir': (5e11, 2e12),
'NFW.r_s': (5.0, 40.0),
'Miyamoto-Nagai Disk.m_tot': (1e10, 1e11),
'Miyamoto-Nagai Disk.a': (1.0, 5.0),
'Miyamoto-Nagai Disk.b': (0.1, 1.0)
})
return fitter
print("Setting up noise model comparison...")
Gaussian Fitting¶
# Fit with Gaussian noise model
gaussian_noise = GaussianNoise(sigma=1.0)
gaussian_fitter = setup_fitter(gaussian_noise, alos_outliers)
# Add noise parameter to optimization
gaussian_fitter.configure_params({'noise.sigma': (0.1, 10.0)})
print("Fitting with Gaussian noise model...")
gaussian_results = gaussian_fitter.optimize(method='gradient_descent')
print(f"Gaussian fit results:")
print(f" Reduced χ²: {gaussian_results.reduced_chi2:.3f}")
print(f" AIC: {gaussian_results.aic:.1f}")
print(f" Noise σ: {gaussian_results.best_fit_params['noise.sigma']:.3f}")
Lorentzian Fitting¶
# Fit with Lorentzian noise model
lorentz_noise = LorentzNoise(gamma=1.0)
lorentz_fitter = setup_fitter(lorentz_noise, alos_outliers)
# Add noise parameter to optimization
lorentz_fitter.configure_params({'noise.gamma': (0.1, 10.0)})
print("Fitting with Lorentzian noise model...")
lorentz_results = lorentz_fitter.optimize(method='gradient_descent')
print(f"Lorentzian fit results:")
print(f" Reduced χ²: {lorentz_results.reduced_chi2:.3f}")
print(f" AIC: {lorentz_results.aic:.1f}")
print(f" Noise γ: {lorentz_results.best_fit_params['noise.gamma']:.3f}")
Parameter Recovery Analysis¶
Compare how each noise model handles outliers:
# Compare parameter recovery
true_params = true_model.params
results_list = [
("Gaussian", gaussian_results),
("Lorentzian", lorentz_results)
]
print("\nParameter recovery comparison:")
print("Parameter True Value Gaussian Lorentzian Recovery Diff")
print("-" * 80)
for param in ['NFW.m_vir', 'NFW.r_s', 'Miyamoto-Nagai Disk.m_tot']:
true_val = true_params[param]
gauss_val = gaussian_results.best_fit_params[param]
lorentz_val = lorentz_results.best_fit_params[param]
gauss_recovery = (gauss_val - true_val) / true_val * 100
lorentz_recovery = (lorentz_val - true_val) / true_val * 100
print(f"{param:25s} {true_val:10.2e} {gauss_val:10.2e} {lorentz_val:11.2e} "
f"{lorentz_recovery - gauss_recovery:+9.1f}%")
# Residual analysis
fit_model.set_params(gaussian_results.best_fit_params)
residuals_gaussian = alos_outliers - fit_model.alos(l_test, b_test, d_test)
fit_model.set_params(lorentz_results.best_fit_params)
residuals_lorentz = alos_outliers - fit_model.alos(l_test, b_test, d_test)
print(f"\nResidual analysis:")
print(f" Gaussian RMS: {np.std(residuals_gaussian):.3f} mm/s/yr")
print(f" Lorentzian RMS: {np.std(residuals_lorentz):.3f} mm/s/yr")
Advanced Noise Model Configuration¶
Working with Fixed Noise Parameters¶
Sometimes you want to fix noise parameters based on external knowledge:
# Fix noise parameter based on instrumental knowledge
fixed_noise = LorentzNoise(gamma=0.2) # Known instrumental characteristics
fixed_fitter = optimize.Fitter()
fixed_fitter.set_model(fit_model)
fixed_fitter.set_data(l_test, b_test, d_test, alos_outliers, uncertainties)
fixed_fitter.set_noise_model(fixed_noise)
# Don't include noise parameter in optimization
fixed_fitter.configure_params({
'NFW.m_vir': (5e11, 2e12),
'NFW.r_s': (5.0, 40.0),
'Miyamoto-Nagai Disk.m_tot': (1e10, 1e11)
})
print("Fitting with fixed noise parameter...")
fixed_results = fixed_fitter.optimize(method='gradient_descent')
print(f"Fixed noise results:")
print(f" Reduced χ²: {fixed_results.reduced_chi2:.3f}")
print(f" Fixed γ: {fixed_noise.params['gamma']:.3f}")
Noise Model Parameter Updates¶
# Update noise model parameters after initialization
dynamic_noise = GaussianNoise(sigma=1.0)
print(f"Initial σ: {dynamic_noise.params['sigma']}")
# Update parameter
dynamic_noise.set_params({'sigma': 2.5})
print(f"Updated σ: {dynamic_noise.params['sigma']}")
# Get parameter names and current values
print(f"All parameters: {dynamic_noise.get_params()}")
Model Selection with Information Criteria¶
# Compare models using AIC
print("\nModel selection:")
print("Noise Model AIC ΔAIC Best?")
print("-" * 35)
aics = [gaussian_results.aic, lorentz_results.aic]
best_aic = min(aics)
for (name, results) in results_list:
delta_aic = results.aic - best_aic
is_best = "✓" if results.aic == best_aic else ""
print(f"{name:12s} {results.aic:7.1f} {delta_aic:6.1f} {is_best:>6s}")
print("\nInterpretation:")
print(" ΔAIC < 2: Models are essentially equivalent")
print(" ΔAIC 2-7: Some support for best model")
print(" ΔAIC > 10: Strong support for best model")
Noise Model Diagnostics¶
Understanding noise model behavior:
Direct Likelihood Comparison¶
# Calculate likelihoods for the same set of residuals
test_residuals = np.array([-3.0, -1.0, 0.0, 1.0, 3.0]) # Example residuals
# Set up noise models with similar scales
gauss = GaussianNoise(sigma=1.0)
lorentz = LorentzNoise(gamma=1.0)
print("Likelihood comparison for different residuals:")
print("Residual Gaussian Lorentzian Ratio")
print("-" * 40)
for res in test_residuals:
# Calculate negative log-likelihoods
ll_gauss = gauss.likelihood(np.array([res]))
ll_lorentz = lorentz.likelihood(np.array([res]))
ratio = ll_lorentz / ll_gauss if ll_gauss != 0 else np.inf
print(f"{res:8.1f} {ll_gauss:9.3f} {ll_lorentz:11.3f} {ratio:8.3f}")
print("\nInterpretation:")
print(" Ratio < 1: Lorentzian more tolerant of residual")
print(" Ratio > 1: Gaussian more tolerant of residual")
Working with Real Data¶
Practical guidelines for real pulsar datasets:
Data Assessment¶
# Example: Assess real data for appropriate noise model choice
def assess_data_for_noise_model(alos_obs, alos_err):
"""Assess data characteristics to guide noise model selection"""
# Calculate normalized residuals (need a rough model first)
median_acc = np.median(alos_obs)
residuals_from_median = alos_obs - median_acc
normalized = residuals_from_median / alos_err
# Statistical tests
n_outliers = np.sum(np.abs(normalized) > 3)
outlier_fraction = n_outliers / len(normalized)
rms = np.std(normalized)
skewness = np.mean(normalized**3)
kurtosis = np.mean(normalized**4) - 3 # Excess kurtosis
print(f"Data assessment:")
print(f" Sample size: {len(alos_obs)}")
print(f" Outliers (>3σ): {n_outliers} ({outlier_fraction*100:.1f}%)")
print(f" RMS of normalized residuals: {rms:.2f}")
print(f" Skewness: {skewness:+.2f}")
print(f" Excess kurtosis: {kurtosis:+.2f}")
# Recommendations
print(f"\nRecommendations:")
if outlier_fraction < 0.05 and abs(kurtosis) < 1:
print(" → Use GaussianNoise (data appears well-behaved)")
elif outlier_fraction < 0.15:
print(" → Consider LorentzNoise (some outliers present)")
else:
print(" → Use LorentzNoise or PowerLawNoise (many outliers)")
if abs(skewness) > 1:
print(" → Data may have systematic bias - investigate")
# Test with our outlier-contaminated data
assess_data_for_noise_model(alos_outliers, uncertainties)
Iterative Outlier Identification¶
# Use robust fitting to identify problematic data points
def identify_outliers_iterative(fitter, threshold=3.0, max_iterations=3):
"""Iteratively identify outliers using robust fitting"""
original_data = fitter.alos_obs.copy()
original_err = fitter.alos_err.copy()
for iteration in range(max_iterations):
# Fit with current data
results = fitter.optimize(method='gradient_descent')
# Calculate residuals
fitter.model.set_params(results.best_fit_params)
predicted = fitter.model.alos(fitter.l, fitter.b, fitter.d)
residuals = fitter.alos_obs - predicted
normalized_residuals = residuals / fitter.alos_err
# Identify outliers
outlier_mask = np.abs(normalized_residuals) > threshold
n_outliers = np.sum(outlier_mask)
print(f"Iteration {iteration+1}: {n_outliers} outliers found")
if n_outliers == 0:
print(" No more outliers - converged")
break
# Remove outliers for next iteration
keep_mask = ~outlier_mask
fitter.set_data(
fitter.l[keep_mask], fitter.b[keep_mask], fitter.d[keep_mask],
fitter.alos_obs[keep_mask], fitter.alos_err[keep_mask]
)
print(f" Removed {n_outliers} points, {len(fitter.alos_obs)} remaining")
return results, outlier_mask
# Example usage (would need a properly set up fitter)
print("\nIterative outlier removal example:")
print("This approach removes outliers completely rather than down-weighting them")
Best Practices¶
Guidelines for choosing and using noise models:
Model Selection Strategy:
Start with data assessment: Use statistical tests to understand your data
Begin with Gaussian: For well-understood, high-quality data
Use Lorentzian for robustness: When outliers are suspected but should be retained
Compare models objectively: Use AIC/BIC for model selection
Validate with synthetic data: Test your approach on simulated datasets
Parameter Configuration:
Use physically motivated bounds: Noise parameters should make sense
Consider fixed parameters: If instrumental characteristics are known
Monitor convergence: Some noise models can be harder to optimize
Cross-validate results: Check stability across different datasets
Interpretation and Reporting:
Document your choice: Report which noise model was used and why
Show model comparison: Include AIC/BIC comparisons
Analyze residuals: Validate your noise model choice post-fit
Consider systematic effects: Large noise parameters may indicate missing physics
Common Pitfalls¶
- Over-robust fitting:
Very flexible noise models can mask real astrophysical signals
Balance robustness against statistical power
- Inappropriate parameter bounds:
Noise parameters should be constrained by physical expectations
Unconstrained noise can compensate for model inadequacies
- Model comparison without validation:
AIC/BIC prefer simpler models - validate on independent data
Consider cross-validation for model selection
Next Steps¶
With robust noise modeling:
Apply to real pulsar timing datasets with known systematics
Explore hierarchical noise models for multi-survey data
Combine with advanced optimization techniques
Use noise diagnostics to identify instrumental or astrophysical systematics
Understanding noise models enables reliable parameter inference from real astronomical data with realistic uncertainties and outliers.