Tuning

The tuning module wraps sklearn's GridSearchCV and RandomizedSearchCV with temporal cross-validation defaults that prevent look-ahead bias. It enforces walk-forward validation by default, ensuring that hyperparameter selection respects the time-series nature of financial data.

Overview

Hyperparameter tuning for portfolio optimization requires special care: standard k-fold CV would use future returns to select parameters, introducing look-ahead bias. The tuning module addresses this by coupling sklearn's search algorithms with temporal cross-validation from the validation module.

Because the portfolio pipeline is a single sklearn Pipeline object, all nested parameters are accessible via the double-underscore __ notation (e.g., "optimizer__l2_coef", "drop_correlated__threshold").
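This double-underscore addressing is standard sklearn behavior, so it can be sketched with a plain sklearn Pipeline; the step names below are illustrative, not the portfolio pipeline's:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# A toy two-step pipeline; step names become parameter prefixes.
pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge(alpha=1.0))])

# Read and write nested parameters via "step__param".
assert pipe.get_params()["model__alpha"] == 1.0
pipe.set_params(model__alpha=0.5)
assert pipe.get_params()["model__alpha"] == 0.5
```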

GridSearchConfig

Exhaustive search over a specified parameter grid with temporal CV.

Configuration

from optimizer.tuning import GridSearchConfig
from optimizer.validation import WalkForwardConfig
from optimizer.scoring import ScorerConfig

config = GridSearchConfig(
    cv_config=WalkForwardConfig.for_quarterly_rolling(),
    scorer_config=ScorerConfig.for_sharpe(),
    n_jobs=None,
    return_train_score=False,
)
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| cv_config | WalkForwardConfig | default (quarterly rolling) | Temporal cross-validation strategy |
| scorer_config | ScorerConfig | default (Sharpe ratio) | Portfolio scoring function |
| n_jobs | int or None | None | Parallel jobs; -1 uses all cores |
| return_train_score | bool | False | Compute training scores (slower) |

Presets

| Preset | CV Config | n_jobs | Description |
| --- | --- | --- | --- |
| for_quick_search() | Monthly rolling | -1 | Fast evaluation, all cores |
| for_thorough_search() | Quarterly expanding | -1 | Comprehensive, with train scores |

RandomizedSearchConfig

Samples parameter configurations from specified distributions rather than enumerating them exhaustively. Preferred when the parameter space is large or continuous.

Configuration

from optimizer.tuning import RandomizedSearchConfig

config = RandomizedSearchConfig(
    n_iter=50,
    cv_config=WalkForwardConfig.for_quarterly_rolling(),
    scorer_config=ScorerConfig.for_sharpe(),
    n_jobs=None,
    random_state=42,
    return_train_score=False,
)
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| n_iter | int | 50 | Number of random parameter samples |
| cv_config | WalkForwardConfig | default (quarterly rolling) | Temporal CV strategy |
| scorer_config | ScorerConfig | default (Sharpe ratio) | Scoring function |
| n_jobs | int or None | None | Parallel jobs |
| random_state | int or None | None | Seed for reproducibility |
| return_train_score | bool | False | Compute training scores |

Presets

| Preset | n_iter | CV Config | Description |
| --- | --- | --- | --- |
| for_quick_search(n_iter=20) | 20 | Monthly rolling | Fast random sampling |
| for_thorough_search(n_iter=100) | 100 | Quarterly expanding | Comprehensive search |

Nested Parameter Addressing

The sklearn Pipeline flattens all transformer and optimizer parameters, making them tunable via the double-underscore __ notation. The step names come from build_portfolio_pipeline():

validate__max_abs_return
outliers__winsorize_threshold
outliers__remove_threshold
impute__sector_mapping
drop_correlated__threshold
optimizer__risk_measure
optimizer__l2_coef
optimizer__prior_estimator__mu_estimator__alpha

Discovering tunable parameters

from optimizer.pipeline import build_portfolio_pipeline
from optimizer.optimization import MeanRiskConfig, build_mean_risk

optimizer = build_mean_risk(MeanRiskConfig.for_max_sharpe())
pipeline = build_portfolio_pipeline(optimizer)

# List all tunable parameters
for name, value in sorted(pipeline.get_params().items()):
    print(f"{name}: {value}")

Code Examples

Grid search over regularization

from optimizer.pipeline import build_portfolio_pipeline, tune_and_optimize
from optimizer.optimization import MeanRiskConfig, build_mean_risk
from optimizer.tuning import GridSearchConfig
from skfolio.preprocessing import prices_to_returns

X = prices_to_returns(prices)
optimizer = build_mean_risk(MeanRiskConfig.for_max_sharpe())
pipeline = build_portfolio_pipeline(optimizer)

param_grid = {
    "optimizer__l2_coef": [0.0, 0.01, 0.05, 0.1],
}

result = tune_and_optimize(
    pipeline, X,
    param_grid=param_grid,
    tuning_config=GridSearchConfig.for_quick_search(),
)
print(f"Best L2 coef: {result.pipeline.get_params()['optimizer__l2_coef']}")

Grid search over multiple parameters

param_grid = {
    "optimizer__l2_coef": [0.0, 0.01, 0.1],
    "drop_correlated__threshold": [0.85, 0.90, 0.95],
    "outliers__winsorize_threshold": [2.5, 3.0, 3.5],
}

result = tune_and_optimize(
    pipeline, X,
    param_grid=param_grid,
    tuning_config=GridSearchConfig(n_jobs=-1),
)

Randomized search with distributions

from scipy.stats import uniform, loguniform
from optimizer.tuning import RandomizedSearchConfig

param_distributions = {
    "optimizer__l2_coef": loguniform(1e-4, 1e-1),
    "drop_correlated__threshold": uniform(0.80, 0.15),  # [0.80, 0.95]
}

result = tune_and_optimize(
    pipeline, X,
    param_grid=param_distributions,
    tuning_config=RandomizedSearchConfig.for_thorough_search(n_iter=50),
)
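The scipy distributions above are samplable objects, so the ranges can be sanity-checked before committing to a long search by drawing a few values directly (standard scipy.stats behavior):

```python
from scipy.stats import uniform, loguniform

l2_dist = loguniform(1e-4, 1e-1)
thr_dist = uniform(0.80, 0.15)  # loc=0.80, scale=0.15 -> support [0.80, 0.95]

l2_samples = l2_dist.rvs(size=5, random_state=42)
thr_samples = thr_dist.rvs(size=5, random_state=42)

assert all(1e-4 <= v <= 1e-1 for v in l2_samples)
assert all(0.80 <= v <= 0.95 for v in thr_samples)
```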

Using build functions directly

from optimizer.tuning import build_grid_search_cv, build_randomized_search_cv

# Grid search
gs = build_grid_search_cv(pipeline, param_grid, config=GridSearchConfig())
gs.fit(X)
print(f"Best score: {gs.best_score_:.4f}")
print(f"Best params: {gs.best_params_}")

# Randomized search
rs = build_randomized_search_cv(pipeline, param_distributions, config=RandomizedSearchConfig())
rs.fit(X)
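Presumably these builders assemble a standard sklearn GridSearchCV around a temporal splitter. A plain-sklearn sketch of the same idea, on synthetic data with a generic supervised Ridge model and TimeSeriesSplit standing in for the walk-forward config (the portfolio pipeline itself is unsupervised and fits on X alone):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

# TimeSeriesSplit keeps every training window strictly before its test window.
gs = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0]},
    cv=TimeSeriesSplit(n_splits=4),
)
gs.fit(X, y)
assert gs.best_params_["alpha"] in [0.01, 0.1, 1.0]
```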

Gotchas and Tips

Temporal CV is enforced by default

Both GridSearchConfig and RandomizedSearchConfig default to walk-forward validation. Do not override this with standard KFold: training on folds that come after the test period introduces look-ahead bias.
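For intuition, sklearn's own TimeSeriesSplit exhibits the walk-forward property the defaults rely on: every training index precedes every test index, so no future data leaks into parameter selection. The library's WalkForwardConfig is assumed here to behave analogously:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(24).reshape(-1, 1)  # 24 toy "monthly" observations

for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    # Walk-forward property: the training window always ends before testing starts.
    assert train_idx.max() < test_idx.min()
```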

Use double-underscore notation for nested parameters

Pipeline parameters are addressed as "step_name__parameter". For deeply nested parameters, chain underscores: "optimizer__prior_estimator__mu_estimator__alpha".
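Each chained underscore resolves one nesting level. The pattern can be sketched with generic sklearn estimators that themselves hold estimators (the step and estimator names here are illustrative, not the portfolio pipeline's):

```python
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge
from sklearn.compose import TransformedTargetRegressor

# "model" is a pipeline step; "regressor" is a parameter of that step;
# "alpha" is a parameter of the inner Ridge -> "model__regressor__alpha".
pipe = Pipeline([("model", TransformedTargetRegressor(regressor=Ridge(alpha=1.0)))])
pipe.set_params(model__regressor__alpha=0.05)
assert pipe.get_params()["model__regressor__alpha"] == 0.05
```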

Grid search vs randomized search

Use grid search when the parameter space is small and discrete. Use randomized search when exploring continuous distributions or when the grid would be too large. Randomized search with n_iter=50 often finds good parameters faster than exhaustive grid search.
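The size argument can be made concrete with sklearn's ParameterGrid: an exhaustive grid grows multiplicatively with each axis, while randomized search caps evaluations at n_iter. The grid below is illustrative:

```python
from sklearn.model_selection import ParameterGrid

grid = {
    "optimizer__l2_coef": [0.0, 0.01, 0.05, 0.1],
    "drop_correlated__threshold": [0.85, 0.90, 0.95],
    "outliers__winsorize_threshold": [2.5, 3.0, 3.5],
}
n_grid = len(ParameterGrid(grid))  # 4 * 3 * 3 = 36 candidates
n_random = 50                      # fixed by n_iter, regardless of grid size

assert n_grid == 36
```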

Computation cost

Each combination is evaluated across all walk-forward folds. With 4 folds, 3 parameters, and 4 values each: 4^3 * 4 = 256 fits. Use n_jobs=-1 for parallelism and start with for_quick_search().
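The fit count is just combinations times folds; a tiny helper (hypothetical, for back-of-envelope planning only) makes the arithmetic explicit:

```python
def n_fits(n_values_per_param, n_folds):
    """Total model fits for an exhaustive grid search with temporal CV."""
    total = n_folds
    for v in n_values_per_param:
        total *= v
    return total

# 3 parameters with 4 values each, evaluated over 4 walk-forward folds.
assert n_fits([4, 4, 4], n_folds=4) == 256
```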

Quick Reference

| Task | Code |
| --- | --- |
| Quick grid search | GridSearchConfig.for_quick_search() |
| Thorough grid search | GridSearchConfig.for_thorough_search() |
| Quick random search | RandomizedSearchConfig.for_quick_search(n_iter=20) |
| Thorough random search | RandomizedSearchConfig.for_thorough_search(n_iter=100) |
| Tune + optimize | tune_and_optimize(pipeline, X, param_grid={...}) |
| Build grid search | build_grid_search_cv(pipeline, param_grid) |
| List tunable params | pipeline.get_params().keys() |