Tuning¶
The tuning module wraps sklearn's GridSearchCV and RandomizedSearchCV with temporal cross-validation defaults that prevent look-ahead bias. It uses walk-forward validation by default, so hyperparameter selection respects the temporal ordering of financial data.
Overview¶
Hyperparameter tuning for portfolio optimization requires special care: standard k-fold CV would use future returns to select parameters, introducing look-ahead bias. The tuning module addresses this by coupling sklearn's search algorithms with temporal cross-validation from the validation module.
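To make the look-ahead problem concrete, here is a minimal standard-library sketch that compares unshuffled k-fold splits with rolling walk-forward splits. The helper names (walk_forward_splits, kfold_splits, has_look_ahead) and the window sizes are assumptions made for illustration; this is not the validation module's implementation.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def walk_forward_splits(n_samples: int, train_size: int, test_size: int) -> Iterator[Split]:
    """Yield (train, test) index lists for a rolling walk-forward scheme.

    Each test window starts right after its training window, so no
    training index is later than any test index.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # roll the window forward by one test period


def kfold_splits(n_samples: int, n_folds: int) -> Iterator[Split]:
    """Yield (train, test) index lists for plain, unshuffled k-fold."""
    fold = n_samples // n_folds
    for k in range(n_folds):
        test = list(range(k * fold, (k + 1) * fold))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


def has_look_ahead(train: List[int], test: List[int]) -> bool:
    """True if any training observation is later than the earliest test one."""
    return max(train) > min(test)


wf = list(walk_forward_splits(n_samples=12, train_size=6, test_size=2))
kf = list(kfold_splits(n_samples=12, n_folds=4))

wf_leaks = [has_look_ahead(tr, te) for tr, te in wf]
kf_leaks = [has_look_ahead(tr, te) for tr, te in kf]

print("walk-forward leaks:", wf_leaks)
print("k-fold leaks:      ", kf_leaks)
```

In this toy setup every k-fold split except the last trains on observations that come after the start of its test window, while none of the walk-forward splits do.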
Because the portfolio pipeline is a single sklearn Pipeline object, all nested parameters are accessible via the double-underscore __ notation (e.g., "optimizer__l2_coef", "drop_correlated__threshold").
Grid Search¶
Exhaustive search over a specified parameter grid with temporal CV.
Configuration¶
from optimizer.tuning import GridSearchConfig
from optimizer.validation import WalkForwardConfig
from optimizer.scoring import ScorerConfig
config = GridSearchConfig(
    cv_config=WalkForwardConfig.for_quarterly_rolling(),
    scorer_config=ScorerConfig.for_sharpe(),
    n_jobs=None,
    return_train_score=False,
)
| Field | Type | Default | Description |
|---|---|---|---|
| cv_config | WalkForwardConfig | default (quarterly rolling) | Temporal cross-validation strategy |
| scorer_config | ScorerConfig | default (Sharpe ratio) | Portfolio scoring function |
| n_jobs | int or None | None | Parallel jobs; -1 uses all cores |
| return_train_score | bool | False | Compute training scores (slower) |
Presets¶
| Preset | CV Config | n_jobs | Description |
|---|---|---|---|
| for_quick_search() | Monthly rolling | -1 | Fast evaluation, all cores |
| for_thorough_search() | Quarterly expanding | -1 | Comprehensive with train scores |
Randomized Search¶
Samples parameter configurations from specified distributions rather than exhaustive enumeration. Preferred when the parameter space is large or continuous.
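As a rough illustration of what sampling from distributions means, the sketch below draws a handful of candidate configurations using only the standard library. The function names and the ranges are assumptions chosen to mirror the parameter names used elsewhere on this page; the actual search delegates sampling to sklearn and scipy.

```python
import math
import random
from typing import Dict, List


def sample_loguniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_candidates(rng: random.Random, n_iter: int) -> List[Dict[str, float]]:
    """Draw n_iter independent parameter configurations."""
    return [
        {
            # Log-uniform suits scale parameters spanning several orders of magnitude.
            "optimizer__l2_coef": sample_loguniform(rng, 1e-4, 1e-1),
            # Plain uniform suits a bounded threshold.
            "drop_correlated__threshold": rng.uniform(0.80, 0.95),
        }
        for _ in range(n_iter)
    ]


rng = random.Random(42)  # fixed seed so the draws are reproducible
candidates = sample_candidates(rng, n_iter=5)
for candidate in candidates:
    print(candidate)
```

The number of candidates is set by n_iter alone, so the cost of the search stays fixed no matter how many parameters are sampled.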
Configuration¶
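from optimizer.tuning import RandomizedSearchConfig
from optimizer.validation import WalkForwardConfig
from optimizer.scoring import ScorerConfig
config = RandomizedSearchConfig(
    n_iter=50,
    cv_config=WalkForwardConfig.for_quarterly_rolling(),
    scorer_config=ScorerConfig.for_sharpe(),
    n_jobs=None,
    random_state=42,  # fixed seed for reproducible sampling
    return_train_score=False,
)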
| Field | Type | Default | Description |
|---|---|---|---|
| n_iter | int | 50 | Number of random parameter samples |
| cv_config | WalkForwardConfig | default | Temporal CV strategy |
| scorer_config | ScorerConfig | default (Sharpe) | Scoring function |
| n_jobs | int or None | None | Parallel jobs |
| random_state | int or None | None | Seed for reproducibility |
| return_train_score | bool | False | Compute training scores |
Presets¶
| Preset | n_iter | CV Config | Description |
|---|---|---|---|
| for_quick_search(20) | 20 | Monthly rolling | Fast random sampling |
| for_thorough_search(100) | 100 | Quarterly expanding | Comprehensive search |
Nested Parameter Addressing¶
The sklearn Pipeline exposes the parameters of every transformer and of the optimizer through get_params(), so each one is tunable via the double-underscore __ notation. The step names come from build_portfolio_pipeline():
validate__max_abs_return
outliers__winsorize_threshold
outliers__remove_threshold
impute__sector_mapping
drop_correlated__threshold
optimizer__risk_measure
optimizer__l2_coef
optimizer__prior_estimator__mu_estimator__alpha
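The way such a name is resolved can be sketched with nested dictionaries standing in for nested estimators. This is an illustration only: set_nested is a hypothetical helper, the default values below are made up, and sklearn's own set_params works on estimator objects rather than dicts.

```python
from typing import Any, Dict


def set_nested(params: Dict[str, Any], name: str, value: Any) -> None:
    """Set a value in a nested dict using a 'step__sub__param' style name.

    The name is split at the first double underscore; the left part selects
    the child and the remainder is resolved recursively inside it.
    """
    head, sep, tail = name.partition("__")
    if not sep:
        # No separator left: head is the leaf parameter itself.
        if head not in params:
            raise KeyError(f"unknown parameter: {head!r}")
        params[head] = value
        return
    set_nested(params[head], tail, value)  # KeyError if the child is unknown


# Hypothetical structure and values, for illustration only.
pipeline_params = {
    "drop_correlated": {"threshold": 0.95},
    "optimizer": {
        "l2_coef": 0.0,
        "prior_estimator": {"mu_estimator": {"alpha": 0.2}},
    },
}

set_nested(pipeline_params, "drop_correlated__threshold", 0.90)
set_nested(pipeline_params, "optimizer__prior_estimator__mu_estimator__alpha", 0.5)
print(pipeline_params)
```

Each additional __ descends one level, which is why a parameter three objects deep needs three separators.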
Discovering tunable parameters¶
from optimizer.pipeline import build_portfolio_pipeline
from optimizer.optimization import MeanRiskConfig, build_mean_risk
optimizer = build_mean_risk(MeanRiskConfig.for_max_sharpe())
pipeline = build_portfolio_pipeline(optimizer)
# List all tunable parameters
for name, value in sorted(pipeline.get_params().items()):
    print(f"{name}: {value}")
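On a sklearn Pipeline, get_params() also returns top-level entries (such as steps, memory, and verbose) and the step objects themselves, so the listing can be long. A small filter over the names narrows it to one step. The helper params_for_step and the sample names below are assumptions for illustration; in practice the names would come from pipeline.get_params().

```python
from typing import Iterable, List


def params_for_step(param_names: Iterable[str], step: str) -> List[str]:
    """Return the nested parameter names that belong to one pipeline step."""
    prefix = f"{step}__"
    return sorted(name for name in param_names if name.startswith(prefix))


# Sample names standing in for pipeline.get_params().keys().
names = [
    "memory",
    "steps",
    "verbose",
    "optimizer",
    "drop_correlated",
    "drop_correlated__threshold",
    "optimizer__l2_coef",
    "optimizer__risk_measure",
    "optimizer__prior_estimator__mu_estimator__alpha",
    "outliers__winsorize_threshold",
    "outliers__remove_threshold",
]

optimizer_params = params_for_step(names, "optimizer")
nested_only = [name for name in names if "__" in name]

print(optimizer_params)
print(len(nested_only), "nested parameters")
```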
Code Examples¶
Grid search over regularization¶
from optimizer.pipeline import build_portfolio_pipeline, tune_and_optimize
from optimizer.optimization import MeanRiskConfig, build_mean_risk
from optimizer.tuning import GridSearchConfig
from skfolio.preprocessing import prices_to_returns
# prices: a DataFrame of asset prices indexed by date, assumed to be loaded already
X = prices_to_returns(prices)
optimizer = build_mean_risk(MeanRiskConfig.for_max_sharpe())
pipeline = build_portfolio_pipeline(optimizer)
param_grid = {
    "optimizer__l2_coef": [0.0, 0.01, 0.05, 0.1],
}
result = tune_and_optimize(
    pipeline, X,
    param_grid=param_grid,
    tuning_config=GridSearchConfig.for_quick_search(),
)
print(f"Best L2 coef: {result.pipeline.get_params()['optimizer__l2_coef']}")
Grid search over multiple parameters¶
# Reuses pipeline and X from the previous example.
# 3 x 3 x 3 = 27 combinations, each evaluated on every fold.
param_grid = {
    "optimizer__l2_coef": [0.0, 0.01, 0.1],
    "drop_correlated__threshold": [0.85, 0.90, 0.95],
    "outliers__winsorize_threshold": [2.5, 3.0, 3.5],
}
result = tune_and_optimize(
    pipeline, X,
    param_grid=param_grid,
    tuning_config=GridSearchConfig(n_jobs=-1),
)
Randomized search with distributions¶
from scipy.stats import uniform, loguniform
from optimizer.tuning import RandomizedSearchConfig
param_distributions = {
    "optimizer__l2_coef": loguniform(1e-4, 1e-1),
    "drop_correlated__threshold": uniform(0.80, 0.15),  # uniform(loc, scale) covers [0.80, 0.95]
}
result = tune_and_optimize(
    pipeline, X,
    param_grid=param_distributions,
    tuning_config=RandomizedSearchConfig.for_thorough_search(n_iter=50),
)
Using build functions directly¶
from optimizer.tuning import build_grid_search_cv, build_randomized_search_cv
# Grid search
gs = build_grid_search_cv(pipeline, param_grid, config=GridSearchConfig())
gs.fit(X)
print(f"Best score: {gs.best_score_:.4f}")
print(f"Best params: {gs.best_params_}")
# Randomized search
rs = build_randomized_search_cv(pipeline, param_distributions, config=RandomizedSearchConfig())
rs.fit(X)
Gotchas and Tips¶
Temporal CV is enforced by default
Both GridSearchConfig and RandomizedSearchConfig default to walk-forward validation. Do not override this with standard KFold: it trains on observations that come after the test fold, which introduces look-ahead bias.
Use double-underscore notation for nested parameters
Pipeline parameters are addressed as "step_name__parameter". For deeply nested parameters, chain underscores: "optimizer__prior_estimator__mu_estimator__alpha".
Grid search vs randomized search
Use grid search when the parameter space is small and discrete. Use randomized search when exploring continuous distributions or when the grid would be too large. With a fixed budget such as n_iter=50, randomized search often finds good parameters in fewer fits than an exhaustive grid, because its cost does not grow with the number of parameters.
Computation cost
Each parameter combination is evaluated on every walk-forward fold. With 3 parameters of 4 values each and 4 folds, that is 4^3 = 64 combinations × 4 folds = 256 fits. Use n_jobs=-1 for parallelism and start with for_quick_search().
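The arithmetic can be checked before launching a search. The sketch below counts fits for a grid with three parameters of four values each; the specific values and the count_fits helper are illustrative assumptions, not part of the tuning module.

```python
from itertools import product
from math import prod
from typing import Dict, List


def count_fits(param_grid: Dict[str, List[float]], n_folds: int) -> int:
    """Number of model fits for an exhaustive grid evaluated on n_folds folds."""
    n_combinations = prod(len(values) for values in param_grid.values())
    return n_combinations * n_folds


# Illustrative grid: 3 parameters, 4 values each.
param_grid = {
    "optimizer__l2_coef": [0.0, 0.01, 0.05, 0.1],
    "drop_correlated__threshold": [0.80, 0.85, 0.90, 0.95],
    "outliers__winsorize_threshold": [2.0, 2.5, 3.0, 3.5],
}
n_folds = 4

# Enumerate the grid explicitly to confirm the product formula.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
n_fits = count_fits(param_grid, n_folds)

print(f"{len(combinations)} combinations x {n_folds} folds = {n_fits} fits")
```

If the underlying search also refits the best candidate on the full data, as sklearn's searches do with their default refit=True, one further fit is added; whether this module keeps that default is not stated here.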
Quick Reference¶
| Task | Code |
|---|---|
| Quick grid search | GridSearchConfig.for_quick_search() |
| Thorough grid search | GridSearchConfig.for_thorough_search() |
| Quick random search | RandomizedSearchConfig.for_quick_search(n_iter=20) |
| Thorough random search | RandomizedSearchConfig.for_thorough_search(n_iter=100) |
| Tune + optimize | tune_and_optimize(pipeline, X, param_grid={...}) |
| Build grid search | build_grid_search_cv(pipeline, param_grid) |
| List tunable params | pipeline.get_params().keys() |