pipeline
optimizer.pipeline
End-to-end portfolio pipeline orchestration.
Composes pre-selection, optimisation, validation, scoring, hyperparameter tuning, and rebalancing into a single workflow.
PortfolioResult
dataclass
Container for the output of a full portfolio optimisation run.
Attributes
weights : pd.Series
Final asset weights (ticker → weight).
portfolio : object
skfolio Portfolio returned by predict() on the full dataset.
Exposes .sharpe_ratio, .sortino_ratio, .max_drawdown,
.composition, etc.
backtest : object or None
Out-of-sample MultiPeriodPortfolio (walk-forward) or
Population (CPCV / MultipleRandomizedCV). None when
backtesting was skipped.
pipeline : object
The fitted sklearn Pipeline (pre-selection + optimiser).
Can be reused for predict() on new data.
summary : dict[str, float]
Key performance metrics extracted from the in-sample portfolio.
rebalance_needed : bool or None
Whether the portfolio exceeds drift thresholds relative to
previous_weights. None when no previous weights were
provided.
turnover : float or None
One-way turnover between previous_weights and the new
weights. None when no previous weights were provided.
fx_decomposition : FxReturnDecomposition or None
FX return decomposition when FxConfig.mode == DECOMPOSE.
currency : str or None
Base currency used for FX conversion (e.g. "EUR").
net_returns : pd.Series or None
Net backtest portfolio returns after transaction cost deduction.
None when no backtest was run.
net_sharpe_ratio : float or None
Annualized Sharpe ratio computed from net_returns.
None when no backtest was run.
weight_history : pd.DataFrame or None
Absolute portfolio weights at each walk-forward rebalancing date.
Rows are rebalancing dates; columns are asset tickers.
Compatible with compute_net_alpha(weights_history=...).
None when no backtest was run.
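A minimal sketch of the one-way turnover reported in turnover, using the definition given under compute_net_backtest_returns (half the sum of absolute weight deltas). It assumes both weight vectors are ticker-indexed pandas Series and that a ticker missing on one side carries zero weight there; that alignment rule is an assumption. one_way_turnover is a hypothetical helper, not part of optimizer.pipeline.

```python
import pandas as pd


def one_way_turnover(previous: pd.Series, new: pd.Series) -> float:
    """Half the sum of absolute weight changes (hypothetical helper)."""
    # Align on the union of tickers; a ticker absent from one side is
    # treated as having zero weight there (assumption).
    prev, curr = previous.align(new, fill_value=0.0)
    return 0.5 * float((curr - prev).abs().sum())


previous = pd.Series({"AAA": 0.5, "BBB": 0.5})
new = pd.Series({"AAA": 0.3, "BBB": 0.5, "CCC": 0.2})
print(one_way_turnover(previous, new))  # 0.2
```

Moving 0.2 of weight out of AAA and into CCC counts once, not twice, which is why the factor 0.5 appears.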
build_portfolio_pipeline(optimizer, pre_selection_config=None, sector_mapping=None)
Compose a full sklearn Pipeline: pre-selection → optimiser.
The resulting pipeline is a single estimator for cross-validation and hyperparameter tuning. Pre-selection is performed within each CV fold, preventing data leakage.
Parameters
optimizer : BaseOptimization
A skfolio optimiser (e.g. from build_mean_risk(),
build_hrp(), etc.) used as the final pipeline estimator.
pre_selection_config : PreSelectionConfig or None
Pre-selection configuration. None uses default settings.
sector_mapping : dict[str, str] or None
Ticker → sector mapping for SectorImputer.
Returns
sklearn.pipeline.Pipeline
A ready-to-fit (unfitted) pipeline whose fit(X) cleans and filters
the returns and then optimises, and whose predict(X) produces
a skfolio Portfolio.
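Why pre-selection must be fitted inside each CV fold can be illustrated without skfolio: the filter's decision has to come from the training slice alone and then be applied unchanged to the test slice. The sketch below uses a hypothetical missing-data filter (fit_selector and max_nan_frac are assumptions, not PreSelectionConfig fields) as a stand-in for the real pre-selection steps.

```python
import numpy as np
import pandas as pd


def fit_selector(train: pd.DataFrame, max_nan_frac: float = 0.5) -> pd.Index:
    """Keep assets whose share of missing returns in `train` is small enough."""
    nan_frac = train.isna().mean()
    return nan_frac[nan_frac <= max_nan_frac].index


# Toy returns: asset "C" only starts trading halfway through the sample.
returns = pd.DataFrame(
    {
        "A": np.full(10, 0.01),
        "B": np.full(10, -0.01),
        "C": [np.nan] * 5 + [0.02] * 5,
    }
)
train, test = returns.iloc[:6], returns.iloc[6:]

kept = fit_selector(train)  # decided on the training slice only
test_selected = test[kept]  # applied unchanged to the test slice
print(list(kept))  # ['A', 'B']

# Fitting on the full sample instead lets test-period data decide to keep "C".
print(list(fit_selector(returns)))  # ['A', 'B', 'C']
```

The second call is the leaky variant: it only keeps C because it has already seen C's future trading history.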
Examples
>>> from optimizer.optimization import MeanRiskConfig, build_mean_risk
>>> from optimizer.pipeline import build_portfolio_pipeline
>>> optimizer = build_mean_risk(MeanRiskConfig.for_max_sharpe())
>>> pipeline = build_portfolio_pipeline(optimizer)
>>> pipeline.fit(X)  # X = returns DataFrame
>>> portfolio = pipeline.predict(X)
>>> print(portfolio.sharpe_ratio)
backtest(pipeline, X, *, cv_config=None, y=None, n_jobs=None)
Run a walk-forward backtest on a portfolio pipeline.
Parameters
pipeline : Pipeline
A ready-to-fit sklearn Pipeline (from build_portfolio_pipeline).
X : pd.DataFrame
Return matrix (observations x assets).
cv_config : WalkForwardConfig or None
Walk-forward configuration. None uses quarterly rolling windows.
y : pd.DataFrame or None
Benchmark or factor returns for models that require fit(X, y).
n_jobs : int or None
Number of parallel jobs.
Returns
MultiPeriodPortfolio or Population
Out-of-sample portfolio predictions.
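For readers new to walk-forward evaluation, the rolling scheme can be sketched as index windows: train on a fixed-length window, test on the block that follows, then roll forward by one test length. The window sizes here (252 training and 63 test observations, roughly a year and a quarter of daily data) are assumptions for illustration; the actual defaults of WalkForwardConfig are not stated in this reference, and rolling_walk_forward is a hypothetical helper.

```python
from collections.abc import Iterator


def rolling_walk_forward(
    n_obs: int, train_size: int = 252, test_size: int = 63
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges for a rolling walk-forward."""
    start = 0
    # Only complete test windows are produced; a trailing partial block is dropped.
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test period


splits = list(rolling_walk_forward(n_obs=504))
print(len(splits))  # 4
```

Because each test window starts where the previous one ended, the test predictions can be stitched into one contiguous out-of-sample series, which is what a MultiPeriodPortfolio represents.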
compute_net_backtest_returns(gross_returns, weight_changes, cost_bps=10.0)
Deduct proportional transaction costs from gross backtest returns.
For each date with weight changes, the one-way turnover (half the sum
of absolute weight deltas, consistent with compute_turnover()) is
multiplied by cost_bps / 10_000 and subtracted from the gross
return at that date. A shift of weight w from one asset to another
incurs a cost of w * cost_bps / 10_000, not 2w * cost_bps / 10_000.
Parameters
gross_returns : pd.Series
Gross portfolio returns indexed by date.
weight_changes : pd.DataFrame
Weight change matrix (dates x assets). Only dates present in
this DataFrame incur transaction costs.
cost_bps : float
Transaction cost in basis points (default 10 bps).
Returns
pd.Series
Net returns with costs deducted.
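The cost rule described above can be reproduced in a few lines of pandas. This is a sketch of the documented formula, not the library's implementation; net_returns_sketch is a hypothetical name, and dates in weight_changes that are absent from gross_returns are simply dropped here (an assumption).

```python
import pandas as pd


def net_returns_sketch(
    gross_returns: pd.Series, weight_changes: pd.DataFrame, cost_bps: float = 10.0
) -> pd.Series:
    """Subtract one-way turnover * cost_bps / 10_000 on rebalancing dates."""
    turnover = 0.5 * weight_changes.abs().sum(axis=1)  # one-way turnover per date
    costs = (turnover * cost_bps / 10_000).reindex(gross_returns.index, fill_value=0.0)
    return gross_returns - costs


dates = pd.date_range("2024-01-01", periods=3, freq="D")
gross = pd.Series([0.010, 0.000, 0.005], index=dates)
# One rebalance on the second date: shift 0.2 of weight from AAA to BBB.
changes = pd.DataFrame({"AAA": [-0.2], "BBB": [0.2]}, index=[dates[1]])
print(net_returns_sketch(gross, changes).round(6).tolist())  # [0.01, -0.0002, 0.005]
```

The 0.2 shift costs 0.2 * 10 / 10_000 = 2 bps of return, matching the "w, not 2w" convention.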
optimize(pipeline, X, *, y=None)
Fit the pipeline on the full dataset and return the final weights.
Parameters
pipeline : Pipeline
A ready-to-fit sklearn Pipeline.
X : pd.DataFrame
Return matrix (observations x assets).
y : pd.DataFrame or None
Benchmark or factor returns.
Returns
PortfolioResult
Weights, in-sample portfolio, and fitted pipeline.
run_full_pipeline(prices, optimizer, *, pre_selection_config=None, sector_mapping=None, cv_config=None, previous_weights=None, rebalancing_config=None, current_date=None, last_review_date=None, y_prices=None, risk_free_rate=0.0, delisting_returns=None, fx_config=None, currency_map=None, fx_rates=None, benchmark_currency=None, cost_bps=10.0, n_jobs=None)
End-to-end: prices → validated weights + backtest + rebalancing.
This is the single entry point for producing a portfolio from raw price data. It:
- Converts prices (after optional FX conversion to the base currency) to linear returns.
- Applies delisting returns (survivorship-bias correction) if delisting_returns is provided.
- Builds the full pipeline (pre-selection + optimiser).
- Backtests via walk-forward (if cv_config is provided).
- Fits on the full data to produce final weights.
- Checks rebalancing thresholds (if previous_weights is given).
Parameters
prices : pd.DataFrame
Price matrix (dates x tickers).
optimizer : BaseOptimization
A skfolio optimiser instance (e.g. from build_mean_risk()).
pre_selection_config : PreSelectionConfig or None
Pre-selection configuration.
sector_mapping : dict[str, str] or None
Ticker → sector mapping for imputation.
cv_config : WalkForwardConfig or None
Walk-forward backtest configuration. None skips
backtesting.
previous_weights : ndarray or None
Current portfolio weights for rebalancing analysis.
rebalancing_config : ThresholdRebalancingConfig or HybridRebalancingConfig or None
Rebalancing configuration. Pass a ThresholdRebalancingConfig
for pure drift-based rebalancing or a HybridRebalancingConfig
for calendar-gated threshold rebalancing.
current_date : pd.Timestamp or None
Evaluation date for hybrid rebalancing. Defaults to the last
date in the return series when not provided.
last_review_date : pd.Timestamp or None
Date of the last hybrid review. When None with a
HybridRebalancingConfig, the calendar gate is treated as
already elapsed (threshold alone decides).
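The interplay of current_date, last_review_date and the two rebalancing configs can be sketched as a boolean rule: the calendar gate must be open and the drift threshold must be breached. The review interval in days and the per-asset absolute drift threshold are hypothetical stand-ins; the field names and exact drift metric of ThresholdRebalancingConfig and HybridRebalancingConfig are not documented here, and hybrid_rebalance_needed is not a library function.

```python
from typing import Optional

import pandas as pd


def hybrid_rebalance_needed(
    previous: pd.Series,
    target: pd.Series,
    current_date: pd.Timestamp,
    last_review_date: Optional[pd.Timestamp],
    review_interval_days: int = 90,  # hypothetical calendar gate length
    drift_threshold: float = 0.05,  # hypothetical absolute drift per asset
) -> bool:
    """Calendar-gated threshold check (illustrative, not the library's rule)."""
    # No previous review: the calendar gate counts as already elapsed,
    # so the drift threshold alone decides.
    gate_open = (
        last_review_date is None
        or (current_date - last_review_date).days >= review_interval_days
    )
    prev, tgt = previous.align(target, fill_value=0.0)
    drift_breached = bool((prev - tgt).abs().max() > drift_threshold)
    return gate_open and drift_breached


previous = pd.Series({"AAA": 0.58, "BBB": 0.42})
target = pd.Series({"AAA": 0.50, "BBB": 0.50})
print(hybrid_rebalance_needed(previous, target, pd.Timestamp("2024-06-30"), None))  # True
print(hybrid_rebalance_needed(previous, target, pd.Timestamp("2024-06-30"), pd.Timestamp("2024-05-31")))  # False
```

In the second call the 8-point drift exceeds the threshold, but only 30 days have passed since the last review, so the calendar gate keeps the portfolio unchanged.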
y_prices : pd.DataFrame or None
Benchmark or factor price series. Converted to returns
alongside asset prices.
delisting_returns : dict[str, float] or None
Mapping of ticker → terminal delisting return. When provided,
each ticker's last valid return is replaced with this value
after prices_to_returns() (survivorship-bias correction,
issue #274). Tickers not present in the returns columns are
silently ignored.
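The delisting correction described above amounts to overwriting one cell per ticker after the returns are computed. A minimal sketch, assuming linear returns computed as p_t / p_{t-1} - 1; apply_delisting_returns is a hypothetical helper, not the library's function.

```python
import numpy as np
import pandas as pd


def apply_delisting_returns(
    returns: pd.DataFrame, delisting_returns: dict[str, float]
) -> pd.DataFrame:
    """Overwrite each listed ticker's last valid return with its terminal return."""
    out = returns.copy()
    for ticker, terminal_return in delisting_returns.items():
        if ticker not in out.columns:
            continue  # unknown tickers are silently ignored, as documented
        last_valid = out[ticker].last_valid_index()
        if last_valid is not None:
            out.loc[last_valid, ticker] = terminal_return
    return out


prices = pd.DataFrame(
    {"AAA": [100.0, 101.0, 102.0, 103.0], "DLST": [50.0, 49.0, 48.0, np.nan]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
# Linear returns; the first row has no prior price and is dropped.
returns = (prices / prices.shift(1) - 1).iloc[1:]
adjusted = apply_delisting_returns(returns, {"DLST": -0.9, "ZZZ": -1.0})
print(adjusted["DLST"].round(4).tolist())  # [-0.02, -0.9, nan]
```

Without the correction, DLST's final observed return would be a mild -2%, and the 90% loss at delisting would never enter the backtest.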
fx_config : FxConfig or None
Multi-currency FX conversion configuration (issue #283).
When provided with mode != NONE, prices are converted to
the base currency before prices_to_returns(). None
disables conversion (default, backward-compatible).
currency_map : dict[str, str] or None
Ticker → ISO currency code mapping. Required when
fx_config is provided.
fx_rates : pd.DataFrame or None
Pre-loaded FX rate DataFrame (dates x currencies). Each
column holds units-of-base per one unit-of-foreign.
Required when fx_config is provided.
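Given the fx_rates convention above (units of base currency per one unit of foreign currency), converting a price to the base currency is a multiplication by that column. A minimal sketch: convert_prices_to_base is a hypothetical helper, the tickers and rates are toy data, and forward-filling rates onto price dates is an assumption.

```python
import pandas as pd


def convert_prices_to_base(
    prices: pd.DataFrame,
    currency_map: dict[str, str],
    fx_rates: pd.DataFrame,
    base_currency: str,
) -> pd.DataFrame:
    """Multiply local-currency prices by units-of-base per unit-of-foreign."""
    # Align FX dates to price dates and carry the last known rate forward (assumption).
    rates = fx_rates.reindex(prices.index).ffill()
    converted = prices.copy()
    for ticker in prices.columns:
        ccy = currency_map[ticker]
        if ccy != base_currency:
            converted[ticker] = prices[ticker] * rates[ccy]
    return converted


dates = pd.date_range("2024-01-01", periods=2, freq="D")
prices = pd.DataFrame({"EUSTOCK": [100.0, 102.0], "USSTOCK": [150.0, 150.0]}, index=dates)
fx_rates = pd.DataFrame({"USD": [0.90, 0.95]}, index=dates)  # EUR per 1 USD (toy data)
eur_prices = convert_prices_to_base(
    prices, {"EUSTOCK": "EUR", "USSTOCK": "USD"}, fx_rates, "EUR"
)
print(eur_prices["USSTOCK"].round(6).tolist())  # [135.0, 142.5]
```

USSTOCK is flat in USD, yet its EUR return is 142.5 / 135 - 1 (about 5.6%), entirely from the currency move. This is the component that FX decomposition separates out.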
benchmark_currency : str or None
ISO currency code for the benchmark in y_prices (issue #308).
When provided and FX conversion is active, all columns of
y_prices are treated as denominated in this currency and
converted to fx_config.base_currency before returns are
computed. None (default) preserves existing behaviour:
the benchmark is converted only if its ticker already appears
in currency_map.
cost_bps : float
One-way transaction cost in basis points applied to each
walk-forward rebalancing event. Subtracted from gross backtest
returns to produce result.net_returns and
result.net_sharpe_ratio. Default 10 bps.
n_jobs : int or None
Number of parallel jobs for backtesting.
Returns
PortfolioResult
Complete result with weights, portfolio metrics, optional
backtest, net returns, and rebalancing signals.
Examples
>>> from optimizer.optimization import MeanRiskConfig, build_mean_risk
>>> from optimizer.validation import WalkForwardConfig
>>> from optimizer.pipeline import run_full_pipeline
>>> optimizer = build_mean_risk(MeanRiskConfig.for_max_sharpe())
>>> result = run_full_pipeline(
...     prices=price_df,
...     optimizer=optimizer,
...     cv_config=WalkForwardConfig.for_quarterly_rolling(),
... )
>>> print(result.weights)
>>> print(result.summary)
>>> print(result.backtest.sharpe_ratio)  # out-of-sample
run_full_pipeline_with_selection(prices, optimizer, *, fundamentals=None, volume_history=None, financial_statements=None, analyst_data=None, insider_data=None, macro_data=None, regime_data=None, investability_config=None, factor_config=None, standardization_config=None, scoring_config=None, selection_config=None, regime_config=None, integration_config=None, sector_mapping=None, pre_selection_config=None, cv_config=None, previous_weights=None, rebalancing_config=None, current_date=None, last_review_date=None, y_prices=None, current_members=None, ic_history=None, risk_free_rate=0.0, delisting_returns=None, market_returns=None, fx_config=None, currency_map=None, fx_rates=None, benchmark_currency=None, cost_bps=10.0, n_jobs=None)
End-to-end: fundamentals + prices → stock selection → optimization.
Extends run_full_pipeline with upstream stock pre-selection:
- Screen the universe for investability (if fundamentals is provided).
- Compute and standardize factor scores.
- Apply macro regime tilts (if macro_data and regime_config are provided).
- Compute a composite score and select stocks.
- Run the existing run_full_pipeline on the selected tickers.
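The standardize, score and select steps above can be sketched with plain pandas. Everything here is an assumption for illustration: cross-sectional z-scores with population standard deviation, an equal-weight composite (the library may use IC weighting when ic_history is supplied), and a hysteresis rule that lets incumbents ranked within n_select + buffer keep their slot. select_stocks, n_select and buffer are hypothetical names, not SelectionConfig fields; regime tilts are not modelled.

```python
from typing import Optional

import pandas as pd


def select_stocks(
    factors: pd.DataFrame,
    n_select: int,
    buffer: int = 0,
    current_members: Optional[pd.Index] = None,
) -> pd.Index:
    """Z-score factors, average them, pick the top n_select with a hysteresis buffer."""
    # Cross-sectional z-scores per factor (assumed standardization).
    z = (factors - factors.mean()) / factors.std(ddof=0)
    # Equal-weight composite score, best first (assumption).
    ranked = z.mean(axis=1).sort_values(ascending=False).index
    members = set(current_members) if current_members is not None else set()
    # Hysteresis: incumbents ranked within n_select + buffer keep their slot.
    keep = [t for t in ranked[: n_select + buffer] if t in members][:n_select]
    # Remaining slots go to the best-ranked tickers not already kept.
    fill = [t for t in ranked if t not in keep][: n_select - len(keep)]
    chosen = set(keep) | set(fill)
    return pd.Index([t for t in ranked if t in chosen])


factors = pd.DataFrame(
    {"value": [5.0, 4.0, 3.0, 2.0, 1.0], "momentum": [5.0, 4.0, 3.0, 2.0, 1.0]},
    index=["A", "B", "C", "D", "E"],
)
print(list(select_stocks(factors, n_select=2)))  # ['A', 'B']
print(list(select_stocks(factors, n_select=2, buffer=1, current_members=pd.Index(["C"]))))  # ['A', 'C']
```

The buffer trades a little score for lower turnover. C ranks third, so it would drop out without hysteresis. With a buffer of one it stays, and B is not bought.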
Parameters
prices : pd.DataFrame
Price matrix (dates x tickers).
optimizer : BaseOptimization
A skfolio optimiser instance.
fundamentals : pd.DataFrame or None
Cross-sectional data indexed by ticker (market_cap, ratios).
If None, skips screening and factor selection.
volume_history : pd.DataFrame or None
Volume matrix (dates x tickers).
financial_statements : pd.DataFrame or None
Statement-level data for screening.
analyst_data : pd.DataFrame or None
Analyst recommendation data for factor construction.
insider_data : pd.DataFrame or None
Insider transaction data for factor construction.
macro_data : pd.DataFrame or None
Macro indicators for regime classification.
regime_data : pd.DataFrame or None
Merged macro indicators (pmi, spread_2s10s, hy_oas, etc.)
for composite regime classification. When provided and
non-empty, takes precedence over macro_data for regime
classification. Receives the same publication lag filtering.
investability_config : InvestabilityScreenConfig or None
Universe screening configuration.
factor_config : FactorConstructionConfig or None
Factor construction parameters.
standardization_config : StandardizationConfig or None
Factor standardization parameters.
scoring_config : CompositeScoringConfig or None
Composite scoring parameters.
selection_config : SelectionConfig or None
Stock selection parameters.
regime_config : RegimeTiltConfig or None
Regime tilt parameters.
integration_config : FactorIntegrationConfig or None
Factor-to-optimization bridge parameters.
sector_mapping : dict[str, str] or None
Ticker → sector mapping.
pre_selection_config : PreSelectionConfig or None
Return-data pre-selection configuration.
cv_config : WalkForwardConfig or None
Walk-forward backtest configuration.
previous_weights : ndarray or None
Current portfolio weights for rebalancing.
rebalancing_config : ThresholdRebalancingConfig or None
Rebalancing threshold configuration.
y_prices : pd.DataFrame or None
Benchmark or factor price series.
current_members : pd.Index or None
Currently selected tickers for hysteresis.
ic_history : pd.DataFrame or None
IC history for IC-weighted scoring.
market_returns : pd.Series or None
Pre-computed market return series for beta estimation.
When provided, used as the benchmark instead of the
equal-weight cross-sectional mean. Pass a currency-
consistent broad index (e.g. SPY daily returns) when
prices spans multiple currency zones.
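The choice of market series matters because beta is cov(asset, market) / var(market). A minimal sketch contrasting an explicit market_returns with the documented default (equal-weight cross-sectional mean); estimate_betas is a hypothetical helper, and OLS beta with sample moments is an assumption about how the library estimates beta.

```python
from typing import Optional

import pandas as pd


def estimate_betas(
    returns: pd.DataFrame, market_returns: Optional[pd.Series] = None
) -> pd.Series:
    """OLS beta of each asset on a market series (illustrative sketch)."""
    if market_returns is None:
        # Documented default: equal-weight cross-sectional mean of the assets.
        market_returns = returns.mean(axis=1)
    market = market_returns.reindex(returns.index)
    var_m = market.var()  # sample variance (ddof=1), matching Series.cov
    return returns.apply(lambda col: col.cov(market) / var_m)


m = pd.Series([0.01, -0.02, 0.03, 0.0])
returns = pd.DataFrame({"X": 2.0 * m, "Y": 0.5 * m + 0.001})
print(estimate_betas(returns, market_returns=m).round(6).to_dict())  # {'X': 2.0, 'Y': 0.5}
print(estimate_betas(returns).round(6).to_dict())  # {'X': 1.6, 'Y': 0.4}
```

With only two assets, the equal-weight proxy is dominated by X itself, so both betas shrink. A currency-consistent broad index avoids this kind of distortion in a mixed-currency universe.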
benchmark_currency : str or None
ISO currency code for the benchmark in y_prices.
Forwarded verbatim to run_full_pipeline; see that
function's documentation for full semantics (issue #308).
n_jobs : int or None
Number of parallel jobs.
Returns
PortfolioResult
Complete result with weights, metrics, backtest, and rebalancing signals.
tune_and_optimize(pipeline, X, param_grid, *, tuning_config=None, y=None, risk_free_rate=0.0)
Tune hyperparameters via grid or randomized search, then optimise.
Parameters
pipeline : Pipeline
A ready-to-fit sklearn Pipeline.
X : pd.DataFrame
Return matrix (observations x assets).
param_grid : dict
Parameter grid for GridSearchCV or distributions for
RandomizedSearchCV. Keys use sklearn double-underscore
notation for nested parameters.
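The double-underscore notation addresses a parameter of a named pipeline step ("step__param"). The sketch below shows a grid and how it expands into candidates. The step name "optimizer" and the parameters l2_coef and risk_aversion are hypothetical here, because the step names chosen by build_portfolio_pipeline are not documented in this reference. expand_grid mirrors what sklearn.model_selection.ParameterGrid does.

```python
from itertools import product

# Hypothetical step and parameter names; adjust to the real pipeline's steps.
param_grid = {
    "optimizer__l2_coef": [0.0, 0.01],
    "optimizer__risk_aversion": [1.0, 2.0, 5.0],
}


def expand_grid(grid: dict) -> list[dict]:
    """Cartesian product of a grid, one dict per candidate (like ParameterGrid)."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


candidates = expand_grid(param_grid)
print(len(candidates))  # 6

# sklearn routes "step__param" by splitting on the first double underscore.
step, _, param = "optimizer__risk_aversion".partition("__")
print(step, param)  # optimizer risk_aversion
```

Grid search fits one CV run per candidate, so cost grows multiplicatively with each added key. RandomizedSearchCV caps the number of candidates instead.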
tuning_config : GridSearchConfig or RandomizedSearchConfig or None
Search configuration. Defaults to quarterly walk-forward
with Sharpe ratio scoring (grid search).
y : pd.DataFrame or None
Benchmark or factor returns.
risk_free_rate : float
Daily risk-free rate for consistent Sharpe scoring (issue #272).
When non-zero and the scorer uses Sharpe ratio, the scorer
config is updated to use this rate.
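Why the rate must match across scoring and reporting: a constant daily risk-free rate lowers the mean excess return but leaves volatility unchanged, so a Sharpe ratio scored with rf = 0 and one reported with a non-zero rate are not comparable. A minimal sketch assuming 252 periods per year and sample standard deviation; skfolio's own annualization convention may differ, and annualized_sharpe is a hypothetical helper.

```python
import numpy as np


def annualized_sharpe(
    daily_returns, daily_risk_free: float = 0.0, periods_per_year: int = 252
) -> float:
    """Annualized Sharpe ratio of daily returns in excess of a daily risk-free rate."""
    excess = np.asarray(daily_returns, dtype=float) - daily_risk_free
    # A constant risk-free rate shifts the mean but leaves the volatility unchanged.
    return float(excess.mean() / excess.std(ddof=1) * np.sqrt(periods_per_year))


daily = [0.0010, 0.0020, -0.0005, 0.0015, 0.0008]
rf_daily = 0.04 / 252  # 4% a year, simple daily conversion (assumption)
print(annualized_sharpe(daily) > annualized_sharpe(daily, rf_daily))  # True
```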
Returns
PortfolioResult
Weights from the best estimator, with backtest from CV.