pipeline

optimizer.pipeline

End-to-end portfolio pipeline orchestration.

Composes pre-selection, optimisation, validation, scoring, hyperparameter tuning, and rebalancing into a single workflow.

PortfolioResult dataclass

Container for the output of a full portfolio optimisation run.

Attributes

weights : pd.Series
    Final asset weights (ticker → weight).
portfolio : object
    skfolio Portfolio from predict() on the full dataset. Exposes .sharpe_ratio, .sortino_ratio, .max_drawdown, .composition, etc.
backtest : object or None
    Out-of-sample MultiPeriodPortfolio (walk-forward) or Population (CPCV / MultipleRandomizedCV). None when backtesting was skipped.
pipeline : object
    The fitted sklearn Pipeline (pre-selection + optimiser). Can be reused for predict() on new data.
summary : dict[str, float]
    Key performance metrics extracted from the in-sample portfolio.
rebalance_needed : bool or None
    Whether the portfolio exceeds drift thresholds relative to previous_weights. None when no previous weights were provided.
turnover : float or None
    One-way turnover between previous_weights and the new weights. None when no previous weights were provided.

build_portfolio_pipeline(optimizer, pre_selection_config=None, sector_mapping=None)

Compose a full sklearn Pipeline: pre-selection → optimiser.

The resulting pipeline can be passed as a single estimator to cross-validation and hyperparameter tuning. Pre-selection is therefore refitted within each CV fold, which prevents data leakage.

Parameters

optimizer : BaseOptimization
    A skfolio optimiser (e.g. from build_mean_risk(), build_hrp(), etc.) used as the final pipeline estimator.
pre_selection_config : PreSelectionConfig or None
    Pre-selection configuration. None uses default settings.
sector_mapping : dict[str, str] or None
    Ticker → sector mapping for SectorImputer.

Returns

sklearn.pipeline.Pipeline
    An unfitted, ready-to-fit pipeline whose fit(X) cleans and filters returns and then optimises, and whose predict(X) produces a skfolio Portfolio.

Examples

>>> from optimizer.optimization import MeanRiskConfig, build_mean_risk
>>> from optimizer.pipeline import build_portfolio_pipeline
>>> optimizer = build_mean_risk(MeanRiskConfig.for_max_sharpe())
>>> pipeline = build_portfolio_pipeline(optimizer)
>>> pipeline.fit(X)  # X = returns DataFrame
>>> portfolio = pipeline.predict(X)
>>> print(portfolio.sharpe_ratio)

backtest(pipeline, X, *, cv_config=None, y=None, n_jobs=None)

Run a walk-forward backtest on a portfolio pipeline.

Parameters

pipeline : Pipeline
    A ready-to-fit sklearn Pipeline (from build_portfolio_pipeline).
X : pd.DataFrame
    Return matrix (observations x assets).
cv_config : WalkForwardConfig or None
    Walk-forward configuration. Defaults to quarterly rolling.
y : pd.DataFrame or None
    Benchmark or factor returns for models that require fit(X, y).
n_jobs : int or None
    Number of parallel jobs.

Returns

MultiPeriodPortfolio or Population
    Out-of-sample portfolio predictions.
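To illustrate what a rolling walk-forward scheme does with the rows of X, the following is a minimal sketch that generates train/test index windows. The function name walk_forward_splits and the window sizes are hypothetical and chosen for illustration; this is not the library's implementation, and the actual window lengths come from cv_config.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges for a rolling walk-forward scheme.

    Hypothetical helper for illustration only. Each test window directly
    follows its training window, and both roll forward by test_size rows.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# Ten observations, four-row training windows, two-row test windows.
splits = [
    (list(train), list(test))
    for train, test in walk_forward_splits(n_obs=10, train_size=4, test_size=2)
]
for train, test in splits:
    print("train", train, "test", test)
```

Every test window lies strictly after its training window, so each out-of-sample prediction uses only data that would have been available at the time.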

compute_net_backtest_returns(gross_returns, weight_changes, cost_bps=10.0)

Deduct proportional transaction costs from gross backtest returns.

For each date with weight changes, the turnover (sum of absolute weight deltas) is multiplied by cost_bps / 10_000 and subtracted from the gross return at that date.

Parameters

gross_returns : pd.Series
    Gross portfolio returns indexed by date.
weight_changes : pd.DataFrame
    Weight change matrix (dates x assets). Only dates present in this DataFrame incur transaction costs.
cost_bps : float
    Transaction cost in basis points (default 10 bps).

Returns

pd.Series
    Net returns with costs deducted.
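The cost rule described for this function (turnover times cost_bps / 10_000, subtracted on the dates that have weight changes) can be sketched with plain pandas. The function name net_returns_sketch and the sample data are hypothetical; this is an illustration of the stated formula, not the library's source.

```python
import pandas as pd


def net_returns_sketch(gross_returns, weight_changes, cost_bps=10.0):
    """Illustrative re-implementation of the documented cost rule."""
    # Turnover per rebalance date: sum of absolute weight deltas across assets.
    turnover = weight_changes.abs().sum(axis=1)
    # Proportional cost per date, expressed as a return fraction.
    costs = turnover * cost_bps / 10_000
    # Dates without weight changes incur no cost.
    costs = costs.reindex(gross_returns.index, fill_value=0.0)
    return gross_returns - costs


dates = pd.date_range("2024-01-01", periods=3, freq="D")
gross = pd.Series([0.010, 0.002, -0.004], index=dates)
# A single rebalance on the second date: 10% moved from BBB to AAA.
changes = pd.DataFrame({"AAA": [0.10], "BBB": [-0.10]}, index=[dates[1]])

net = net_returns_sketch(gross, changes)
print(net)
```

In this sample the turnover on the rebalance date is 0.20, so at 10 bps the cost is 0.20 × 0.001 = 0.0002 and the net return on that date is 0.0018; the other two dates are unchanged.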

optimize(pipeline, X, *, y=None)

Fit the pipeline on the full dataset and return the final weights.

Parameters

pipeline : Pipeline
    A ready-to-fit sklearn Pipeline.
X : pd.DataFrame
    Return matrix (observations x assets).
y : pd.DataFrame or None
    Benchmark or factor returns.

Returns

PortfolioResult
    Weights, in-sample portfolio, and fitted pipeline.

run_full_pipeline(prices, optimizer, *, pre_selection_config=None, sector_mapping=None, cv_config=None, previous_weights=None, rebalancing_config=None, current_date=None, last_review_date=None, y_prices=None, n_jobs=None)

End-to-end: prices → validated weights + backtest + rebalancing.

This is the single entry point for producing a portfolio from raw price data. It:

  1. Converts prices to linear returns.
  2. Builds the full pipeline (pre-selection + optimiser).
  3. Backtests via walk-forward (if cv_config is provided).
  4. Fits on full data to produce final weights.
  5. Checks rebalancing thresholds (if previous_weights given).

Parameters

prices : pd.DataFrame
    Price matrix (dates x tickers).
optimizer : BaseOptimization
    A skfolio optimiser instance (e.g. from build_mean_risk()).
pre_selection_config : PreSelectionConfig or None
    Pre-selection configuration.
sector_mapping : dict[str, str] or None
    Ticker → sector mapping for imputation.
cv_config : WalkForwardConfig or None
    Walk-forward backtest configuration. None skips backtesting.
previous_weights : ndarray or None
    Current portfolio weights for rebalancing analysis.
rebalancing_config : ThresholdRebalancingConfig or HybridRebalancingConfig or None
    Rebalancing configuration. Pass a ThresholdRebalancingConfig for pure drift-based rebalancing or a HybridRebalancingConfig for calendar-gated threshold rebalancing.
current_date : pd.Timestamp or None
    Evaluation date for hybrid rebalancing. Defaults to the last date in the return series when not provided.
last_review_date : pd.Timestamp or None
    Date of the last hybrid review. When None with a HybridRebalancingConfig, the calendar gate is treated as already elapsed (threshold alone decides).
y_prices : pd.DataFrame or None
    Benchmark or factor price series. Converted to returns alongside asset prices.
n_jobs : int or None
    Number of parallel jobs for backtesting.
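The calendar-gated threshold behaviour described for current_date and last_review_date can be sketched as below. Everything here is an assumption made for illustration: the function name, the review_interval argument, and the use of the largest absolute per-asset drift as the threshold test are hypothetical and may differ from how HybridRebalancingConfig is actually evaluated.

```python
from datetime import date, timedelta


def hybrid_rebalance_needed(
    previous_weights,
    new_weights,
    drift_threshold,
    current_date,
    last_review_date,
    review_interval,
):
    """Hypothetical sketch of calendar-gated threshold rebalancing."""
    # Calendar gate: with no recorded review the gate counts as already
    # elapsed, so the drift threshold alone decides.
    gate_open = (
        last_review_date is None
        or current_date - last_review_date >= review_interval
    )
    if not gate_open:
        return False
    # Assumed drift measure: largest absolute per-asset weight difference.
    max_drift = max(abs(p - n) for p, n in zip(previous_weights, new_weights))
    return max_drift > drift_threshold


previous = [0.55, 0.45]
target = [0.50, 0.50]
today = date(2024, 6, 30)
interval = timedelta(days=30)

no_review = hybrid_rebalance_needed(previous, target, 0.03, today, None, interval)
recent_review = hybrid_rebalance_needed(
    previous, target, 0.03, today, today - timedelta(days=10), interval
)
old_review = hybrid_rebalance_needed(
    previous, target, 0.03, today, today - timedelta(days=40), interval
)
print(no_review, recent_review, old_review)
```

With a ThresholdRebalancingConfig the calendar gate would not apply at all, which corresponds to always taking the drift branch in this sketch.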

Returns

PortfolioResult
    Complete result with weights, portfolio metrics, optional backtest, and rebalancing signals.

Examples

>>> from optimizer.optimization import MeanRiskConfig, build_mean_risk
>>> from optimizer.validation import WalkForwardConfig
>>> from optimizer.pipeline import run_full_pipeline
>>> optimizer = build_mean_risk(MeanRiskConfig.for_max_sharpe())
>>> result = run_full_pipeline(
...     prices=price_df,
...     optimizer=optimizer,
...     cv_config=WalkForwardConfig.for_quarterly_rolling(),
... )
>>> print(result.weights)
>>> print(result.summary)
>>> print(result.backtest.sharpe_ratio)  # out-of-sample
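For reference, step 1 of run_full_pipeline (prices to linear returns) corresponds to the usual simple-return formula r_t = P_t / P_{t-1} - 1. The sketch below applies that formula with plain pandas on made-up prices; the tickers and values are hypothetical, and the library may use its own helper and additional cleaning.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=3, freq="D")
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0], "BBB": [50.0, 50.0, 55.0]},
    index=dates,
)

# Linear (simple) returns: r_t = P_t / P_{t-1} - 1.
# The first row has no previous price, so it is dropped.
returns = (prices / prices.shift(1) - 1).iloc[1:]
print(returns)
```

The resulting frame has one row fewer than the price matrix and keeps the dates x tickers layout expected for X elsewhere in this module.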

run_full_pipeline_with_selection(prices, optimizer, *, fundamentals=None, volume_history=None, financial_statements=None, analyst_data=None, insider_data=None, macro_data=None, investability_config=None, factor_config=None, standardization_config=None, scoring_config=None, selection_config=None, regime_config=None, integration_config=None, sector_mapping=None, pre_selection_config=None, cv_config=None, previous_weights=None, rebalancing_config=None, current_date=None, last_review_date=None, y_prices=None, current_members=None, ic_history=None, n_jobs=None)

End-to-end: fundamentals + prices → stock selection → optimisation.

Extends run_full_pipeline with upstream stock pre-selection:

  1. Screen universe for investability (if fundamentals provided).
  2. Compute and standardize factor scores.
  3. Apply macro regime tilts (if macro_data + regime_config).
  4. Compute composite score and select stocks.
  5. Run run_full_pipeline on the selected tickers.

Parameters

prices : pd.DataFrame
    Price matrix (dates x tickers).
optimizer : BaseOptimization
    A skfolio optimiser instance.
fundamentals : pd.DataFrame or None
    Cross-sectional data indexed by ticker (market_cap, ratios). If None, skips screening and factor selection.
volume_history : pd.DataFrame or None
    Volume matrix (dates x tickers).
financial_statements : pd.DataFrame or None
    Statement-level data for screening.
analyst_data : pd.DataFrame or None
    Analyst recommendation data for factor construction.
insider_data : pd.DataFrame or None
    Insider transaction data for factor construction.
macro_data : pd.DataFrame or None
    Macro indicators for regime classification.
investability_config : InvestabilityScreenConfig or None
    Universe screening configuration.
factor_config : FactorConstructionConfig or None
    Factor construction parameters.
standardization_config : StandardizationConfig or None
    Factor standardization parameters.
scoring_config : CompositeScoringConfig or None
    Composite scoring parameters.
selection_config : SelectionConfig or None
    Stock selection parameters.
regime_config : RegimeTiltConfig or None
    Regime tilt parameters.
integration_config : FactorIntegrationConfig or None
    Factor-to-optimization bridge parameters.
sector_mapping : dict[str, str] or None
    Ticker → sector mapping.
pre_selection_config : PreSelectionConfig or None
    Return-data pre-selection configuration.
cv_config : WalkForwardConfig or None
    Walk-forward backtest configuration.
previous_weights : ndarray or None
    Current portfolio weights for rebalancing.
rebalancing_config : ThresholdRebalancingConfig or None
    Rebalancing threshold configuration.
current_date, last_review_date
    Present in the signature but not described here; see the parameters of the same name in run_full_pipeline.
y_prices : pd.DataFrame or None
    Benchmark or factor price series.
current_members : pd.Index or None
    Currently selected tickers for hysteresis.
ic_history : pd.DataFrame or None
    IC history for IC-weighted scoring.
n_jobs : int or None
    Number of parallel jobs.

Returns

PortfolioResult
    Complete result with weights, metrics, backtest, and rebalancing signals.

tune_and_optimize(pipeline, X, param_grid, *, tuning_config=None, y=None)

Tune hyperparameters via grid or randomized search, then optimise.

Parameters

pipeline : Pipeline
    A ready-to-fit sklearn Pipeline.
X : pd.DataFrame
    Return matrix (observations x assets).
param_grid : dict
    Parameter grid for GridSearchCV or distributions for RandomizedSearchCV. Keys use sklearn double-underscore notation for nested parameters.
tuning_config : GridSearchConfig or RandomizedSearchConfig or None
    Search configuration. Defaults to quarterly walk-forward with Sharpe ratio scoring (grid search).
y : pd.DataFrame or None
    Benchmark or factor returns.

Returns

PortfolioResult
    Weights from the best estimator, with backtest from CV.
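As a rough illustration of the double-underscore notation expected in param_grid, the sketch below builds a small grid and enumerates its combinations the way an exhaustive grid search would. The step names "optimizer" and "pre_selection" and the parameter names are hypothetical; the real keys depend on the step names in the pipeline returned by build_portfolio_pipeline, which pipeline.get_params() lists.

```python
from itertools import product

# Keys follow "<step>__<parameter>"; all names here are hypothetical.
param_grid = {
    "optimizer__l2_coef": [0.0, 0.01],
    "pre_selection__threshold": [0.90, 0.95],
}


def split_key(key):
    """Split a nested key at the first double underscore."""
    step, _, param = key.partition("__")
    return step, param


def expand_grid(grid):
    """List every parameter combination an exhaustive grid search would try."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


candidates = expand_grid(param_grid)
for candidate in candidates:
    print(candidate)
print(split_key("optimizer__l2_coef"))
```

A grid of two values for each of two parameters yields four candidates, each of which is fitted once per CV split, so the grid size multiplies the backtest cost.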