pre_selection¶

`optimizer.pre_selection` ¶

Pre-selection pipeline assembly.

`PreSelectionConfig` `dataclass` ¶

Immutable configuration for the pre-selection pipeline.

All parameters map 1:1 to transformer/selector constructor arguments, making the config serialisable and suitable for hyperparameter sweeps.
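As a rough illustration of why an immutable dataclass suits sweeps, the sketch below defines a stand-in `SketchConfig` that borrows three of the documented field names. The class and its default values are assumptions made for this example; they are not the library's actual definitions.

```python
from dataclasses import asdict, dataclass, replace
from itertools import product


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for PreSelectionConfig (default values are invented)."""

    winsorize_threshold: float = 3.0
    remove_threshold: float = 10.0
    correlation_threshold: float = 0.95


base = SketchConfig()

# A flat dataclass converts to a plain dict in one call, which is easy to log or dump as JSON.
as_dict = asdict(base)

# replace() returns a new instance per grid point and leaves the base config untouched.
grid = [
    replace(base, winsorize_threshold=w, correlation_threshold=c)
    for w, c in product([2.5, 3.0], [0.90, 0.95])
]
```

Because the instance is frozen, each sweep candidate is a separate object, so one trial cannot accidentally mutate the configuration used by another.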

Parameters¶

max_abs_return : float
    Maximum absolute return before treating as data error (DataValidator).
winsorize_threshold : float
    Z-score threshold for winsorisation (OutlierTreater).
remove_threshold : float
    Z-score threshold for removal as data error (OutlierTreater).
outlier_method : str
    Outlier detection approach. Currently only "time_series" is supported (per-column z-scores).
imputation_fallback : str
    Fallback when sector data is unavailable. "global_mean" uses the cross-sectional mean across all assets.
correlation_threshold : float
    Pairwise correlation above which an asset is dropped (DropCorrelated).
correlation_absolute : bool
    If True, use absolute correlation values.
top_k : int or None
    If set, keep only the k assets with the highest (or lowest) mean return via SelectKExtremes.
top_k_highest : bool
    Select assets with the highest mean when True, lowest when False.
use_pareto : bool
    If True, apply SelectNonDominated Pareto filter.
pareto_min_assets : int or None
    Minimum number of assets to retain after Pareto filtering.
use_non_expiring : bool
    If True, apply SelectNonExpiring to remove soon-expiring assets.
expiration_lookahead : int or None
    Number of calendar days to look ahead for expiring assets, forwarded to SelectNonExpiring as a timedelta.
is_log_normal : bool
    Whether returns are assumed log-normal for multi-period scaling (deferred to Chapter 2, stored here for completeness).
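To make the interplay between `winsorize_threshold` and `remove_threshold` concrete, here is a minimal sketch of a per-column ("time_series") z-score treatment. The exact mechanics are an assumption: the source only states that one threshold triggers winsorisation and the other removal, so clipping to mean ± threshold × std and marking removed values as NaN are illustrative choices, and `treat_column` is a hypothetical helper rather than part of the library.

```python
import math
from statistics import mean, pstdev


def treat_column(returns, winsorize_threshold, remove_threshold):
    """Apply two z-score thresholds to one asset's return series.

    Assumed behaviour (not confirmed by the source): |z| above
    remove_threshold becomes NaN; |z| above winsorize_threshold is
    clipped to mean +/- winsorize_threshold * std; the rest is kept.
    """
    mu = mean(returns)
    sigma = pstdev(returns)
    if sigma == 0:
        # A constant series has no outliers to treat.
        return list(returns)
    treated = []
    for r in returns:
        z = (r - mu) / sigma
        if abs(z) > remove_threshold:
            treated.append(math.nan)  # treated as a data error
        elif abs(z) > winsorize_threshold:
            treated.append(mu + math.copysign(winsorize_threshold, z) * sigma)
        else:
            treated.append(r)
    return treated
```

Under this reading, `remove_threshold` should be larger than `winsorize_threshold`; otherwise the winsorisation branch would never be reached.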

`for_daily_annual()` `classmethod` ¶

Sensible defaults for daily returns over a ~1-year horizon.

`for_conservative()` `classmethod` ¶

Tighter filters for a more conservative universe.

`build_preselection_pipeline(config=None, sector_mapping=None)` ¶

Build an sklearn Pipeline for data cleaning and asset pre-selection.

The pipeline is assembled from config and follows this order::

    validate → outliers → impute → SelectComplete → DropZeroVariance
    → DropCorrelated → [SelectKExtremes] → [SelectNonDominated]
    → [SelectNonExpiring]

Optional steps (in brackets) are only included when the corresponding config flag or parameter is set.
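The conditional assembly described above can be sketched with plain Python. `SketchConfig` and `planned_steps` are hypothetical names for this example, and the step labels are copied from the diagram; the real pipeline may name its steps differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in carrying only the fields that toggle optional steps."""

    top_k: Optional[int] = None
    use_pareto: bool = False
    use_non_expiring: bool = False


def planned_steps(config: SketchConfig) -> list[str]:
    """Return step labels in the documented order, adding optional ones on demand."""
    steps = [
        "validate",
        "outliers",
        "impute",
        "SelectComplete",
        "DropZeroVariance",
        "DropCorrelated",
    ]
    if config.top_k is not None:
        steps.append("SelectKExtremes")
    if config.use_pareto:
        steps.append("SelectNonDominated")
    if config.use_non_expiring:
        steps.append("SelectNonExpiring")
    return steps
```

Calling `planned_steps(SketchConfig())` yields only the six mandatory steps, while setting `top_k` or either flag appends the matching selector at its documented position.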

All transformer hyperparameters are accessible via pipeline.get_params() for cross-validation tuning, using the step-name prefix (e.g. outliers__winsorize_threshold).
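The `outliers__winsorize_threshold` name follows scikit-learn's `<step>__<parameter>` convention. The sketch below only mimics that naming with a small helper so it runs without any dependencies; `flatten_step_params` and the parameter values are invented for illustration and are not how scikit-learn computes `get_params()` internally.

```python
def flatten_step_params(steps):
    """Flatten {step_name: {param: value}} into '<step>__<param>' keys."""
    return {
        f"{step}__{param}": value
        for step, params in steps.items()
        for param, value in params.items()
    }


# Hypothetical step parameters; the values are invented for the example.
steps = {
    "outliers": {"winsorize_threshold": 3.0, "remove_threshold": 10.0},
}
flat = flatten_step_params(steps)

# A search grid for cross-validation is keyed by the same flattened names.
param_grid = {"outliers__winsorize_threshold": [2.5, 3.0, 3.5]}
```

With a real pipeline object, the same key is what one would pass to `pipeline.set_params(...)` or use in the `param_grid` of a scikit-learn search such as `GridSearchCV`.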

Parameters¶

config : PreSelectionConfig or None
    Pipeline configuration. Defaults to PreSelectionConfig() (sensible defaults for daily equity returns).
sector_mapping : dict[str, str] or None
    Ticker → sector mapping forwarded to :class:`SectorImputer`. When None, global cross-sectional mean imputation is used.
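To show what the `sector_mapping` argument changes, the following sketch imputes one date's cross-section of returns. It is an assumption about the rule, not the `SectorImputer` implementation: a missing value takes the mean of observed same-sector peers when a mapping supplies any, and otherwise falls back to the mean across all observed assets. The function name and tickers are hypothetical.

```python
import math
from statistics import mean


def impute_cross_section(row, sector_mapping=None):
    """Fill NaN returns for a single date.

    row: dict of ticker -> return, with NaN marking a missing value.
    Assumed rule: same-sector mean if peers exist, else the global mean.
    """
    observed = {t: r for t, r in row.items() if not math.isnan(r)}
    global_mean = mean(list(observed.values())) if observed else math.nan
    filled = {}
    for ticker, value in row.items():
        if not math.isnan(value):
            filled[ticker] = value
            continue
        sector = (sector_mapping or {}).get(ticker)
        # Peers are observed assets that share the missing ticker's sector.
        peers = [
            r
            for t, r in observed.items()
            if sector is not None and sector_mapping.get(t) == sector
        ]
        filled[ticker] = mean(peers) if peers else global_mean
    return filled


row = {"AAA": 0.01, "BBB": math.nan, "CCC": 0.03, "DDD": 0.05}
mapping = {"AAA": "Tech", "BBB": "Tech", "CCC": "Energy", "DDD": "Energy"}
with_sectors = impute_cross_section(row, mapping)
without_sectors = impute_cross_section(row)
```

Here `BBB` is filled from its single Tech peer when the mapping is given, and from the mean of all three observed assets when `sector_mapping` is None.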

Returns¶

sklearn.pipeline.Pipeline
