factors

optimizer.factors

Factor construction, scoring, and selection for stock pre-selection.

CompositeMethod

Bases: str, Enum

Composite scoring method.

CompositeScoringConfig dataclass

Configuration for composite score construction.

Parameters

method : CompositeMethod
    Equal-weight, IC-weighted, ICIR-weighted, ridge, or GBT composite.
ic_lookback : int
    Number of periods for IC estimation when using IC weighting.
core_weight : float
    Relative weight for core factor groups.
supplementary_weight : float
    Relative weight for supplementary factor groups.
ridge_alpha : float
    L2 regularisation strength for RIDGE_WEIGHTED. Passed as the single
    candidate to RidgeCV; increase for more shrinkage.
gbt_max_depth : int
    Maximum tree depth for GBT_WEIGHTED.
gbt_n_estimators : int
    Number of boosting rounds for GBT_WEIGHTED.

for_equal_weight() classmethod

Equal-weight composite scoring.

for_ic_weighted() classmethod

IC-weighted composite scoring (raw IC magnitude).

for_icir_weighted() classmethod

ICIR-weighted composite scoring (mean IC / std IC).

Penalises inconsistent predictors by dividing mean IC by IC volatility before normalising weights.
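The weighting scheme can be sketched in a few lines. This is an illustrative helper (the name icir_weights is hypothetical, not part of this module) showing why a steady predictor earns more weight than an equally strong but noisy one:

```python
import pandas as pd

def icir_weights(ic_series_per_group: dict[str, pd.Series]) -> dict[str, float]:
    """Weight each group by |mean(IC) / std(IC)|, normalised to sum to 1."""
    raw = {}
    for name, ic in ic_series_per_group.items():
        std = ic.std()
        raw[name] = abs(ic.mean() / std) if std > 0 else 0.0
    total = sum(raw.values())
    if total == 0:  # all groups have zero/undefined ICIR: fall back to equal weight
        return {name: 1.0 / len(raw) for name in raw}
    return {name: v / total for name, v in raw.items()}

# Both groups have mean IC 0.05, but "value" delivers it far more consistently.
ics = {
    "value": pd.Series([0.05, 0.04, 0.06, 0.05]),       # low IC volatility
    "momentum": pd.Series([0.20, -0.10, 0.15, -0.05]),  # high IC volatility
}
weights = icir_weights(ics)
```

With these inputs the "value" group dominates the composite weight despite the identical mean IC.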

for_ridge_weighted() classmethod

Ridge regression composite scoring.

Learns optimal linear factor weights from historical data with L2 regularisation, avoiding the need for IC proxies.

for_gbt_weighted() classmethod

Gradient-boosted tree composite scoring.

Captures non-linear factor interactions (e.g. high value + improving momentum = stronger combined signal).

FactorConstructionConfig dataclass

Configuration for factor computation.

Parameters

factors : tuple[FactorType, ...]
    Which factors to compute.
momentum_lookback : int
    Lookback window for momentum in trading days.
momentum_skip : int
    Recent days to skip for momentum (reversal avoidance).
volatility_lookback : int
    Lookback window for volatility in trading days.
beta_lookback : int
    Lookback window for beta estimation in trading days.
amihud_lookback : int
    Lookback window for Amihud illiquidity in trading days.
publication_lag : PublicationLagConfig
    Per-source publication lags for point-in-time correctness. Pass a plain
    int for a uniform lag across all sources (backward-compatible; converted
    to :class:PublicationLagConfig automatically).

for_core_factors() classmethod

Core factors with strongest empirical support.

for_all_factors() classmethod

All 17 factors.

FactorGroupType

Bases: str, Enum

Factor group taxonomy.

FactorIntegrationConfig dataclass

Configuration for bridging factor scores to optimization.

Parameters

risk_free_rate : float
    Annual risk-free rate for expected return mapping.
market_risk_premium : float
    Annual equity risk premium.
use_black_litterman : bool
    Whether to generate Black-Litterman views from factor scores.
exposure_lower_bound : float
    Lower bound for factor exposure constraints.
exposure_upper_bound : float
    Upper bound for factor exposure constraints.

for_linear_mapping() classmethod

Direct factor score to expected return mapping.

for_black_litterman() classmethod

Factor-based Black-Litterman views.

FactorType

Bases: str, Enum

Individual factor identifiers.

FactorValidationConfig dataclass

Configuration for factor validation and statistical testing.

Parameters

newey_west_lags : int
    Number of lags for Newey-West t-statistic.
t_stat_threshold : float
    Minimum absolute t-statistic for significance.
fdr_alpha : float
    False discovery rate alpha level.
n_quantiles : int
    Number of quantiles for spread analysis.
fmp_top_pct : float
    Top percentile for factor-mimicking portfolios.
fmp_bottom_pct : float
    Bottom percentile for factor-mimicking portfolios.

for_strict() classmethod

Strict validation thresholds.

for_standard() classmethod

Standard validation thresholds.

GroupWeight

Bases: str, Enum

Weight tier for factor groups.

MacroRegime

Bases: str, Enum

Macro-economic regime classification.

PublicationLagConfig dataclass

Differentiated publication lags by data source type.

Each source has an independent delay between the period end date and the date the data is reliably available for use in factor construction. Using source-specific lags avoids look-ahead bias when aligning fundamental data to price dates.

Parameters

annual_days : int
    Lag for annual financial statements (days after fiscal year end).
    Default: 90 days (~3 months for 10-K filing).
quarterly_days : int
    Lag for quarterly financial statements (days after quarter end).
    Default: 45 days (~6 weeks for 10-Q filing).
analyst_days : int
    Lag for analyst estimates and recommendations. Default: 5 days
    (short dissemination buffer).
macro_days : int
    Lag for macroeconomic indicators (release lag + revision lag).
    Default: 63 days (~2 months).

uniform(days) classmethod

Create a config with the same lag applied to all sources.

RegimeTiltConfig dataclass

Configuration for macro regime factor tilts.

Per-regime multiplicative tilts stored as tuples of (group_name, tilt_factor) for frozen-dataclass compatibility.

Parameters

enable : bool
    Whether to apply regime tilts.
expansion_tilts : tuple[tuple[str, float], ...]
    Group tilts during expansion.
slowdown_tilts : tuple[tuple[str, float], ...]
    Group tilts during slowdown.
recession_tilts : tuple[tuple[str, float], ...]
    Group tilts during recession.
recovery_tilts : tuple[tuple[str, float], ...]
    Group tilts during recovery.

for_moderate_tilts() classmethod

Enable moderate regime-conditional tilts.

for_no_tilts() classmethod

Disable regime tilts (default).

SelectionConfig dataclass

Configuration for stock selection from scored universe.

Parameters

method : SelectionMethod
    Fixed-count or quantile-based selection.
target_count : int
    Number of stocks to select (for FIXED_COUNT).
target_quantile : float
    Quantile threshold for selection (for QUANTILE, 0-1).
exit_quantile : float
    Exit quantile for hysteresis (for QUANTILE).
buffer_fraction : float
    Buffer zone fraction around the selection boundary.
sector_balance : bool
    Whether to enforce sector-proportional representation.
sector_tolerance : float
    Maximum deviation from parent universe sector weights.

for_top_100() classmethod

Select top 100 stocks by composite score.

for_top_quintile() classmethod

Select top quintile by composite score.

for_concentrated() classmethod

Concentrated portfolio of top 30 stocks.

SelectionMethod

Bases: str, Enum

Stock selection method.

StandardizationConfig dataclass

Configuration for cross-sectional factor standardization.

Parameters

method : StandardizationMethod
    Z-score or rank-normal standardization.
winsorize_lower : float
    Lower percentile for winsorization (0-1).
winsorize_upper : float
    Upper percentile for winsorization (0-1).
neutralize_sector : bool
    Whether to sector-neutralize scores.
neutralize_country : bool
    Whether to country-neutralize scores.

for_heavy_tailed() classmethod

Rank-normal for heavy-tailed distributions (e.g. value ratios).

for_normal() classmethod

Z-score for approximately normal factors (e.g. momentum).

StandardizationMethod

Bases: str, Enum

Cross-sectional standardization method.

FactorPCAResult dataclass

Principal component analysis result for a factor score matrix.

Attributes

explained_variance_ratio : ndarray, shape (n_components,)
    Fraction of variance explained by each principal component, sorted in
    descending order.
loadings : pd.DataFrame, shape (n_factors, n_components)
    PCA loading matrix. Rows are factor names; columns are PC1, PC2, ....
    Each column is a unit eigenvector of the correlation matrix of the
    factor scores.
n_components_95pct : int
    Smallest number of components whose cumulative explained variance ratio
    is ≥ 0.95.

FactorExposureConstraints dataclass

Enforceable linear constraints on portfolio factor exposure.

Encodes the set of per-factor inequalities::

lb_g <= sum_i w_i * z_{i,g} <= ub_g

as a pair of matrices ready to be passed directly to :class:skfolio.optimization.MeanRisk (or any optimizer that accepts left_inequality / right_inequality).

Parameters

left_inequality : np.ndarray of shape (2 * n_factors, n_assets)
    Inequality matrix A in the constraint A @ w <= b. Two rows per factor:
    -z (lower bound) and +z (upper bound).
right_inequality : np.ndarray of shape (2 * n_factors,)
    Bound vector b in the constraint A @ w <= b.
factor_names : list[str]
    Names of the constrained factors (in the same order as the row pairs in
    left_inequality).
lower_bounds : np.ndarray of shape (n_factors,)
    Lower exposure bound per factor.
upper_bounds : np.ndarray of shape (n_factors,)
    Upper exposure bound per factor.

NetAlphaResult dataclass

Result of net alpha calculation after transaction cost deduction.

Attributes

gross_alpha : float
    Annualised IC-based alpha proxy: mean(IC) * sqrt(annualisation).
avg_turnover : float
    Mean one-way turnover across consecutive rebalancing dates, computed via
    :func:~optimizer.rebalancing._rebalancer.compute_turnover.
total_cost : float
    Cost deduction: avg_turnover * cost_bps / 10_000.
net_alpha : float
    Net annualised alpha after cost deduction: gross_alpha - total_cost.
net_icir : float
    Net information coefficient information ratio:
    net_alpha / (std(IC) * sqrt(annualisation)). 0.0 when the IC series has
    zero variance.

QuintileSpreadResult dataclass

Quintile spread analysis result for a single factor.

Attributes

quintile_returns : pd.DataFrame
    Dates × Q1..Qn equal-weight portfolio returns per quantile bucket.
    Q1 = bottom (lowest scores), Qn = top (highest scores).
spread_returns : pd.Series
    Qn − Q1 long-short spread return series indexed by date. Equals
    quintile_returns.iloc[:, -1] - quintile_returns.iloc[:, 0] element-wise.
annualised_mean : float
    spread_returns.mean() * 252.
t_stat : float
    Two-tailed t-statistic: mean / (std / sqrt(T)).
sharpe : float
    Annualised Sharpe ratio: mean * sqrt(252) / std.

FactorOOSConfig dataclass

Configuration for rolling block OOS validation.

Parameters

train_months : int
    Length of the training window in months. Default: 36.
val_months : int
    Length of the validation window in months. Default: 12.
step_months : int
    Number of months to roll forward between folds. Default: 6.

FactorOOSResult dataclass

Results from rolling block OOS factor validation.

Attributes

per_fold_ic : pd.DataFrame
    n_folds × factors matrix of mean IC per fold per factor.
per_fold_spread : pd.DataFrame
    n_folds × factors matrix of mean quintile spread per fold.
mean_oos_ic : pd.Series
    Mean OOS IC aggregated across folds (one value per factor).
mean_oos_icir : pd.Series
    OOS ICIR (mean IC / std IC across folds) per factor.
n_folds : int
    Number of folds generated.

CorrectedPValues dataclass

Multiple-testing corrected p-values.

Attributes

holm : ndarray
    Holm-Bonferroni adjusted p-values (controls FWER).
bh : ndarray
    Benjamini-Hochberg adjusted p-values (controls FDR).

FactorValidationReport dataclass

Complete validation report for all factors.

ICResult dataclass

Information coefficient analysis results for a single factor.

ICStats dataclass

Full IC statistics for a single factor including Newey-West inference.

Attributes

mean : float
    Mean IC over the evaluation period.
variance_nw : float
    Newey-West HAC variance of the IC series.
t_stat_nw : float
    Newey-West adjusted t-statistic: IC_mean / sqrt(Var_NW / T).
p_value : float
    Two-tailed p-value derived from the Newey-West t-statistic.
icir : float
    Information Coefficient Information Ratio: mean(IC) / std(IC).

QuantileSpreadResult dataclass

Quantile spread analysis results for a single factor.

compute_gross_alpha(net_alpha, avg_turnover, cost_bps=10.0)

Compute gross alpha by adding back estimated transaction costs.

Formula::

gross = net_alpha + avg_turnover * cost_bps / 10_000
Parameters

net_alpha : float
    Net alpha after transaction costs (annualised).
avg_turnover : float
    Average one-way turnover (e.g. 0.5 means 50% of the portfolio traded per
    period).
cost_bps : float
    One-way transaction cost in basis points.

Returns

float
    Gross alpha before transaction costs.
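As a worked instance of the formula, with hypothetical numbers (the helper name is illustrative, not the module function):

```python
def gross_alpha_from_net(net_alpha: float, avg_turnover: float,
                         cost_bps: float = 10.0) -> float:
    # add back the deducted cost: one-way turnover times cost in decimal terms
    return net_alpha + avg_turnover * cost_bps / 10_000

# 4% net alpha, 50% one-way turnover, 10 bps one-way cost:
# cost = 0.5 * 10 / 10_000 = 0.0005, so gross = 0.0405
gross = gross_alpha_from_net(0.04, 0.5, 10.0)
```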

factor_scores_to_expected_returns(scores, betas, factor_premiums, risk_free_rate=0.0)

Convert factor Z-scores to expected returns via linear model.

Implements the formula::

E[r_i] = r_f + λ_mkt · β_i + Σ_g λ_g · z_{i,g}

where λ_mkt is read from factor_premiums["market"] and each λ_g is read from factor_premiums[g] for factor group g.

Parameters

scores : pd.DataFrame
    Assets × factor-groups matrix of standardised Z-scores. Rows are ticker
    symbols; columns are factor group names (e.g. "value", "momentum").
betas : pd.Series
    Market (CAPM) beta per asset, indexed by ticker. Assets missing from
    this Series are treated as having a beta of 1.0 (market neutral
    assumption).
factor_premiums : dict[str, float]
    Mapping of premium label → annualised premium (e.g. {"market": 0.05,
    "value": 0.03, "momentum": 0.04}). The reserved "market" key provides
    λ_mkt; all other keys are matched against columns in scores.
risk_free_rate : float, default 0.0
    Annualised risk-free rate r_f.

Returns

pd.Series
    Annualised expected return per ticker, indexed by scores.index.

Examples

>>> import pandas as pd
>>> scores = pd.DataFrame(
...     {"value": [1.0, -1.0], "momentum": [0.5, 0.0]},
...     index=["AAPL", "MSFT"],
... )
>>> betas = pd.Series({"AAPL": 1.2, "MSFT": 0.8})
>>> factor_premiums = {"market": 0.05, "value": 0.03, "momentum": 0.04}
>>> factor_scores_to_expected_returns(scores, betas, factor_premiums, 0.02)
AAPL    0.132
MSFT    0.018
dtype: float64

align_to_pit(data, period_date_col, as_of_date, lag_days, ticker_col='ticker')

Filter time-series data to records published before as_of_date.

A record with period end date D is considered published lag_days calendar days after D. A record is available as of as_of_date only when D + lag_days <= as_of_date, equivalently when D <= as_of_date - lag_days.

For each ticker, the most recent record satisfying the availability constraint is returned so that callers receive a cross-sectional view as of as_of_date.

Parameters

data : pd.DataFrame
    Time-series data containing period_date_col and optionally ticker_col.
period_date_col : str
    Name of the column holding the period end date.
as_of_date : pd.Timestamp or str
    The computation date. Only records available on or before this date
    (after the lag has elapsed) are returned.
lag_days : int
    Calendar days between period end and data availability.
ticker_col : str
    Column holding the ticker identifier. Defaults to "ticker".

Returns

pd.DataFrame
    Cross-sectional view: one row per ticker (the most recent available
    record), indexed by ticker_col when present. Returns an empty DataFrame
    with the same columns if no records pass the cutoff.
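The availability rule above can be sketched with plain pandas. This is a minimal illustration of the cutoff logic (align_to_pit_sketch is a hypothetical stand-in, not the module function):

```python
import pandas as pd

def align_to_pit_sketch(data, period_date_col, as_of_date, lag_days,
                        ticker_col="ticker"):
    """Keep, per ticker, the latest record whose period end + lag_days has elapsed."""
    cutoff = pd.Timestamp(as_of_date) - pd.Timedelta(days=lag_days)
    available = data[pd.to_datetime(data[period_date_col]) <= cutoff]
    if available.empty:
        return available
    # most recent available record per ticker, indexed by ticker
    return (available.sort_values(period_date_col)
                     .groupby(ticker_col)
                     .tail(1)
                     .set_index(ticker_col))

fundamentals = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT"],
    "period_end": pd.to_datetime(["2024-03-31", "2024-06-30", "2024-03-31"]),
    "eps": [1.5, 1.6, 2.0],
})
# On 2024-08-01 with a 45-day lag the cutoff is 2024-06-17, so the AAPL
# 2024-06-30 quarter is not yet published and AAPL resolves to 2024-03-31.
view = align_to_pit_sketch(fundamentals, "period_end", "2024-08-01", lag_days=45)
```

Sorting before the per-ticker tail(1) guarantees the most recent surviving record wins, which is the look-ahead-safe choice.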

compute_all_factors(fundamentals, price_history, volume_history=None, analyst_data=None, insider_data=None, config=None)

Compute all configured factors.

Parameters

fundamentals : pd.DataFrame
    Cross-sectional data indexed by ticker.
price_history : pd.DataFrame
    Price matrix (dates x tickers).
volume_history : pd.DataFrame or None
    Volume matrix.
analyst_data : pd.DataFrame or None
    Analyst recommendation data.
insider_data : pd.DataFrame or None
    Insider transaction data.
config : FactorConstructionConfig or None
    Construction parameters.

Returns

pd.DataFrame
    Tickers x factors matrix.

compute_factor(factor_type, fundamentals, price_history, volume_history=None, analyst_data=None, insider_data=None, config=None)

Compute a single factor.

Parameters

factor_type : FactorType
    Which factor to compute.
fundamentals : pd.DataFrame
    Cross-sectional data indexed by ticker.
price_history : pd.DataFrame
    Price matrix (dates x tickers).
volume_history : pd.DataFrame or None
    Volume matrix (dates x tickers).
analyst_data : pd.DataFrame or None
    Analyst recommendation data.
insider_data : pd.DataFrame or None
    Insider transaction data.
config : FactorConstructionConfig or None
    Construction parameters.

Returns

pd.Series
    Factor values indexed by ticker.

check_survivorship_bias(returns, final_periods=12, zero_threshold=1e-10)

Check for potential survivorship bias in a return panel.

Survivorship bias occurs when delisted or failed assets are excluded from the sample. A simple heuristic: if no asset has near-zero returns in the final final_periods rows (i.e., no asset appears to have stopped trading), the panel may suffer from survivorship bias.

Parameters

returns : pd.DataFrame
    Dates × assets return matrix.
final_periods : int
    Number of trailing periods to inspect.
zero_threshold : float
    Absolute threshold below which a return is considered "zero".

Returns

bool
    True if survivorship bias is suspected, False otherwise.
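One plausible reading of the heuristic is sketched below, treating an asset with only near-zero trailing returns as "delisted". The helper name and the exact dead-asset test are assumptions for illustration:

```python
import pandas as pd

def survivorship_bias_suspected(returns: pd.DataFrame,
                                final_periods: int = 12,
                                zero_threshold: float = 1e-10) -> bool:
    """True when no asset looks delisted (no trailing run of near-zero returns)."""
    tail = returns.tail(final_periods)
    # an asset that stopped trading shows only near-zero returns at the end
    any_dead = (tail.abs() < zero_threshold).all(axis=0).any()
    return not any_dead

live = pd.DataFrame(0.01, index=range(20), columns=["A", "B", "C"])
dead = live.copy()
dead["C"] = 0.0  # asset C stopped trading: the panel retains a "failure"
```

A panel where every asset trades to the very end (live) triggers the warning; a panel that still contains a dead asset (dead) does not.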

compute_factor_pca(scores, n_components=None)

Compute PCA on a cross-sectional factor score matrix.

Rows with any NaN are dropped before fitting. Scores are standardised (zero mean, unit variance per factor) so that PCA operates on the correlation structure rather than the covariance structure.

Parameters

scores : pd.DataFrame
    Tickers × factors matrix of factor scores. Columns are factor names;
    rows are asset observations.
n_components : int or None, default None
    Number of principal components to retain. None keeps all components
    (min(n_samples, n_features)).

Returns

FactorPCAResult
    See :class:FactorPCAResult for field descriptions.

Raises

ValueError
    If fewer than 2 factors or fewer than 2 observations are available after
    dropping NaN rows.

flag_redundant_factors(scores, vif_threshold=10.0)

Return factor names whose VIF exceeds vif_threshold.

A VIF above the threshold indicates that the factor's variance is largely explained by the remaining factors, making it a candidate for merging or removal from the composite score.

Parameters

scores : pd.DataFrame
    Tickers × factors matrix of factor scores. Must contain at least 2
    factor columns.
vif_threshold : float, default 10.0
    VIF cutoff above which a factor is considered redundant. Commonly used
    values: 5 (conservative) or 10 (standard).

Returns

list[str]
    Factor names with VIF > vif_threshold, in the order they appear in
    scores.columns. Empty list if none exceed the threshold.

Raises

ValueError
    Propagated from :func:compute_vif if fewer than 2 factors are provided.
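The VIF calculation underlying this flagging can be sketched with plain NumPy least squares. This is an illustrative reimplementation under the standard definition VIF_j = 1 / (1 - R²_j), not the module's own code:

```python
import numpy as np
import pandas as pd

def flag_redundant_sketch(scores: pd.DataFrame, vif_threshold: float = 10.0) -> list:
    """Flag factors whose VIF (from regressing on the other factors) exceeds the cutoff."""
    X = scores.dropna()
    flagged = []
    for col in X.columns:
        y = X[col].to_numpy()
        others = X.drop(columns=col).to_numpy()
        A = np.column_stack([np.ones(len(X)), others])  # OLS with intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - (resid @ resid) / ss_tot if ss_tot > 0 else 0.0
        vif = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
        if vif > vif_threshold:
            flagged.append(col)
    return flagged

rng = np.random.default_rng(42)
a, b = rng.normal(size=200), rng.normal(size=200)
# "c" is almost an exact linear combination of "a" and "b", so its VIF explodes
df = pd.DataFrame({"a": a, "b": b, "c": a + b + 0.01 * rng.normal(size=200)})
```

Two independent factors alone produce VIFs near 1 and an empty result; adding the near-collinear "c" flags it for merging or removal.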

build_factor_bl_views(factor_scores, factor_premia, selected_tickers)

Generate Black-Litterman views from factor scores.

Creates relative views: top-scored assets outperform bottom-scored by the factor premium.

Parameters

factor_scores : pd.DataFrame
    Tickers x factors matrix of standardized scores.
factor_premia : dict[str, float]
    Expected premium per factor.
selected_tickers : pd.Index
    Tickers in the portfolio.

Returns

tuple[list[tuple[str, ...]], list[float]]
    (views, confidences) for Black-Litterman.

build_factor_exposure_constraints(factor_scores, bounds)

Build enforceable linear factor exposure constraints.

For each factor g, the constraint enforces::

lb_g <= sum_i w_i * z_{i,g} <= ub_g

The result is expressed as left_inequality @ w <= right_inequality (two rows per factor) and can be passed directly to :class:skfolio.optimization.MeanRisk via its left_inequality / right_inequality constructor arguments.

Parameters

factor_scores : pd.DataFrame
    Tickers x factors matrix of standardised factor scores. The tickers must
    match the assets used in the optimizer fit.
bounds : tuple[float, float] or dict[str, tuple[float, float]]
    Exposure bounds applied to every factor (uniform) when given as a single
    (lower, upper) tuple, or per-factor bounds when given as a dict mapping
    factor name → (lower, upper).

Returns

FactorExposureConstraints
    Dataclass holding left_inequality, right_inequality, and metadata. Pass
    left_inequality and right_inequality as keyword arguments to the
    optimizer.

Warns

UserWarning
    If the equal-weight portfolio exposure lies outside [lb, ub] for any
    factor (i.e. the constraint may be infeasible or very tight under a
    balanced allocation).
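The matrix encoding can be sketched directly: each factor contributes a (-z, +z) row pair so that both bounds become rows of a single A @ w <= b system. This is an illustrative construction with uniform bounds only (the helper name is hypothetical; exact row ordering in the real dataclass may differ):

```python
import numpy as np
import pandas as pd

def exposure_constraints_sketch(factor_scores: pd.DataFrame, lb: float, ub: float):
    """Encode lb <= z_g @ w <= ub per factor as A @ w <= b, two rows per factor."""
    rows, bounds = [], []
    for g in factor_scores.columns:
        z = factor_scores[g].to_numpy()
        rows.append(-z); bounds.append(-lb)  # -z @ w <= -lb  <=>  z @ w >= lb
        rows.append(z);  bounds.append(ub)   #  z @ w <=  ub
    return np.array(rows), np.array(bounds)

scores = pd.DataFrame({"value": [1.0, -1.0], "momentum": [0.5, -0.5]},
                      index=["AAPL", "MSFT"])
A, b = exposure_constraints_sketch(scores, lb=-0.2, ub=0.2)

w_balanced = np.array([0.5, 0.5])  # factor exposures are 0.0: feasible
w_tilted = np.array([1.0, 0.0])    # value exposure 1.0 breaches the upper bound
```

Checking A @ w <= b row-wise is exactly what a linear optimizer does with these inputs.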

compute_net_alpha(ic_series, weights_history, cost_bps=10.0, annualisation=252)

Compute factor net alpha after deducting turnover-based transaction costs.

Combines IC-based gross alpha with the turnover cost from a weights history to produce a single net performance metric::

gross_alpha  = mean(IC) * sqrt(annualisation)
avg_turnover = mean one-way turnover across rebalancing dates
total_cost   = avg_turnover * cost_bps / 10_000
net_alpha    = gross_alpha - total_cost
net_icir     = net_alpha / (std(IC) * sqrt(annualisation))
Parameters

ic_series : pd.Series
    Time series of period information coefficients (Spearman or Pearson rank
    correlation between factor scores and forward returns), one value per
    rebalancing period.
weights_history : pd.DataFrame
    Portfolio weights at each rebalancing date: rows = dates, columns =
    assets. Turnover is computed between every pair of consecutive rows.
cost_bps : float, default=10.0
    One-way transaction cost in basis points.
annualisation : int, default=252
    Number of periods per year (252 for daily, 12 for monthly).

Returns

NetAlphaResult
    Dataclass with gross_alpha, avg_turnover, total_cost, net_alpha, and
    net_icir.
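The pipeline above can be sketched end to end. The half-sum one-way turnover convention used here is an assumption; the library's compute_turnover may define it differently, and the helper returns a plain dict rather than the NetAlphaResult dataclass:

```python
import numpy as np
import pandas as pd

def net_alpha_sketch(ic_series: pd.Series, weights_history: pd.DataFrame,
                     cost_bps: float = 10.0, annualisation: int = 252) -> dict:
    gross_alpha = ic_series.mean() * np.sqrt(annualisation)
    # one-way turnover per rebalance: half the sum of absolute weight changes
    turnover = weights_history.diff().abs().sum(axis=1).iloc[1:] / 2.0
    avg_turnover = float(turnover.mean())
    total_cost = avg_turnover * cost_bps / 10_000
    net_alpha = gross_alpha - total_cost
    ic_std = ic_series.std()
    net_icir = net_alpha / (ic_std * np.sqrt(annualisation)) if ic_std > 0 else 0.0
    return {"gross_alpha": gross_alpha, "avg_turnover": avg_turnover,
            "total_cost": total_cost, "net_alpha": net_alpha, "net_icir": net_icir}

ic = pd.Series([0.04, 0.06, 0.05, 0.05])
# fully rotate from A into B over two rebalances: 0.5 one-way turnover each
weights = pd.DataFrame([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], columns=["A", "B"])
result = net_alpha_sketch(ic, weights)
```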

estimate_factor_premia(factor_mimicking_returns)

Estimate annualized factor premia from long-short returns.

Parameters

factor_mimicking_returns : pd.DataFrame
    Dates x factors matrix of factor-mimicking portfolio returns.

Returns

dict[str, float]
    Annualized premium per factor.

build_factor_mimicking_portfolios(scores, returns, quantile=0.3, weighting='equal', beta_neutral=False, market_returns=None)

Build long-short factor-mimicking portfolio return time series.

For each date the top quantile fraction of assets (by factor score) are held long and the bottom quantile fraction are held short. The long-short return is the equal- or value-weighted long leg minus the corresponding short leg.

The function handles one factor at a time: scores is a dates × assets DataFrame encoding cross-sectional scores for a single factor. For multiple factors, call once per factor and concatenate the results::

factor_returns = pd.concat(
    [
        build_factor_mimicking_portfolios(scores_value, returns)
            .rename(columns={"factor_return": "value"}),
        build_factor_mimicking_portfolios(scores_mom, returns)
            .rename(columns={"factor_return": "momentum"}),
    ],
    axis=1,
)
Parameters

scores : pd.DataFrame
    Dates × assets matrix of cross-sectional factor scores. Index = dates;
    columns = asset tickers.
returns : pd.DataFrame
    Dates × assets matrix of asset returns, aligned with scores on the date
    index. Columns may be a superset or subset of scores columns; the
    intersection is used.
quantile : float, default 0.30
    Fraction of the asset universe assigned to each leg. Must be in
    (0, 0.5].
weighting : {"equal", "value"}, default "equal"
    Weighting scheme within each leg. "equal" assigns every asset in the leg
    the same weight; "value" weights assets by the absolute value of their
    factor score.
beta_neutral : bool, default False
    When True, hedge the long-short portfolio against market beta exposure.
    The hedge ratio adjusts the short-leg weight so that the portfolio beta
    is approximately zero.
market_returns : pd.Series or None
    Market return series, required when beta_neutral=True.

Returns

pd.DataFrame
    Dates × 1 DataFrame of long-short portfolio returns. Column name is
    "factor_return". Index is the intersection of scores and returns dates.
    Missing periods (fewer than 2 * k valid observations) are filled with
    NaN.

Raises

ValueError
    If quantile is outside (0, 0.5] or weighting is unknown.

compute_cross_factor_correlation(factor_returns)

Compute the Pearson correlation matrix across factor-mimicking portfolios.

Parameters

factor_returns : pd.DataFrame
    Dates × factors DataFrame of long-short factor returns, as returned by
    build_factor_mimicking_portfolios (possibly concatenated across multiple
    factors).

Returns

pd.DataFrame
    Factors × factors symmetric correlation matrix. Diagonal entries are
    exactly 1.0. Computed on the rows where all factors have non-NaN returns
    (pairwise-complete otherwise).

compute_quintile_spread(scores, returns, n_quantiles=5)

Compute quintile portfolio returns and spread for a single factor.

At each date assets are ranked by factor score and split into n_quantiles equal-count buckets (Q1 = lowest scores, Qn = highest). Each bucket return is the equal-weight average of its members. The long-short spread is Qn − Q1.

Ties in scores are broken by rank order (method="first"), ensuring every bucket is populated at every date.

Parameters

scores : pd.DataFrame
    Dates × assets matrix of cross-sectional factor scores.
returns : pd.DataFrame
    Dates × assets matrix of asset returns, aligned with scores.
n_quantiles : int, default 5
    Number of equal-count buckets. 5 = quintiles, 10 = deciles. Must be ≥ 2.

Returns

QuintileSpreadResult
    See :class:QuintileSpreadResult for field descriptions.

Raises

ValueError
    If n_quantiles < 2.
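The bucketing and spread logic can be sketched with pandas qcut on tie-broken ranks. This is an illustrative reduction that returns only the spread series, not the full QuintileSpreadResult:

```python
import pandas as pd

def quintile_spread_sketch(scores: pd.DataFrame, returns: pd.DataFrame,
                           n_quantiles: int = 5) -> pd.Series:
    """Equal-weight Qn minus Q1 spread per date; ties broken by rank order."""
    spreads = {}
    for date in scores.index.intersection(returns.index):
        s = scores.loc[date].dropna()
        r = returns.loc[date].reindex(s.index)
        # rank(method="first") breaks ties so every bucket is populated
        buckets = pd.qcut(s.rank(method="first"), n_quantiles, labels=False)
        by_bucket = r.groupby(buckets).mean()
        spreads[date] = by_bucket.iloc[-1] - by_bucket.iloc[0]
    return pd.Series(spreads)

tickers = list("ABCDEFGHIJ")
scores = pd.DataFrame([list(range(1, 11))], index=["2024-01-31"],
                      columns=tickers, dtype=float)
returns = scores / 100.0  # returns monotone in scores: a perfect factor
spread = quintile_spread_sketch(scores, returns)
```

With a perfectly monotone factor the top quintile (returns 0.09, 0.10) beats the bottom (0.01, 0.02) by 0.08.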

fit_gbt_composite(scores, forward_returns, max_depth=3, n_estimators=50)

Fit a gradient-boosted tree model mapping factor scores to forward returns.

Parameters

scores : pd.DataFrame
    Historical tickers x factors matrix (training observations).
forward_returns : pd.Series
    Forward return per ticker for the training period.
max_depth : int
    Maximum depth of individual regression trees (3–5 recommended to limit
    extrapolation and retain interpretability).
n_estimators : int
    Number of boosting rounds.

Returns

GradientBoostingRegressor
    Fitted GBT model.

fit_ridge_composite(scores, forward_returns, alpha=1.0)

Fit a ridge regression model mapping factor scores to forward returns.

Parameters

scores : pd.DataFrame
    Historical tickers x factors matrix (training observations). Must be
    aligned with forward_returns on the index.
forward_returns : pd.Series
    Forward return per ticker for the training period.
alpha : float
    L2 regularisation strength. A single-element array is passed to RidgeCV
    so cross-validation still runs internally if multiple alphas are
    desired; here we keep one alpha for determinism.

Returns

RidgeCV
    Fitted ridge model. Call predict(scores) for new data.
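The fit this function performs can be sketched with the closed-form ridge solution instead of scikit-learn's RidgeCV (an illustrative substitute, assuming centred features and no penalty on the intercept):

```python
import numpy as np
import pandas as pd

def fit_ridge_sketch(scores: pd.DataFrame, forward_returns: pd.Series,
                     alpha: float = 1.0):
    """Closed-form ridge: beta = (Xc'Xc + alpha*I)^-1 Xc'yc on centred data."""
    X = scores.to_numpy()
    y = forward_returns.reindex(scores.index).to_numpy()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept

rng = np.random.default_rng(0)
scores = pd.DataFrame(rng.normal(size=(300, 2)), columns=["value", "momentum"])
# forward returns generated from known weights 0.02 and -0.01 (no noise)
fwd = pd.Series(scores["value"] * 0.02 - scores["momentum"] * 0.01)
beta, intercept = fit_ridge_sketch(scores, fwd, alpha=1e-6)
```

With negligible shrinkage and noiseless data the learned weights recover the generating coefficients, which is the behaviour the docstring promises for the factor-weight mapping.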

predict_composite_scores(model, scores)

Apply a fitted ridge or GBT model to produce normalised composite scores.

The raw predictions are standardised to zero mean and unit variance so the output is on the same scale as z-score factor inputs.

Parameters

model : RidgeCV or GradientBoostingRegressor
    A model returned by :func:fit_ridge_composite or :func:fit_gbt_composite.
scores : pd.DataFrame
    Current-period tickers x factors matrix.

Returns

pd.Series
    Normalised composite score per ticker (zero mean, unit variance).
    Tickers with all-NaN factor rows receive NaN.

run_factor_oos_validation(scores, returns, config=None, cpcv_config=None)

Rolling block or CPCV out-of-sample validation of factor IC and spreads.

Parameters

scores : pd.DataFrame
    Panel of standardised factor scores with a two-level row MultiIndex
    (date, ticker) and one column per factor.
returns : pd.DataFrame
    Forward returns panel with the same (date, ticker) MultiIndex and a
    single return column.
config : FactorOOSConfig or None
    Rolling window parameters. Defaults to FactorOOSConfig(). Ignored when
    cpcv_config is provided.
cpcv_config : CPCVConfig or None
    When provided, uses combinatorial purged cross-validation instead of
    rolling blocks. Overrides config.

Returns

FactorOOSResult
    Per-fold and aggregate IC and quintile spread statistics.

Notes

The validation window computation uses only val-window dates; no training-window data is used. Fold count equals floor((total_months - train_months) / step_months) for rolling, or C(n_folds, n_test_folds) for CPCV.

apply_regime_tilts(group_weights, regime, config=None)

Apply regime-conditional multiplicative tilts to group weights.

Parameters

group_weights : dict[FactorGroupType, float]
    Base group weights.
regime : MacroRegime
    Current macro regime.
config : RegimeTiltConfig or None
    Tilt configuration.

Returns

dict[FactorGroupType, float]
    Tilted group weights (re-normalized to sum to the original total).
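The tilt-then-renormalise step can be sketched with plain dicts (string keys here stand in for FactorGroupType; the helper is illustrative):

```python
def apply_tilts_sketch(group_weights: dict, tilts: dict) -> dict:
    """Multiply by per-group tilts, then rescale to preserve the original total."""
    original_total = sum(group_weights.values())
    # groups without an explicit tilt keep a multiplier of 1.0
    tilted = {g: w * tilts.get(g, 1.0) for g, w in group_weights.items()}
    scale = original_total / sum(tilted.values())
    return {g: w * scale for g, w in tilted.items()}

base = {"value": 0.5, "momentum": 0.5}
out = apply_tilts_sketch(base, {"value": 1.2})  # tilt value up in this regime
```

Renormalisation keeps the total weight at 1.0 while shifting relative emphasis toward the tilted group.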

classify_regime(macro_data)

Classify the current macro-economic regime.

Uses a simple heuristic based on GDP growth and leading indicators. The regime is determined by the latest observation's position relative to trend.

Parameters

macro_data : pd.DataFrame
    Macro indicators with columns that may include gdp_growth,
    leading_indicator, yield_spread, unemployment_rate. Index is date.

Returns

MacroRegime
    Current regime classification.

get_regime_tilts(regime, config=None)

Get multiplicative tilts for a given regime.

Parameters

regime : MacroRegime
    Current macro regime.
config : RegimeTiltConfig or None
    Tilt configuration.

Returns

dict[FactorGroupType, float]
    Multiplicative tilt per group. Groups not listed get a tilt of 1.0.

compute_composite_score(standardized_factors, coverage, config=None, ic_history=None, training_scores=None, training_returns=None, group_weights=None)

Compute composite score from standardized factors.

Parameters

standardized_factors : pd.DataFrame
    Tickers x factors matrix.
coverage : pd.DataFrame
    Boolean coverage matrix.
config : CompositeScoringConfig or None
    Scoring configuration.
ic_history : pd.DataFrame or None
    Required when config.method is IC_WEIGHTED or ICIR_WEIGHTED. Columns
    must match group names; each column is treated as the IC time series for
    that group.
training_scores : pd.DataFrame or None
    Required when config.method is RIDGE_WEIGHTED or GBT_WEIGHTED.
    Historical tickers x factors matrix used to train the ML model (must not
    overlap with current-period data).
training_returns : pd.Series or None
    Required when config.method is RIDGE_WEIGHTED or GBT_WEIGHTED. Forward
    returns aligned with training_scores.
group_weights : dict[str, float] or None
    Pre-computed group weights (e.g. from regime tilts). Threaded through to
    the inner scoring functions.

Returns

pd.Series
    Composite score per ticker.

compute_equal_weight_composite(group_scores, config=None, group_weights=None)

Equal-weight composite with core/supplementary tiering.

Parameters

group_scores : pd.DataFrame
    Tickers x groups matrix.
config : CompositeScoringConfig or None
    Scoring configuration.
group_weights : dict[str, float] or None
    Pre-computed group weights (e.g. from regime tilts). When provided, skip
    tier-based derivation and use these weights directly.

Returns

pd.Series
    Composite score per ticker.

compute_group_scores(standardized_factors, coverage)

Average factor scores within each group.

Parameters

standardized_factors : pd.DataFrame
    Tickers x factors matrix of standardized scores.
coverage : pd.DataFrame
    Boolean matrix of non-NaN coverage.

Returns

pd.DataFrame
    Tickers x groups matrix of group-level scores.

compute_ic_weighted_composite(group_scores, ic_history, config=None, group_weights=None)

IC-weighted composite score.

Uses trailing information coefficient history to weight groups.

Parameters

group_scores : pd.DataFrame
    Tickers x groups matrix.
ic_history : pd.DataFrame
    Periods x groups matrix of IC values.
config : CompositeScoringConfig or None
    Scoring configuration.
group_weights : dict[str, float] or None
    Pre-computed group weights (e.g. from regime tilts). When provided, use
    as tier multipliers instead of config core/supplementary weights.

Returns

pd.Series
    Composite score per ticker.

compute_icir_weighted_composite(group_scores, ic_series_per_group, config=None, group_weights=None)

ICIR-weighted composite score.

Weights each group by |ICIR| = |mean(IC) / std(IC)|, normalised to sum to 1. Groups with zero or undefined ICIR receive zero weight. Falls back to equal-weight when all groups have ICIR = 0.

Parameters

group_scores : pd.DataFrame
    Tickers x groups matrix.
ic_series_per_group : dict[str, pd.Series]
    Per-group IC time series. Keys must match group_scores columns.
config : CompositeScoringConfig or None
    Scoring configuration.
group_weights : dict[str, float] or None
    Pre-computed group weights (e.g. from regime tilts). When provided, use
    as tier multipliers instead of config core/supplementary weights.

Returns

pd.Series
    Composite score per ticker.

compute_ml_composite(standardized_factors, training_scores, training_returns, config)

ML composite score using ridge regression or gradient-boosted trees.

Trains the model on historical (training_scores, training_returns) and predicts on the current-period standardized_factors. The prediction is normalised to zero mean and unit variance.

The training window must end strictly before the prediction date to avoid look-ahead bias; callers are responsible for this temporal split.

Parameters

standardized_factors : pd.DataFrame
    Current-period tickers x factors matrix (prediction target).
training_scores : pd.DataFrame
    Historical tickers x factors matrix aligned with training_returns.
training_returns : pd.Series
    Forward return per ticker for the training period.
config : CompositeScoringConfig
    Must have method set to RIDGE_WEIGHTED or GBT_WEIGHTED.

Returns

pd.Series
    Normalised composite score per ticker (zero mean, unit variance).
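The ridge branch can be sketched with closed-form ridge regression in plain numpy. This is an assumption-laden illustration: the library uses RidgeCV internally, while this sketch solves beta = (X'X + alpha I)^-1 X'y directly and skips the intercept. The train/predict split and final normalisation mirror the contract above.

```python
import numpy as np
import pandas as pd

def ridge_composite(standardized_factors: pd.DataFrame,
                    training_scores: pd.DataFrame,
                    training_returns: pd.Series,
                    alpha: float = 1.0) -> pd.Series:
    # align training rows: drop tickers missing either scores or returns
    X = training_scores.dropna()
    y = training_returns.reindex(X.index).dropna()
    X = X.loc[y.index]
    Xm = X.to_numpy()
    # closed-form ridge solution (no intercept, for simplicity)
    k = Xm.shape[1]
    beta = np.linalg.solve(Xm.T @ Xm + alpha * np.eye(k), Xm.T @ y.to_numpy())
    pred = pd.Series(standardized_factors.to_numpy() @ beta,
                     index=standardized_factors.index)
    return (pred - pred.mean()) / pred.std()    # zero mean, unit variance
```

The temporal split is the caller's job, exactly as the docstring warns: `training_scores`/`training_returns` must end strictly before the date `standardized_factors` refers to.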

apply_sector_balance(selected, scores, sector_labels, parent_universe, tolerance=0.05)

Adjust selection for sector-proportional representation.

Ensures no sector is over- or under-represented relative to the parent universe by more than tolerance.

Parameters

selected : pd.Index
    Initially selected tickers.
scores : pd.Series
    Composite scores for all candidates.
sector_labels : pd.Series
    Sector label per ticker.
parent_universe : pd.Index
    Full universe for computing target sector weights.
tolerance : float
    Maximum deviation from parent sector weights.

Returns

pd.Index
    Sector-balanced selection.

compute_selection_turnover(current, new, universe)

Compute selection turnover as fraction of universe changed.

Parameters

current : pd.Index
    Currently selected tickers.
new : pd.Index
    Newly selected tickers.
universe : pd.Index
    Full investable universe.

Returns

float
    len(added | removed) / len(universe), or 0.0 if universe is empty.
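The formula in the return description is simple enough to state directly. A minimal sketch (added and removed sets are disjoint, so their sizes just sum):

```python
import pandas as pd

def selection_turnover(current: pd.Index, new: pd.Index,
                       universe: pd.Index) -> float:
    """Fraction of the universe that changed selection membership."""
    if len(universe) == 0:
        return 0.0
    added = new.difference(current)
    removed = current.difference(new)
    return (len(added) + len(removed)) / len(universe)
```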

select_fixed_count(scores, target_count, buffer_fraction=0.1, current_members=None)

Select top N stocks by composite score with buffer.

Parameters

scores : pd.Series
    Composite scores indexed by ticker.
target_count : int
    Target number of stocks.
buffer_fraction : float
    Buffer as a fraction of target_count. Current members within the buffer
    zone are retained.
current_members : pd.Index or None
    Tickers currently selected.

Returns

pd.Index
    Selected tickers.
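One plausible reading of the buffer rule is: rank by score, retain incumbents that sit anywhere in the top `target_count * (1 + buffer_fraction)` ranks, then fill the remaining slots with the best non-incumbents. This sketch is an interpretation, not the library's exact logic (in particular it does not cap the result if many incumbents crowd the buffer):

```python
import pandas as pd

def select_top_n_with_buffer(scores, target_count,
                             buffer_fraction=0.1, current_members=None):
    ranked = scores.sort_values(ascending=False)
    if current_members is None:
        return ranked.index[:target_count]
    buffer_rank = int(target_count * (1 + buffer_fraction))
    # incumbents anywhere inside the buffer zone are retained
    keep = ranked.index[:buffer_rank].intersection(current_members)
    # fill remaining slots with the highest-scoring non-retained names
    fill = [t for t in ranked.index
            if t not in keep][: max(target_count - len(keep), 0)]
    return keep.union(pd.Index(fill))
```

The buffer reduces churn: a current member slightly below rank N is not swapped out for a marginally better newcomer.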

select_quantile(scores, target_quantile=0.8, exit_quantile=None, current_members=None)

Select stocks above a quantile threshold.

Parameters

scores : pd.Series
    Composite scores indexed by ticker.
target_quantile : float
    Quantile threshold for entry (0-1).
exit_quantile : float or None
    Quantile threshold for exit (hysteresis). If None, uses target_quantile.
current_members : pd.Index or None
    Currently selected tickers.

Returns

pd.Index
    Selected tickers.
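The entry/exit hysteresis can be sketched as two cuts: new names must clear the entry quantile, while incumbents only drop out once they fall below the (lower) exit quantile. An illustrative sketch, not the library's implementation:

```python
import pandas as pd

def select_above_quantile(scores, target_quantile=0.8,
                          exit_quantile=None, current_members=None):
    exit_q = target_quantile if exit_quantile is None else exit_quantile
    entry_cut = scores.quantile(target_quantile)
    selected = scores[scores >= entry_cut].index
    if current_members is not None:
        # hysteresis: incumbents stay until they fall below the exit cut
        exit_cut = scores.quantile(exit_q)
        incumbents = scores.reindex(current_members).dropna()
        selected = selected.union(incumbents[incumbents >= exit_cut].index)
    return selected
```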

select_stocks(scores, config=None, current_members=None, sector_labels=None, parent_universe=None, return_turnover=False)

Select stocks from scored universe.

Parameters

scores : pd.Series
    Composite scores indexed by ticker.
config : SelectionConfig or None
    Selection configuration.
current_members : pd.Index or None
    Currently selected tickers for buffer/hysteresis.
sector_labels : pd.Series or None
    Sector labels for sector balancing.
parent_universe : pd.Index or None
    Full universe for sector weight targets.
return_turnover : bool
    When True, return (selected, turnover) tuple.

Returns

pd.Index or tuple[pd.Index, float]
    Selected tickers, optionally with turnover.

neutralize_sector(scores, sector_labels, country_labels=None)

Demean scores within each sector (and optionally country).

Parameters

scores : pd.Series
    Standardized factor scores.
sector_labels : pd.Series
    Sector label per ticker.
country_labels : pd.Series or None
    Country label per ticker for country neutralization.

Returns

pd.Series
    Sector-neutralized scores.
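Demeaning within groups is a one-line pandas idiom. A minimal sketch (for the combined sector-and-country case, one would group by both label series, e.g. `scores.groupby([sector_labels, country_labels])`):

```python
import pandas as pd

def neutralize_by_group(scores: pd.Series, labels: pd.Series) -> pd.Series:
    # subtract each group's mean so every group averages to exactly zero
    return scores - scores.groupby(labels).transform("mean")
```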

orthogonalize_factors(factor_scores, method='pca', min_variance_explained=0.95)

Project factor scores onto orthogonal principal components.

Eliminates multicollinearity among factor scores by projecting them into a lower-dimensional PCA space. Retains the minimum number of components that explain at least min_variance_explained of the total variance.

Parameters

factor_scores : pd.DataFrame
    Tickers × factors matrix of factor scores.
method : str
    Projection method. Only "pca" is supported.
min_variance_explained : float
    Minimum cumulative explained variance ratio for retained components.
    Must be in (0, 1].

Returns

pd.DataFrame
    Tickers × PCs matrix with columns named PC1, PC2, .... Rows with NaN in
    the input are filled with NaN in the output but otherwise preserve the
    original index.

Raises

ConfigurationError
    If method is not "pca".
DataError
    If fewer than 2 factors or fewer than 2 non-NaN observations.
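The projection can be sketched with a plain SVD: centre the complete rows, take the fewest components whose cumulative explained variance reaches the threshold, and reindex so NaN rows come back as NaN. This sketch skips the library's error handling:

```python
import numpy as np
import pandas as pd

def pca_orthogonalize(factor_scores: pd.DataFrame,
                      min_variance_explained: float = 0.95) -> pd.DataFrame:
    complete = factor_scores.dropna()              # rows with full coverage
    X = complete.to_numpy() - complete.to_numpy().mean(axis=0)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = (s ** 2) / (s ** 2).sum()
    # fewest components whose cumulative explained variance hits the threshold
    k = int(np.searchsorted(np.cumsum(explained), min_variance_explained)) + 1
    pcs = X @ Vt[:k].T
    out = pd.DataFrame(pcs, index=complete.index,
                       columns=[f"PC{i + 1}" for i in range(k)])
    return out.reindex(factor_scores.index)        # NaN rows restored
```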

rank_normal_standardize(scores)

Rank-normal (inverse normal) standardization.

Uses Phi^-1((rank - 0.5) / N) to map ranks to a normal distribution, robust to heavy-tailed distributions.

Parameters

scores : pd.Series
    Factor scores (may contain NaN).

Returns

pd.Series
    Rank-normalized scores.
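The Phi^-1((rank - 0.5) / N) mapping can be sketched with the standard library's inverse normal CDF. An illustration, not the library's implementation:

```python
from statistics import NormalDist

import pandas as pd

def rank_normal_transform(scores: pd.Series) -> pd.Series:
    valid = scores.dropna()
    ranks = valid.rank(method="average")          # 1..N, ties averaged
    quantiles = (ranks - 0.5) / len(valid)        # strictly inside (0, 1)
    normal = NormalDist()
    out = pd.Series([normal.inv_cdf(q) for q in quantiles], index=valid.index)
    return out.reindex(scores.index)              # NaNs preserved
```

The half-rank offset keeps the quantiles away from 0 and 1, so the inverse CDF never diverges.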

standardize_all_factors(raw_factors, config=None, sector_labels=None, country_labels=None)

Standardize all factors and compute coverage.

Parameters

raw_factors : pd.DataFrame
    Tickers x factors matrix of raw values.
config : StandardizationConfig or None
    Standardization parameters.
sector_labels : pd.Series or None
    Sector labels for neutralization.
country_labels : pd.Series or None
    Country labels for neutralization.

Returns

tuple[pd.DataFrame, pd.DataFrame]
    (standardized_scores, coverage) where coverage is a boolean DataFrame
    indicating non-NaN values.

standardize_factor(raw_scores, config=None, sector_labels=None, country_labels=None)

Full standardization pipeline for a single factor.

Parameters

raw_scores : pd.Series
    Raw factor values.
config : StandardizationConfig or None
    Standardization parameters.
sector_labels : pd.Series or None
    Sector labels for neutralization.
country_labels : pd.Series or None
    Country labels for neutralization.

Returns

pd.Series
    Standardized factor scores.

winsorize_cross_section(scores, lower_pct=0.01, upper_pct=0.99)

Clip scores at percentile boundaries.

Parameters

scores : pd.Series
    Raw factor scores.
lower_pct : float
    Lower percentile (0-1).
upper_pct : float
    Upper percentile (0-1).

Returns

pd.Series
    Winsorized scores.
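Percentile clipping is a two-liner in pandas. A minimal sketch:

```python
import pandas as pd

def winsorize(scores: pd.Series, lower_pct: float = 0.01,
              upper_pct: float = 0.99) -> pd.Series:
    # clip extreme values to the empirical percentile boundaries
    lo, hi = scores.quantile([lower_pct, upper_pct])
    return scores.clip(lower=lo, upper=hi)
```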

z_score_standardize(scores)

Z-score standardization: (x - mean) / std.

Parameters

scores : pd.Series
    Factor scores (may contain NaN).

Returns

pd.Series
    Standardized scores with mean 0 and std 1.
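The formula maps directly to pandas, which skips NaN in both the mean and the standard deviation:

```python
import pandas as pd

def z_score(scores: pd.Series) -> pd.Series:
    # pandas mean/std skip NaN, so missing values pass through untouched
    return (scores - scores.mean()) / scores.std()
```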

benjamini_hochberg(p_values, alpha=0.05)

Benjamini-Hochberg FDR correction.

Parameters

p_values : pd.Series
    Raw p-values indexed by factor name.
alpha : float
    FDR significance level.

Returns

pd.Series
    Boolean series indicating significant factors.
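The BH step-up rule can be sketched directly: sort the p-values, find the largest k with p_(k) <= alpha * k / m, and reject everything at or below that cutoff. An illustration, not the library's implementation:

```python
import numpy as np
import pandas as pd

def bh_significant(p_values: pd.Series, alpha: float = 0.05) -> pd.Series:
    ordered = p_values.sort_values()
    m = len(ordered)
    # step-up comparison against the BH line alpha * k / m
    below = ordered.to_numpy() <= alpha * np.arange(1, m + 1) / m
    if not below.any():
        return pd.Series(False, index=p_values.index)
    cutoff = ordered.iloc[np.nonzero(below)[0].max()]
    return p_values <= cutoff
```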

compute_ic_series(factor_scores_history, returns_history, factor_name)

Compute IC time series for a factor.

Parameters

factor_scores_history : pd.DataFrame
    Dates x tickers matrix of factor scores.
returns_history : pd.DataFrame
    Dates x tickers matrix of forward returns.
factor_name : str
    Used only for labeling.

Returns

pd.Series
    IC values indexed by date.

compute_ic_stats(ic_series, lags=5)

Compute full IC statistics including Newey-West t-stat and ICIR.

Parameters

ic_series : pd.Series
    Time series of IC values (one per cross-section date).
lags : int
    Number of lags for Newey-West HAC standard errors.

Returns

ICStats
    Dataclass containing mean, variance_nw, t_stat_nw, p_value, and icir.

compute_icir(ic_series)

Compute the IC Information Ratio (mean IC / std IC).

ICIR penalises factors with high average IC but also high IC volatility (inconsistent predictors). Use this as the weighting signal in ICIR-weighted composite scoring.

Parameters

ic_series : pd.Series
    Time series of IC values (one per cross-section date).

Returns

float
    ICIR value, or 0.0 if std(IC) == 0 or fewer than 2 non-NaN observations.

compute_monthly_ic(factor_scores, forward_returns)

Compute rank information coefficient (Spearman correlation).

Parameters

factor_scores : pd.Series
    Cross-sectional factor scores.
forward_returns : pd.Series
    Forward returns for the same tickers.

Returns

float
    Rank IC (Spearman correlation).
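A minimal sketch of the rank IC, including the alignment step that drops tickers missing either input:

```python
import pandas as pd

def rank_ic(factor_scores: pd.Series, forward_returns: pd.Series) -> float:
    # align on the common tickers and drop pairs with missing data
    paired = pd.concat([factor_scores, forward_returns], axis=1,
                       keys=["factor", "ret"]).dropna()
    return paired["factor"].corr(paired["ret"], method="spearman")
```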

compute_newey_west_tstat(ic_series, n_lags=6)

Compute Newey-West t-statistic for IC significance.

Parameters

ic_series : pd.Series
    Time series of IC values.
n_lags : int
    Number of lags for HAC standard errors.

Returns

tuple[float, float]
    (t_statistic, p_value).
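The HAC correction can be sketched with a Bartlett-kernel long-run variance: weight autocovariances by 1 - lag/(L+1), form the standard error of the mean, and read a two-sided p-value off the normal distribution. The normal approximation (rather than a t distribution) is an assumption of this sketch, not necessarily what the library does:

```python
from statistics import NormalDist

import numpy as np

def newey_west_tstat(ic_series, n_lags: int = 6):
    ic = np.asarray(ic_series, dtype=float)
    T = len(ic)
    x = ic - ic.mean()
    lrv = (x @ x) / T                           # gamma_0
    for lag in range(1, n_lags + 1):
        w = 1.0 - lag / (n_lags + 1)            # Bartlett kernel weight
        lrv += 2.0 * w * (x[lag:] @ x[:-lag]) / T
    se = np.sqrt(lrv / T)                       # HAC std error of the mean
    t = ic.mean() / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))  # two-sided, normal approx
    return t, p
```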

compute_quantile_spread(factor_scores, forward_returns, n_quantiles=5)

Compute long-short quantile spread return.

Parameters

factor_scores : pd.Series
    Cross-sectional factor scores.
forward_returns : pd.Series
    Forward returns.
n_quantiles : int
    Number of quantile buckets.

Returns

float
    Top quantile return minus bottom quantile return.
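Bucketing and differencing can be sketched with `pd.qcut`. Ranking first is a common trick (assumed here, not confirmed for the library) so that tied scores never collapse a bucket:

```python
import pandas as pd

def quantile_spread(factor_scores: pd.Series, forward_returns: pd.Series,
                    n_quantiles: int = 5) -> float:
    # rank first so ties never collapse a quantile bucket
    buckets = pd.qcut(factor_scores.rank(method="first"),
                      n_quantiles, labels=False)
    bucket_means = forward_returns.groupby(buckets).mean()
    return bucket_means.iloc[-1] - bucket_means.iloc[0]   # top minus bottom
```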

compute_vif(factor_matrix)

Compute variance inflation factors for multicollinearity.

Parameters

factor_matrix : pd.DataFrame
    Tickers x factors matrix (no NaN). Must contain at least 2 factors.

Returns

pd.Series
    VIF per factor. Values are ≥ 1.0 by construction.

Raises

ValueError
    If fewer than 2 factor columns are provided.
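VIF_j is 1 / (1 - R²_j), where R²_j comes from regressing factor j on the remaining factors. A numpy-only sketch of that definition (not the library's implementation):

```python
import numpy as np
import pandas as pd

def vif(factor_matrix: pd.DataFrame) -> pd.Series:
    if factor_matrix.shape[1] < 2:
        raise ValueError("need at least 2 factor columns")
    out = {}
    for col in factor_matrix.columns:
        y = factor_matrix[col].to_numpy()
        X = factor_matrix.drop(columns=col).to_numpy()
        X = np.column_stack([np.ones(len(X)), X])       # intercept term
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r_squared = 1.0 - resid.var() / y.var()
        out[col] = np.inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared)
    return pd.Series(out)
```

With an intercept included, R² is non-negative, which is why every VIF is at least 1.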

correct_pvalues(p_values, alpha=0.05)

Apply Holm-Bonferroni and Benjamini-Hochberg multiple testing corrections.

Parameters

p_values : ndarray, shape (m,)
    Raw p-values in any order.
alpha : float
    Significance level used to compute the adjustments (does not filter here;
    callers compare adjusted p-values against alpha).

Returns

CorrectedPValues
    holm : FWER-controlling Holm-Bonferroni adjusted p-values.
    bh : FDR-controlling Benjamini-Hochberg adjusted p-values.
    Both arrays are returned in the same order as the input.
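Both adjustments can be sketched in vectorised numpy: Holm is a step-down (m - i) * p_(i) made monotone with a running max, BH is a step-up p_(i) * m / i made monotone with a running min from the largest p. This sketch returns a plain tuple rather than the CorrectedPValues container:

```python
import numpy as np

def holm_bh_adjust(p_values):
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ps = p[order]
    # Holm: step-down (m - i) * p_(i), made monotone with a running max
    holm = np.minimum(np.maximum.accumulate((m - np.arange(m)) * ps), 1.0)
    # BH: step-up p_(i) * m / i, made monotone with a running min from the top
    raw_bh = ps * m / np.arange(1, m + 1)
    bh = np.minimum(np.minimum.accumulate(raw_bh[::-1])[::-1], 1.0)
    holm_out, bh_out = np.empty(m), np.empty(m)
    holm_out[order] = holm                 # restore the caller's ordering
    bh_out[order] = bh
    return holm_out, bh_out
```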

run_factor_validation(factor_scores_history, returns_history, config=None)

Run complete factor validation suite.

Parameters

factor_scores_history : dict[str, pd.DataFrame]
    Factor name -> (dates x tickers) score history.
returns_history : pd.DataFrame
    Dates x tickers forward return matrix.
config : FactorValidationConfig or None
    Validation parameters.

Returns

FactorValidationReport
    Complete validation results.

validate_factor_universe(ic_matrix, lags=5, alpha=0.05)

Validate all factors simultaneously with multiple testing correction.

Parameters

ic_matrix : pd.DataFrame
    Dates × factors matrix of IC values (one IC per period per factor).
lags : int
    Number of Newey-West HAC lags.
alpha : float
    Significance level for both FWER and FDR rejection decisions.

Returns

pd.DataFrame
    Factor × statistic summary with columns: ic_mean, icir, t_stat_nw,
    p_value_raw, p_value_holm, p_value_bh, significant_holm, significant_bh.