Fine-tuning TabICL: when 30 epochs of GPU time buys you 0.3 pp

TabICL exposes a built-in fine-tuning pipeline via FinetunedTabICLClassifier. On five real-world classification datasets, I compared zero-shot TabICL against fine-tuned TabICL (30 epochs, early stopping, validation-driven hyperparameter selection). The result: fine-tuning helps on some datasets, hurts on others, and never moves AUC by more than ±0.7 pp. On telco-churn it is consistently beneficial (+0.16 to +0.59 pp). On cc-fraud it is completely flat — zero-shot is already near-perfect. The only consistent signal is that fine-tuning with too little data or the wrong seed can degrade performance.

May 18, 2026 · 13 min · Maxime Guerreiro

Agent architecture: where the work runs

Hermes Agent orchestrates two persistent machines — a free-tier ARM64 VPS and a custom x86-64 workstation — to run Rust and PyTorch workloads without sandbox churn.

May 18, 2026 · 6 min · Maxime Guerreiro

When stacking works: it depends on which features your models look at

Stacking TabPFN3, TabICL, and XGBoost provides at most +0.5 pp AUC on most tabular datasets. But on heavily imbalanced fraud detection, the ensemble is dramatically more robust. The reason is not model diversity in the abstract—it is concrete feature disagreement. XGBoost and TabPFN disagree strongly on which features matter for fraud (Spearman ρ = 0.24), while they agree closely on every other dataset (ρ = 0.67–0.95). When models look at different features, stacking hedges correlated failure modes. When they look at the same features, stacking is just expensive averaging.

May 17, 2026 · 18 min · Maxime Guerreiro

TabPFN3 vs TabICL: a matched-size fraud-benchmark sweep

PFN wins below 10k rows, ICL catches up by 100k, and PFN degrades beyond 200k. The two models run 2× apart in speed because PFN is 2× larger. We also found a clean 21% inference speedup with bfloat16 autocast.

May 16, 2026 · 19 min · Maxime Guerreiro

Inlining Tokio MPSC recv: removing the async tax

Two #[inline] annotations on the innermost recv path improve throughput by 14.7% for large objects and by 11% for medium objects, with no regressions.

May 16, 2026 · 7 min · Maxime Guerreiro