Fine-tuning TabICL: when 30 epochs of GPU time buys you 0.3 pp
TabICL exposes a built-in fine-tuning pipeline via FinetunedTabICLClassifier. On five real-world classification datasets, I compared zero-shot TabICL against fine-tuned TabICL (30 epochs, early stopping, validation-driven hyperparameter selection). The result: fine-tuning helps on some datasets, hurts on others, and never moves AUC by more than 0.7 pp in either direction. On telco-churn it is consistently beneficial (+0.16 to +0.59 pp). On cc-fraud it is completely flat: zero-shot is already near-perfect. The only pattern that holds across datasets is a negative one: fine-tuning with too little data or the wrong seed can degrade performance.
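To make the "pp" figures concrete, here is a minimal sketch of how an AUC delta in percentage points can be computed from two sets of predicted scores on the same test labels. It is dependency-free and does not call TabICL. The helper names roc_auc and auc_delta_pp are hypothetical, and the toy labels and scores are made up for illustration; they are not results from the experiments described here.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_delta_pp(
    y_true: Sequence[int],
    zero_shot_scores: Sequence[float],
    finetuned_scores: Sequence[float],
) -> float:
    """Fine-tuned AUC minus zero-shot AUC, in percentage points."""
    return 100.0 * (roc_auc(y_true, finetuned_scores) - roc_auc(y_true, zero_shot_scores))


# Toy data (made up): same labels, scores from two hypothetical models.
y_test = [0, 0, 1, 1, 0, 1]
p_zero_shot = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
p_finetuned = [0.10, 0.30, 0.35, 0.80, 0.20, 0.70]

delta = auc_delta_pp(y_test, p_zero_shot, p_finetuned)
print(f"AUC delta: {delta:+.2f} pp")
```

The pairwise loop is quadratic in the number of test rows, which is fine for a sketch; on real datasets a rank-based implementation such as scikit-learn's roc_auc_score would be the usual choice. A delta of 0.3 pp means the AUC rose by 0.003 on the 0 to 1 scale.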