Evidence suggests that the calibration between hypothetical and actual behavior is good-specific. We examine whether clustering commodities into mutual categories can reduce the burden of calibrating each good separately. While we reject a common calibration function across sets of commodities, we cannot reject a sport-specific calibration function.