After poking around a bit in the synthesis report, I found that XST was synthesizing not one but two multipliers (one signed and one unsigned) and putting a multiplexer on the output, even though I had requested area-optimized synthesis.
This was the relevant Verilog (inside a clocked always block):
if(multiply_is_signed)
    mdu_product <= $signed(execute_regval_a_buf) * $signed(execute_regval_b_buf);
else
    mdu_product <= execute_regval_a_buf * execute_regval_b_buf;
I experimented with a lot of different ways to write the same code before finding something that worked:
mdu_product <= multiply_is_signed ?
    ( $signed(execute_regval_a_buf) * $signed(execute_regval_b_buf) ) :
    ( execute_regval_a_buf * execute_regval_b_buf );
The second snippet should, by any reasonable reading, turn into the exact same netlist, but instead it produced a single multiplier. (It also absorbs pipeline registers correctly, unlike the first.)
My only guess is that XST's multiplier-inference code operates at the level of single assignments, so any if/else statement wrapped around two assignments turns into a mux between two separately inferred multipliers.
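One caveat about the conditional-operator version is worth checking, assuming mdu_product is wider than the two operands (for example, a 64-bit product of 32-bit registers; the widths are not shown above). Under Verilog's expression-typing rules, a conditional operator whose two branches mix signed and unsigned operands is evaluated as unsigned. The $signed() casts may therefore have no effect on operand extension, and both cases may compute the unsigned product, which by itself could explain the single multiplier. A common way to get one multiplier that really handles both cases is to widen each operand by one bit and always multiply signed. The following is a minimal sketch of that idea, not the original design: the module name, the WIDTH parameter, the clk port, and the a_ext, b_ext, and full_product names are all assumptions made for illustration.

```verilog
// Sketch only. The module name, WIDTH, clk, a_ext, b_ext and full_product
// are hypothetical; only the three mdu/execute signal names come from the post.
module mdu_mult_sketch #(
    parameter WIDTH = 32   // assumed operand width
) (
    input  wire               clk,
    input  wire               multiply_is_signed,
    input  wire [WIDTH-1:0]   execute_regval_a_buf,
    input  wire [WIDTH-1:0]   execute_regval_b_buf,
    output reg  [2*WIDTH-1:0] mdu_product
);
    // Widen each operand by one bit. For a signed multiply the new top bit
    // is a copy of the sign bit; for an unsigned multiply it is 0, so the
    // widened value is always non-negative.
    wire signed [WIDTH:0] a_ext =
        {multiply_is_signed & execute_regval_a_buf[WIDTH-1], execute_regval_a_buf};
    wire signed [WIDTH:0] b_ext =
        {multiply_is_signed & execute_regval_b_buf[WIDTH-1], execute_regval_b_buf};

    // Both operands are signed, so this is one signed
    // (WIDTH+1) x (WIDTH+1) multiply that covers both cases.
    wire signed [2*WIDTH+1:0] full_product = a_ext * b_ext;

    // The low 2*WIDTH bits hold the correct product in either mode.
    always @(posedge clk)
        mdu_product <= full_product[2*WIDTH-1:0];
endmodule
```

Whether XST maps this onto the same number of hardware multipliers, and still absorbs the pipeline registers, would need to be confirmed in the synthesis report.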