SQL ORDER BY Column Numbers and Expressions
Background
An issue with specifying column numbers in ORDER BY
gained attention through tom__bo’s article about behavioral changes in prepared statements in 8.0.22.
In the article, the following was mentioned:
Details
For a prepared statement of the form SELECT expr1, expr2, … FROM table ORDER BY ?, passing an integer value N for the parameter no longer causes ordering of the results by the Nth expression in the select list; the results are no longer ordered, as is expected with ORDER BY constant.
The phrase “the results are no longer ordered, as is expected with ORDER BY constant” left me confused, so I decided to investigate.
SQL ORDER BY
In SQL, it’s common to specify column names in ORDER BY
to sort the output.
SELECT * FROM test_table ORDER BY c1;
1,Orange
2,Melon
3,Apple
You can specify column names even if they’re not included in the SELECT list.
SELECT c2 FROM test_table ORDER BY c1;
Orange
Melon
Apple
Furthermore, you can use not just column names but also “expressions”.
SELECT c1%2, c1, c2 FROM test_table ORDER BY c1%2;
0,2,Melon
1,1,Orange
1,3,Apple
Column Number Specification in ORDER BY
And here’s where it gets tricky. Many implementations allow the use of column numbers in ORDER BY
.
This was once included in the SQL-92 standard (though it was later removed), so most implementations support it.
SELECT c1,c2 FROM test_table ORDER BY 1;
// Sorted by c1
1,Orange
2,Melon
3,Apple
While it works, it’s generally deprecated. It causes confusion and seems implementation-wise problematic. To support both column numbers and “expressions”, the “result of expressions” and column numbers must be handled separately.
This means:
SELECT c2 FROM test_table ORDER BY 2;
and
SELECT c2 FROM test_table ORDER BY 1+1;
are different things. The latter sorts by the result of 1+1 (all rows have the same result, so no sorting occurs), which makes sense.
SELECT 1+1, c2 FROM test_table ORDER BY 1+1;
2,Orange
2,Melon
2,Apple
I’ve only tested with PostgreSQL, MySQL, and SQLite3, but I believe most implementations behave similarly.
When you specify a non-existent column number, you get an error, but when interpreted as an expression, no error occurs.
SELECT c2 FROM test_table ORDER BY 4;
Error 1054: Unknown column '4' in 'order clause'
SELECT c2 FROM test_table ORDER BY 4+0;
Orange
Melon
Apple
However, PostgreSQL, MySQL, and SQLite3 behave slightly differently in some cases. When you include non-integer constants (like strings), PostgreSQL gives a “non-integer constant” error, while MySQL and SQLite3 treat it as an expression.
SELECT c2 FROM test_table ORDER BY '1';
ERROR: 42601: non-integer constant in ORDER BY
If you make it interpretable as an expression, PostgreSQL behaves the same way.
SELECT c2 FROM test_table ORDER BY '1'||'';
Orange
Melon
Apple
Based on the behavior, it seems that ORDER BY first determines whether it’s a column number, and if not, it moves to regular expression evaluation (usable elsewhere).
In this case, PostgreSQL probably had users accidentally entering ORDER BY '1'
or ORDER BY 'name'
and wondering “Why isn’t it sorting??”
Someone likely complained, “Nobody would enter this intentionally, so make it an error!” ← Pure speculation
Summary
So, the behavioral change in prepared statements in 8.0.22 makes sense if we consider that “prepared statements” changed from interpreting it as a column number to expression evaluation.
However, I think “the results are no longer ordered, as is expected with ORDER BY constant” is somewhat misleading. I was actually misled by it.
In other words, this isn’t about “column numbers” but rather “fixed values that aren’t interpreted as column numbers” (though directly specifying integers would be interpreted as column numbers).