ResultSet: Retrieving column values by index versus retrieving by label
When using JDBC, I often come across constructs like
ResultSet rs = ps.executeQuery();
while (rs.next()) {
int id = rs.getInt(1);
// Some other actions
}
I asked myself (and authors of code too) why not to use labels for retrieving column values:
int id = rs.getInt("CUSTOMER_ID");
The best explanation I've heard is something concerning performance. But actually, does it make processing extremely fast? I don't believe so, though I have never performed measurements. Even if retrieving by label would be a bit slower, nevertheless, it provide better readability and flexibility, in my opinion.
So could someone give me good explanation of avoiding to retrieve column values by column index instead of column label? What are pros and cons of both approaches (maybe, concerning certain DBMS)?
Asked by: Stuart927 | Posted: 21-01-2022
Answer 1
Warning: I'm going to get bombastic here, because this drives me crazy.
99%* of the time, it's a ridiculous micro-optimization that people have some vague idea makes things 'better'. This completely ignores the fact that, unless you're in an extremely tight and busy loop over millions of SQL results all the time, which is hopefully rare, you'll never notice it. For everyone who's not doing that, the developer time cost of maintaing, updating, and fixing bugs in the column indexing are far greater than the incremental cost of hardware for your infinitesimally-worse-performing application.
Don't code optimizations like this in. Code for the person maintaining it. Then observe, measure, analyse, and optimize. Observe again, measure again, analyse again, and optimize again.
Optimization is pretty much the last step in development, not the first.
* Figure is made up.
Answered by: Audrey730 | Posted: 22-02-2022Answer 2
You should use string labels by default.
Pros:
- Independence of column order
- Better readability/maintainability
Cons:
- You have no control over the column names (access via stored procedures)
Which would you prefer?
ints?
int i = 1;
customerId = resultSet.getInt(i++);
customerName = resultSet.getString(i++);
customerAddress = resultSet.getString(i++);
or Strings?
customerId = resultSet.getInt("customer_id");
customerName = resultSet.getString("customer_name");
customerAddress = resultSet.getString("customer_address");
And what if there is a new column inserted at position 1? Which code would you prefer? Or if the order of the columns is changed, which code version would you need to change at all?
That's why you should use string labels by default.
Answered by: Maddie218 | Posted: 22-02-2022Answer 3
The answer has been accepted, none-the-less, here is some additional information and personal experience that I have not seen put forward yet.
Use column names (constants and not literals is preferred) in general and if possible. This is both clearer, is easier to maintain, and future changes are less likely to break the code.
There is, however, a use for column indexes. In some cases these are faster, but not sufficiently that this should override the above reasons for names*. These are very valuable when developing tools and general methods dealing with ResultSet
s. Finally, an index may be required because the column does not have a name (such as an unnamed aggregate) or there are duplicate names so there is no easy way to reference both.
*Note that I have written some JDBC drivers and looked inside some open sources one and internally these use column indexes to reference the result columns. In all cases I have worked with, the internal driver first maps a column name to an index. Thus, you can easily see that the column name, in all those cases, would always take longer. This may not be true for all drivers though.
Answered by: Ryan758 | Posted: 22-02-2022Answer 4
From the java documentation:
The ResultSet interface provides getter methods (getBoolean, getLong, and so on) for retrieving column values from the current row. Values can be retrieved using either the index number of the column or the name of the column. In general, using the column index will be more efficient. Columns are numbered from 1. For maximum portability, result set columns within each row should be read in left-to-right order, and each column should be read only once.
Of course each method (named or indexed) has its place. I agree that named columns should be the default. However, in cases where a huge number of loops are required, and where the SELECT statement is defined and maintained in the same section of code (or class), indexes should be ok - it is advisable to list the columns being selected, not just "SELECT * FROM...", since any table change will break the code.
Answered by: Ada820 | Posted: 22-02-2022Answer 5
I did some performance profiling on this exact subject on an Oracle database. In our code we have a ResultSet with numerous colums and a huge number of rows. Of the 20 seconds (!) the request takes to execute method oracle.jdbc.driver.ScrollableResultSet.findColumn(String name) takes about 4 seconds.
Obviously there's something wrong with the overall design, but using indexes instead of the column names would probably take this 4 seconds away.
Answered by: Max721 | Posted: 22-02-2022Answer 6
Sure, using column names increases readability and makes maintenance easy. But using column names has a flipside. As you know, SQL allows multiple column names with same name, there's no guarantee that the column name you typed in the getter method of resultSet actually points to the column name you intend to access. In theory, using index numbers instead of column names is preferred, but it reduces the readability.
Answered by: Ryan454 | Posted: 22-02-2022Answer 7
You can have the best of both! The speed of using indexes with the maintainability and security of using column names.
First - unless you are looping thru a result set just use column names.
Define a set of integer variables, one for each column you will access. The names of the variables can include the name of the column: e.g. iLast_Name.
Before the result set loop iterate thru the column metadata and set the value of each integer variable to the column index of the corresponding column name. If the index of the 'Last_Name' column is 3 then set the value of 'iLast_Name' to 3.
In the result set loop use the integer variable names in the GET/SET methods. The variable name is a visual clue to the developer/maintainer as to the actual column name being accessed but the value is the column index and will give the best performance.
NOTE: the initial mapping (i.e. column name to index mapping) is only done once before the loop rather than for every record and column in the loop.
Answered by: Kellan214 | Posted: 22-02-2022Answer 8
I don't think using the labels impacts performance by much. But there is another reason not to use String
s. Or int
s, for that matter.
Consider using constants. Using an int
constant makes the code more readably, but also less likely to have errors.
Besides being more readable, the constant also prevents you from making typo's in the label names - the compiler will throw an error if you do. And any IDE worth anything will pick it up. This is not the case if you use String
s or ints
.
Answer 9
The JDBC driver takes care for the column to index look-up. So if you extract values by column name each time the driver makes a look-up (usually in hash map) to check the corresponding index for the column name.
Answered by: Gianna626 | Posted: 22-02-2022Answer 10
I agree with previous answers that performance is not something that can force us to select either of the approaches. It would be good to consider the following things instead:
- Code readability: for every developer reading your code labels have much more sense than indexes.
- Maintenance: think of the SQL query and the way it is maintained. What is more likely to happen in your case after fixing/improving/refactoring SQL query: changing the order of the columns extracted or changing result column names. It seems for me that changing the order of the columns extracted (as the results of adding/deleting new columns in result set) has greater probability to happen.
- Encapsulation: in spite of the way you choose try to isolate the code where you run SQL query and parse result set in the same component and make only this component aware about the column names and their mapping to the indexes (if you decided to use them).
Answer 11
Using the index is an attempt at optimization.
The time saved by this is wasted by the extra effort it takes the developer to look up the necessary data to check if their code will work properly after the changes.
I think it's our built-in instinct to use numbers instead of text.
Answered by: John961 | Posted: 22-02-2022Answer 12
Besides the look up in Map for labels it also leads to an extra String creation. Though it will happens on stack but still it caries a cost with it.
It all depends on the individual choice and till date I have used only indexes :-)
Answered by: Hailey257 | Posted: 22-02-2022Answer 13
As it is pointed out by other posters, I would stick to column names unless you have a really powerful reason not to do so. The impact in performance is negligible compared to, for example, query optimization. In this case, maintenance is much more important than an small optmization.
Answered by: David535 | Posted: 22-02-2022Similar questions
sql - Java ResultSet: Does the DB only update if updateRow is called?
In java.sql.ResulSet, can the updateObject, updateString, update[Type] methods update the underlying database without a call to updateRow()?
Thanks,
ktm
java - ResultSet: Get RowId (index in jdbc) from the inserted row
Lets see the list of commands for the abstract database (JDBC)
rs.moveToInsertRow();
rs.updateString(1, "AINSWORTH");
rs.updateInt(2,35);
rs.updateBoolean(3, true);
rs.insertRow();
Ok, after that I expected that
rs.getRow(); //Retrieves the current row number.
returns new row id for this row, but I got null.
Read documnetation:
java - ResultSet: Values are not able to insert to HashMap
I have a function, which sends a query to my db and saves the result into a resultset. Another function should take the values of the resultset and put it in a hashmap. if i debug my code, the function which gets the data of db, returns values within the resultset, but if i give that resultset as an parameter to the hashmap function, it says that the resultset would be empty. I am sitting on this problem since yesterday an...
Still can't find your answer? Check out these amazing Java communities for help...
Java Reddit Community | Java Help Reddit Community | Dev.to Java Community | Java Discord | Java Programmers (Facebook) | Java developers (Facebook)