Best way to select out of millions of rows in an Oracle DB

G'day!

I have one million distinct words that I'd like to look up in a table with 15 million rows. After each query, the resulting synonyms are processed together with their word.

The table looks like this:

    synonym      word
    ---------------------
    ancient      old
    anile        old
    centenarian  old
    darkened     old
    distant      far
    remote       far
    calm         gentle
    quiet        gentle

This is how it is done in Java currently:

    ....
    PreparedStatement stmt;
    ResultSet wordList;
    ResultSet syns;
    ...

    stmt = conn.prepareStatement("select distinct word from table");
    wordList = stmt.executeQuery();

    while (wordList.next()) {
        stmt = conn.prepareStatement("select synonym from table where word=?");
        stmt.setString(1, wordList.getString(1));
        syns = stmt.executeQuery();

        process(syns, wordList.getString(1));
    }
    ...

This is incredibly slow. What's the fastest way to do something like this?

Cheers, Chris


Asked by: Haris337 | Posted: 23-01-2022






Answer 1

  1. Ensure that there is an index on the 'word' column (see the sketch after this list).

  2. Move the second prepareStatement outside the word loop. Each time you create a new statement, the database compiles and optimizes the query - but in this case the query is the same, so this is unnecessary.

  3. Combine the statements as sblundy above has done.
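
On point 1, a minimal sketch of creating such an index through JDBC. The index name is made up, and the table is assumed to really be called something like 'synonyms', since 'table' itself is a reserved word in Oracle:

    // One-off DDL; the index and table names are assumptions, not from the question.
    try (Statement ddl = conn.createStatement()) {
        ddl.execute("create index synonyms_word_idx on synonyms (word)");
    }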

Answered by: Jared564 | Posted: 24-02-2022



Answer 2

Two ideas:

a) How about making it one query:

select synonym from table where word in (select distinct word from table)

b) Or, if your process method needs to deal with them as a set of synonyms of one word, why not sort by word and start processing anew each time the word changes? That query would be:

select word, synonym 
from table 
order by word

Answered by: Julian404 | Posted: 24-02-2022



Answer 3

Why are you querying the synonyms inside the loop if you're querying all of them anyway? You should use a single select word, synonym from table order by word, and then group the rows by word in the Java code.

Answered by: Miranda356 | Posted: 24-02-2022



Answer 4

    PreparedStatement stmt;
    ResultSet syns;
    ...

    stmt = conn.prepareStatement("select distinct " +
                                 "  sy.synonym " +
                                 "from " +
                                 "  table sy, " +
                                 "  table wd " +
                                 "where sy.word = wd.word");
    syns = stmt.executeQuery();
    process(syns);

Answered by: Lenny626 | Posted: 24-02-2022



Answer 5

related but unrelated:

    while (wordList.next()) {
        stmt = conn.prepareStatement("select synonym from table where word=?");
        stmt.setString(1, wordList.getString(1));
        syns = stmt.executeQuery();

        process(syns, wordList.getString(1));
    }

You should move that prepareStatement call outside the loop:

    stmt = conn.prepareStatement("select synonym from table where word=?");
    while (wordList.next()) {
        stmt.setString(1, wordList.getString(1));
        syns = stmt.executeQuery();

        process(syns, wordList.getString(1));
    }

The whole point of preparing a statement is to let the database compile and cache it, because you're going to use the statement repeatedly. You may also need to clean up your result sets explicitly if you're going to run that many queries, to ensure that you don't run out of cursors.
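
A minimal sketch of that cleanup, closing each ResultSet as soon as it has been processed (otherwise Oracle eventually raises ORA-01000, "maximum open cursors exceeded"); conn, wordList and process are as in the question:

    stmt = conn.prepareStatement("select synonym from table where word=?");
    while (wordList.next()) {
        stmt.setString(1, wordList.getString(1));
        // try-with-resources closes the ResultSet (and its cursor) every iteration
        try (ResultSet syns = stmt.executeQuery()) {
            process(syns, wordList.getString(1));
        }
    }
    stmt.close();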

Answered by: Carina757 | Posted: 24-02-2022



Answer 6

You should also consider using the statement object's setFetchSize method to reduce the number of round trips between your application and the database. If you know you are going to process a million records, call setFetchSize(someRelativelyHighNumberLike1000). This tells the JDBC driver to grab up to 1000 records each time it needs more from Oracle, instead of the driver's small default (10 rows at a time for Oracle), which is a worst-case scenario for this kind of batch processing operation. This will improve the speed of your program. You should also consider refactoring to process your words/synonyms in batches, because

  1. fetch 1
  2. process 1
  3. repeat

is slower than

  1. fetch 50/100/1000
  2. process 50/100/1000
  3. repeat

Just hold the 50/100/1000 (or however many you retrieve at once) in some array structure until you process them, as sketched below.
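
A minimal sketch combining both ideas, assuming the single ordered query suggested in the other answers and a hypothetical processBatch method (java.util.List and java.util.ArrayList are also needed); the fetch size and batch size of 1000 are arbitrary:

    Statement stmt = conn.createStatement();
    stmt.setFetchSize(1000); // driver fetches up to 1000 rows per round trip

    ResultSet rs = stmt.executeQuery("select word, synonym from table order by word");

    List<String[]> batch = new ArrayList<String[]>(1000);
    while (rs.next()) {
        batch.add(new String[] { rs.getString(1), rs.getString(2) });
        if (batch.size() == 1000) {
            processBatch(batch); // hypothetical batch variant of process
            batch.clear();
        }
    }
    if (!batch.isEmpty()) {
        processBatch(batch); // don't forget the final, partial batch
    }
    rs.close();
    stmt.close();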

Answered by: Julia459 | Posted: 24-02-2022



Answer 7

The problem is solved. The key point is that the table can be sorted by word, so I can iterate through the whole table in a single pass, like this:

    ....
    Statement stmt;
    ResultSet rs;
    String currentWord;
    HashSet<String> syns = new HashSet<String>();
    ...

    stmt = conn.createStatement();
    rs = stmt.executeQuery("select word, synonym from table order by word");

    if (rs.next()) {
        currentWord = rs.getString(1);
        syns.add(rs.getString(2));

        while (rs.next()) {
            // strings must be compared with equals(), not == / !=
            if (!rs.getString(1).equals(currentWord)) {
                process(syns, currentWord);
                syns.clear();
                currentWord = rs.getString(1);
            }
            syns.add(rs.getString(2));
        }

        // process the final word's group as well
        process(syns, currentWord);
    }
    ...

Answered by: Walter108 | Posted: 24-02-2022


