database - How to force SQLite to use an index when joining a temporary table instead of scanning a large table? - Stack Overflo

admin2025-04-15  0

I have a SQLite database with a table containing a billion rows and a temporary table used for row selection. The tables are created as follows:

CREATE TABLE [obj2](
  [id] integer PRIMARY KEY ASC,
  [name] text,
  [height] integer,
  [identification] text,
  [width] integer
);

CREATE TEMPORARY TABLE [__tmp_selection] (id INTEGER PRIMARY KEY);

If I use the IN clause with specific row IDs, SQLite efficiently uses the index on obj2.id:

EXPLAIN QUERY PLAN
SELECT obj2.* FROM obj2 WHERE rowid IN (10,20,30,40,50);

Query Plan Output:

id parent notused detail
2 0 0 SEARCH obj2 USING INTEGER PRIMARY KEY (rowid=?)

I have a SQLite database with a table containing a billion rows and a temporary table used for row selection. The tables are created as follows:

CREATE TABLE [obj2](
  [id] integer PRIMARY KEY ASC,
  [name] text,
  [height] integer,
  [identification] text,
  [width] integer
);

CREATE TEMPORARY TABLE [__tmp_selection] (id INTEGER PRIMARY KEY);

If I use the IN clause with specific row IDs, SQLite efficiently uses the index on obj2.id:

EXPLAIN QUERY PLAN
SELECT obj2.* FROM obj2 WHERE rowid IN (10,20,30,40,50);

Query Plan Output:

id parent notused detail
2 0 0 SEARCH obj2 USING INTEGER PRIMARY KEY (rowid=?)

This shows that SQLite properly uses the index on obj2.id.


However, when I insert the same IDs into the temporary table and join it with obj2, SQLite instead performs a full table scan on obj2, which is inefficient:

INSERT INTO __tmp_selection(id) VALUES (10);
INSERT INTO __tmp_selection(id) VALUES (20);
INSERT INTO __tmp_selection(id) VALUES (30);
INSERT INTO __tmp_selection(id) VALUES (40);
INSERT INTO __tmp_selection(id) VALUES (50);

EXPLAIN QUERY PLAN
SELECT obj2.* FROM obj2
JOIN __tmp_selection ON obj2.id = __tmp_selection.id;

Query Plan Output:

id parent notused detail
3 0 0 SCAN obj2
5 0 0 SEARCH __tmp_selection USING INTEGER PRIMARY KEY (rowid=?)

This suggests that SQLite is scanning the entire obj2 table, even though the join should be able to use the index on obj2.id.


How can I force SQLite to scan __tmp_selection (which contains only a few rows) instead of performing a full table scan on obj2?

Share Improve this question edited Mar 18 at 13:21 jp_m0rgh 172 bronze badges asked Mar 13 at 17:39 zdenekzdenek 24.6k2 gold badges18 silver badges35 bronze badges 4
  • 1 What if you do WHERE obj2.id IN (SELECT id FROM __tmp_selection)? – Barmar Commented Mar 13 at 17:45
  • Btw, I suppose you've meant id for rowid in rowid IN (10,20,... ..? – Barbaros Özhan Commented Mar 13 at 17:52
  • IN with subquery looks promising. Thanks! I still need to test whether this approach can handle tens of thousands of rows in the selection table. – zdenek Commented Mar 13 at 18:02
  • In this case, id and rowid should be aliases. – zdenek Commented Mar 13 at 18:04
Add a comment  | 

1 Answer 1

Reset to default 2

Reverse the order of the tables and use a CROSS JOIN:

sqlite> EXPLAIN QUERY PLAN
   ...> SELECT obj2.* FROM __tmp_selection
   ...> CROSS JOIN obj2 ON obj2.id = __tmp_selection.id;
QUERY PLAN
|--SCAN __tmp_selection
`--SEARCH obj2 USING INTEGER PRIMARY KEY (rowid=?)

Using a CROSS JOIN instead of JOIN forces the query planner to always use that order of tables.

or, better option, run ANALYZE on the tables to give the query planner more information about the costs of operations on them so it can pick a more efficient order itself; see also PRAGMA optimize:

sqlite> ANALYZE __tmp_selection;
sqlite> ANALYZE obj2; -- assuming this table is populated
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT obj2.* FROM obj2
   ...> JOIN __tmp_selection ON obj2.id = __tmp_selection.id;
QUERY PLAN
|--SCAN __tmp_selection
`--SEARCH obj2 USING INTEGER PRIMARY KEY (rowid=?)

(All testing done with Sqlite 3.49.1)

转载请注明原文地址:http://conceptsofalgorithm.com/Algorithm/1744687331a261218.html

最新回复(0)