sql - Insert Overwrite in Hive (Impala) a single file instead of multiple small files - Stack Overflow

admin2025-04-21  1

I'm inserting into the table db.A_full using this query:

INSERT OVERWRITE db.A_full
PARTITION (year, month, day)
SELECT DISTINCT
    A.ID
    , A.doc_type
    , A.city
    , A.gender
    , A.age
    , A.year
    , A.month
    , A.day
FROM db.A_parcial as A

db.A_parcial only contains a single date and it's a relatively small table that should be saved in a single file of 148Mb in db.A_full; but when I look in the clouse SHOW PARTITIONS db.A_full the single partition was saved as 138 files. How can I do the insert avoiding small files creation?

I'm inserting into the table db.A_full using this query:

INSERT OVERWRITE db.A_full
PARTITION (year, month, day)
SELECT DISTINCT
    A.ID
    , A.doc_type
    , A.city
    , A.gender
    , A.age
    , A.year
    , A.month
    , A.day
FROM db.A_parcial as A

db.A_parcial only contains a single date and it's a relatively small table that should be saved in a single file of 148Mb in db.A_full; but when I look in the clouse SHOW PARTITIONS db.A_full the single partition was saved as 138 files. How can I do the insert avoiding small files creation?

Share Improve this question asked Feb 12 at 21:05 Alejandro SalazarAlejandro Salazar 335 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 0

There is a way to compact an insert by setting some gonfig options:

-- this next command tells impala to use 1 node 
-- it has to be set JUST for the insert

set num_nodes = 1 ;  
set PARQUET_FILE_SIZE = 256m ;  -- this tells impala the filesize of result
 
insert into table my_table as select fields from other_table ;
 
set num_nodes = 0 ;  -- this has to be set immediatelly after the insert
 
compute stats my_table ;  -- this is just good manners

This is an easy way to set an insert to reduce the number of files on an insert. However, letting the set num_nodes = 1 will impact your performance, so it only needs to be set on an insert.

转载请注明原文地址:http://conceptsofalgorithm.com/Algorithm/1745202094a290115.html

最新回复(0)