Scenario:
When running a query like the one below, it takes quite a long time to get results from BigQuery, even though we know that querying Bigtable directly with a given rowkey is fast.
Query:
SELECT
  bt.rowkey,
  bt.column1,
  ...,
  bt.column14
FROM
  `bq_dataset_mine.my_bq_small_table_with_rowkeys` sm
JOIN
  `bq_dataset_mine.my_bq_external_table_on_bt_table1` bt
ON
  bt.rowkey = sm.rowkey
Is there a way to speed up this BigQuery query, which joins a small BigQuery table of rowkeys against the larger BigQuery external table over Bigtable (BT)?
Here is what my query plan looks like in BigQuery:
Step details
READ
$1:rowkey, $2:my_cf1.cq1.cell.timestamp, $3:my_cf1.cq2.cell.value, $4:my_cf2.cq3.cell.timestamp, $5:my_cf1.cq4.cell.value, ..., $11:my_cf1.cq10.cell.value, ...
FROM my-proj.my-dset.my_bt_ext_table
READ
$30
FROM __stage00_output
JOIN
$40 := $1, $41 := $2, $42 := $3, $43 := $4, $44 := $5, $45 := $6, $46 := $7, $47 := $8, $48 := $9, $49 := $10, $50 := $11, $51 := $12, $52 := $13, $53 := $14, $54 := $15, $55 := $16
SEMI HASH JOIN EACH WITH ALL ON $1 = $30
WRITE
$40, $41, $42, $43, $44, $45, $46, $47, $48, $49, $50, $51, $52, $53, $54, $55
TO __stage01_output
Shuffle
Shuffle output bytes 2.2 MiB
Looking at the plan, the explanation might be that we are selecting a lot of columns from the BT table rather than just testing for the existence of the rowkey, so all of those columns have to be stored in the hash table for the semi hash join, which would drive up both time and cost.
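A cheap way to test that hypothesis would be to time a variant of the query that projects only the rowkey, so the hash table carries no BT value columns (a sketch using the same table names as above):

SELECT
  bt.rowkey  -- existence check only: no BT value columns are selected
FROM
  `bq_dataset_mine.my_bq_small_table_with_rowkeys` sm
JOIN
  `bq_dataset_mine.my_bq_external_table_on_bt_table1` bt
ON
  bt.rowkey = sm.rowkey

If this variant runs noticeably faster, that would support the column-materialization explanation.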
If someone from the Google Bigtable or BigQuery team could look at the plan and see whether there is a possibility of speeding this up, that would be great.
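One more experiment I am considering, assuming BigQuery can push a literal row-key filter down to Bigtable the way it does for equality predicates on the rowkey (I have not verified this for IN lists): inline the rowkeys as constants instead of joining. The rowkey values below are placeholders.

SELECT
  bt.rowkey,
  bt.column1,
  bt.column14  -- remaining columns elided, as in the query above
FROM
  `bq_dataset_mine.my_bq_external_table_on_bt_table1` bt
WHERE
  bt.rowkey IN ('key1', 'key2')  -- literals copied from my_bq_small_table_with_rowkeys

This would only be practical while the small table stays small enough to inline its rowkeys.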