sql - Flatten a column with JSON strings with different numbers of keys to a table in bigquery (Dynamic) - Stack Overflow

admin2025-04-19  1

This is my first time posting here!

Saw someone gave an answer to a similar question here: How to flatten a colum of large JSON strings with different numbers of keys to a table in bigquery but in my case, the column looks as follows:

id keywords
862 [{'id': 931, 'name': 'jealousy'}, {'id': 4290, 'name': 'toy'}, {'id': 5202, 'name': 'boy'}, {'id': 6054, 'name': 'friendship'}, {'id': 9713, 'name': 'friends'}, {'id': 9823, 'name': 'rivalry'}, {'id': 165503, 'name': 'boy next door'}, {'id': 170722, 'name': 'new toy'}, {'id': 187065, 'name': 'toy comes to life'}]
8844 [{'id': 10090, 'name': 'board game'}, {'id': 10941, 'name': 'disappearance'}, {'id': 15101, 'name': "based on children's book"}, {'id': 33467, 'name': 'new home'}, {'id': 158086, 'name': 'recluse'}, {'id': 158091, 'name': 'giant insect'}]

This is my first time posting here!

Saw someone gave an answer to a similar question here: How to flatten a colum of large JSON strings with different numbers of keys to a table in bigquery but in my case, the column looks as follows:

id keywords
862 [{'id': 931, 'name': 'jealousy'}, {'id': 4290, 'name': 'toy'}, {'id': 5202, 'name': 'boy'}, {'id': 6054, 'name': 'friendship'}, {'id': 9713, 'name': 'friends'}, {'id': 9823, 'name': 'rivalry'}, {'id': 165503, 'name': 'boy next door'}, {'id': 170722, 'name': 'new toy'}, {'id': 187065, 'name': 'toy comes to life'}]
8844 [{'id': 10090, 'name': 'board game'}, {'id': 10941, 'name': 'disappearance'}, {'id': 15101, 'name': "based on children's book"}, {'id': 33467, 'name': 'new home'}, {'id': 158086, 'name': 'recluse'}, {'id': 158091, 'name': 'giant insect'}]

How can I create a SQL table using the same method described in the other answer which dynamically identifies the Key objects and creates columns while automatically flattening the results. When I run this script I get an error: Query error: SyntaxError: Expected property name or '}' in JSON at position 2 (line 1 column 3) at undefined line 1, columns 2-3 at [24:3]

create temp function  extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function  extract_values(input string) returns array<string> language   js as """
return Object.values(JSON.parse(input));
""";

create temp function extract_all_leaves(input string) returns string language js as '''
function flattenObj(obj, parent = '', res = {}){
for(let key in obj){
    let propName = parent ? parent + '.' + key : key;
    if(typeof obj[key] == 'object'){
        flattenObj(obj[key], propName, res);
    } else {
        res[propName] = obj[key];
    }
}
return JSON.stringify(res);
}
return flattenObj(JSON.parse(input));
''';

create temp table temp_table as (
 select offset, key, value, id
 from your_table t, 
 unnest([struct(extract_all_leaves(payload) as leaves)]),
 unnest(extract_keys(leaves)) key with offset
join unnest(extract_values(leaves)) value with offset
using(offset)  
);   

execute immediate (select '''
select * from (select * except(offset) from temp_table)
 pivot (any_value(value) for replace(key, '.', '__') in (''' || keys_list || '''
 ))'''
from (select string_agg('"' || replace(key, '.', '__') || '"', ',' order by offset) keys_list from (
select key, min(offset) as offset from temp_table group by key
))
);  }
                                                           
Share Improve this question edited Mar 6 at 0:46 Dale K 27.5k15 gold badges58 silver badges83 bronze badges asked Mar 6 at 0:45 Julian CerrilloJulian Cerrillo 13 bronze badges 2
  • Could you add an example entry that includes a text field with both ' and "? As in based on children's book "Disappeared" from X. Y.: which character gets chosen to delimit the attribute's value, ('name': '…' or 'name': "…") and how does the character itself get escaped when used literally (backslashed \", or doubled "", … or not handled, breaking the app?). This would make an interesting test case (and in fact answers should be tested against it to be deemed complete). – Guillaume Outters Commented Mar 6 at 9:06
  • please provide example of expected output. fix looks quite simple, but it is absolutely not clear what is expected output! – Mikhail Berlyant Commented Mar 9 at 3:24
Add a comment  | 

2 Answers 2

Reset to default 0

… Else as GBQ allows Javascript functions and if you're confident in the data (if you're sure it does contain no Javascript injection),
you can simply replace the 3 JSON.parse(input) by eval(input).


Your data is nearly JSON; in fact it looks like Javascript, and JSON being a strict subset of Javascript there are some things to change to make it a valid, strict JSON;
however, as GBQ understands more than strict JSON, with a single change you can make it understand simple entries of your data (like the first one of your example): replace ' by " in field values.

To that goal, replace extract_all_leaves(payload) by extract_all_leaves(replace(payload, "'", '"')).

But as your data can embed ' and " (in the text fields, apart from those delimiting text fields), you'll need a bit more of intelligence to handle them.

Taking in consideration that we may have "…'…", '…"…' but probably some "…'…\"…" too (see my comment question about it),
this will require a regex to handle the two cases, either "starting and ending with double quotes with no double quote in it or it is preceded by a \", or same with a single quote,
we can handle those by replacing:
extract_all_leaves(payload)
with
extract_all_leaves(regexp_replace(payload, "'(\\\\.|[^']+)*'|\"(\\\\.|[^\"]+)*\"", '"\\1\\2"'))

(we cannot handle only the ' case and let the "s as is, because then the ' of "O'Connor" would be interpreted as the start of a new '…' sequence, shifting everything and finally making the expression fail.

转载请注明原文地址:http://conceptsofalgorithm.com/Algorithm/1745001039a279226.html

最新回复(0)