What is the best way to consume Kafka messages in JSON format, modify and convert each message, and write the result to a CSV file with high performance?
I want to read the Kafka JSON messages in batches, process them in batches, and write them to the CSV file in batches.
Thanks a lot for your help.
I would like to try the following NiFi flow: ConsumeKafka -> MergeContent -> ConvertJSONToAvro -> UpdateRecord -> ConvertAvroToCSV -> PutFile. The idea is to read the Kafka messages in blocks of 10,000, group them with MergeContent, process all the messages together, and then create a CSV file. What do you think? Do you have any other recommendations?
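To make the intended batch logic concrete, here is a minimal Python sketch of the same idea outside NiFi. It is only an illustration under assumptions: the field names `id` and `name`, the `transform` function, and the sample messages are hypothetical, and a plain list stands in for messages polled from Kafka.

```python
import csv
import io
import json
from itertools import islice


def batches(messages, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(messages)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def transform(record):
    """Hypothetical per-message modification: upper-case the name field."""
    out = dict(record)
    out["name"] = str(out.get("name", "")).upper()
    return out


def json_batches_to_csv(messages, out, fieldnames, batch_size=10_000):
    """Parse, modify, and write messages one batch at a time; header written once."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    total = 0
    for chunk in batches(messages, batch_size):
        rows = [transform(json.loads(m)) for m in chunk]
        writer.writerows(rows)
        total += len(rows)
    return total


# Stand-in for messages polled from Kafka (assumed sample data).
sample = ['{"id": 1, "name": "a"}', '{"id": 2, "name": "b"}', '{"id": 3, "name": "c"}']
buf = io.StringIO()
count = json_batches_to_csv(sample, buf, ["id", "name"], batch_size=2)
```

The header is written once before the first batch, and each batch is parsed, modified, and written as a unit, which is the behaviour I am hoping to get from the flow.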
Following your feedback, I looked at ConsumeKafkaRecord, and I think you're right. I could use the following flow: ConsumeKafkaRecord (ScriptedReader, CSVWriter) => MergeContent => UpdateAttribute => PutFile.
1/ In ConsumeKafkaRecord, use a ScriptedReader to convert and modify each JSON message and a CSVWriter to write the new message.
2/ MergeContent to merge the flow files.
3/ UpdateAttribute to change the file name.
4/ PutFile to write the file.
The only problem is the CSV header: I want the final file to contain only one header line.
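The result I am after can be sketched in Python, independent of NiFi: each fragment is written without a header, and the header is added once at merge time. This is only an illustration under assumptions. The `merge_csv_fragments` function and the sample data are hypothetical, and the sketch assumes no CSV field contains an embedded newline. In NiFi, one possibility to verify against your version is to turn off "Include Header Line" in the CSV writer and set the header text through the "Header" property of MergeContent with the "Text" delimiter strategy.

```python
def merge_csv_fragments(fragments, header):
    """Concatenate headerless CSV fragments under a single header line."""
    lines = [header]
    for fragment in fragments:
        # Drop blank lines so fragment boundaries do not leave gaps.
        lines.extend(line for line in fragment.splitlines() if line)
    return "\n".join(lines) + "\n"


# Assumed sample fragments, each written without its own header.
fragments = ["1,A\n2,B\n", "3,C\n"]
merged = merge_csv_fragments(fragments, "id,name")
```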
Do you agree with this flow?
Thanks a lot.