NiFi - Best way of consuming Kafka messages in JSON format and writing them to CSV with high performance - Stack Overflow


I would like to know the best way to manage the consumption of Kafka messages in JSON format: modifying and converting each JSON message, transforming it to CSV, and writing it to a CSV file with high performance.

My aim is to implement the best possible solution. So I need to read the Kafka JSON messages in blocks, then process these messages in blocks and write them in blocks to a CSV file.

Thanks a lot for your help

I would like to try the following NiFi flow: ConsumeKafka -> MergeContent -> ConvertJSONToAvro -> UpdateRecord -> ConvertAvroToCSV -> PutFile. The idea is to read the Kafka messages in blocks of 10,000, group them via MergeContent, process all the messages together, and then create a CSV file. What do you think? Do you have other recommendations?
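
To make the block processing I have in mind concrete, here is a rough standalone sketch of the logic (plain Python with the confluent-kafka client, outside NiFi; the broker address, topic, group id, field names and the transform are placeholders for illustration only, not part of my actual flow):

```python
import csv
import json
import time

from confluent_kafka import Consumer  # assumption: confluent-kafka client is available

BATCH_SIZE = 10_000

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder broker
    "group.id": "json-to-csv-export",        # placeholder consumer group
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,             # commit only after the block is written
})
consumer.subscribe(["my-json-topic"])        # placeholder topic


def transform(message: dict) -> dict:
    # Placeholder for the per-message modification / value conversion.
    return {"id": message.get("id"), "amount": message.get("amount")}


while True:
    # Read one block of up to 10,000 messages.
    msgs = consumer.consume(num_messages=BATCH_SIZE, timeout=5.0)
    rows = [transform(json.loads(m.value())) for m in msgs if m.error() is None]
    if not rows:
        continue

    # Write the whole block to its own CSV file (one file per block,
    # like MergeContent + PutFile would produce).
    out_path = f"export_{int(time.time())}.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    consumer.commit()  # commit offsets only once the block is safely on disk
```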

asked Mar 3 at 6:54 – Patrick Baie
  • 1 Did you try the NiFi flow: ConsumeKafkaRecord(CSV Reader) -> MergeRecord? – tonykoval Commented Mar 3 at 9:54
  • 0) What's preventing you from trying this idea currently? Stack Overflow isn't for whiteboarding 1) Proto or Avro are superior to JSON. From the outset! Don't convert anything 2) Why write to CSV? Performance would come from distributed writes to a Parquet file per topic partition, or to an async database buffer (Mongo, Cassandra, Couchbase, Postgres, etc.)... NiFi is over-engineered. Use Kafka Connect; it's built in – OneCricketeer Commented Mar 5 at 8:40
  • 0) Because I'm new to NiFi and I'm asking myself a lot of questions about the best solution 1) OK 2) I have to write CSV because the goal is to convert Kafka messages into a CSV file for another application. I don't know Kafka Connect but I'm going to check it – Patrick Baie Commented Mar 5 at 10:39
  • 2) I have to write CSV because the goal is to convert Kafka messages into a CSV file for another application, but with some modifications. So the Kafka message has to be converted before it is written to the CSV file – Patrick Baie Commented Mar 5 at 10:53
  • @tonykoval: no, because my Kafka topic contains JSON messages, so I have to read them as JSON, then modify each message by converting some values, and then write it to a CSV file. But I want to read 10,000 Kafka messages, convert each one, and maybe MergeRecord them into 10,000 rows and write them to a CSV file – Patrick Baie Commented Mar 7 at 10:41

1 Answer


Following your feedback, I looked at ConsumeKafkaRecord and I think you're right: I could apply the following flow: ConsumeKafkaRecord(ScriptedReader, CSVRecordSetWriter) => MergeContent => UpdateAttribute => PutFile.

1/ In ConsumeKafkaRecord, I'd like to use a ScriptedReader to convert and modify the JSON message and a CSVRecordSetWriter to write the new message (conceptually, something like the sketch after this list).

2/ MergeContent to merge the flow files.

3/ UpdateAttribute to change the file name.

4/ PutFile to write the file.
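
To illustrate what I mean by "convert and modify" in step 1, here is the kind of per-record transformation I have in mind, sketched in plain Python with made-up field names (in NiFi this logic would live in the ScriptedReader, so this is only a conceptual sketch, not the actual script):

```python
import json
from datetime import datetime, timezone


def to_csv_row(raw: bytes) -> dict:
    """Conceptual per-message modification: flatten the JSON and convert some values."""
    msg = json.loads(raw)
    return {
        "id": msg["id"],                                            # kept as-is (made-up field)
        "customer": msg["customer"]["name"],                        # flatten a nested object
        "amount_eur": round(float(msg["amount_cents"]) / 100, 2),   # value conversion
        "event_time": datetime.fromtimestamp(
            msg["timestamp_ms"] / 1000, tz=timezone.utc
        ).isoformat(),                                              # epoch millis -> ISO 8601
    }
```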

The only problem is the header I want to write to the CSV file: when several blocks end up in the same output file, I only want a single header line, not one per merged block.
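
For reference, the behaviour I'm after is the usual "write the header only when the file is created" pattern; outside NiFi it would look like this (plain Python sketch, the file path and column names are placeholders). Inside NiFi, I assume I would instead turn off the header line in the CSV writer and let MergeContent add it once via its Header property, but I'm not sure that is the cleanest way.

```python
import csv
import os


def append_block(path: str, rows: list[dict], fieldnames: list[str]) -> None:
    # Write the header only when the file is created; later blocks are appended without it.
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```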

Do you agree with this flow?

Thanks a lot.
