I am getting "OSError: [Errno 95] Operation not supported" for the code below. I have openpyxl 3.1.5 installed on the cluster and have imported all required modules. I am sure this is something small, but I can't put my finger on why this is erroring out. Thanks for taking a look!
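# Sample of sales orders for 2025
df = spark.sql("""
    SELECT
        custId,
        locId
    FROM salesorders
    WHERE year = '2025'
    LIMIT 100
""")

# Active regions for 2025
df_region = spark.sql("""
    SELECT DISTINCT Id
    FROM sales.region
    WHERE isactive = 1 AND year = '2025'
""")

# Write one Excel file per region.
# filePath (defined elsewhere) is a path template with a {locId} placeholder.
for row in df_region.collect():
    regId = row['Id']
    df_filtered = df.where(df.locId == regId)
    # This is the line that raises OSError: [Errno 95]
    df_filtered.toPandas().to_excel(filePath.format(locId=regId), index=False, engine='openpyxl')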
Got it figured out. The Databricks File System (DBFS) does not support random writes, which writing an Excel file requires. The workaround is to write each file to local disk first and then copy it to DBFS. I wrote the files to the cluster's local disk, then copied them to the storage account.
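For anyone who lands here later, this is roughly the shape of the workaround. It is a minimal sketch, not my exact code: the helper name write_via_local_disk and the temporary file name are made up for illustration, and it assumes the destination accepts a plain sequential file copy.

```python
import os
import shutil
import tempfile


def write_via_local_disk(write_fn, dest_path):
    """Write a file to local disk with write_fn, then copy it to dest_path.

    write_fn is any callable that takes a local file path and writes to it.
    The name and signature are hypothetical, chosen for this sketch.
    """
    suffix = os.path.splitext(dest_path)[1]
    # A local temporary directory supports the random writes that
    # Excel writers need.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "output" + suffix)
        write_fn(local_path)
        # Create the destination folder if needed, then copy the finished
        # file in one sequential pass.
        os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
        shutil.copyfile(local_path, dest_path)
    return dest_path


# Possible usage inside the loop from the question (not run here):
# write_via_local_disk(
#     lambda p: df_filtered.toPandas().to_excel(p, index=False, engine="openpyxl"),
#     filePath.format(locId=regId),
# )
```

The temporary directory is removed once the copy finishes, so nothing piles up on the driver's disk between iterations.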