How can I format numbers in a pandas table when a cell displays a list of floating-point numbers instead of a single float value?
Here's a code example:
import pandas as pd
df = pd.DataFrame(data={'Values':[[0.1231245678,0,0],[1e-10,0,0]]})
df
I would like to format the numbers in the table as %.2f. So the table data should be displayed as:
[0.12, 0.00, 0.00]
[0.00, 0.00, 0.00]
The usual options:
pd.set_option('float_format','{:20,.2f}'.format)
pd.set_option('display.chop_threshold', 0.001)
only work when table cells contain single numbers.
As far as I know, there is no way to handle this via pandas options. You can, however, modify the values in the column or, if you need to retain the original values as well, create a new column and use that one when you want the data displayed.
df['ValuesForDisplay'] = df['Values'].apply(lambda x: [round(item, 2) for item in x])
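One thing to keep in mind with the rounding approach: round() changes the stored values but not how they are printed, so trailing zeros are not padded and integer zeros stay as 0. A minimal sketch of the difference, reusing the question's data (the names rounded_first and text_first are only for illustration):

```python
import pandas as pd

df = pd.DataFrame(data={'Values': [[0.1231245678, 0, 0], [1e-10, 0, 0]]})

# Rounded copy for display; the original column is left untouched.
df['ValuesForDisplay'] = df['Values'].apply(lambda x: [round(item, 2) for item in x])

# The cells still hold numbers, so they print without padded zeros.
rounded_first = df['ValuesForDisplay'].iloc[0]

# Fixed two-decimal text needs string formatting on top of (or instead of) rounding.
text_first = ['{:.2f}'.format(item) for item in rounded_first]
```

If the exact %.2f layout matters, the string-based answers in this thread are probably closer to what the question asks for; rounding is enough when only the precision matters.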
Define a formatter that takes a list and prints it the way you want it. Then apply it to the dataframe:
formatter = lambda l: ', '.join('{:0.2f}'.format(i) for i in l)
df.style.format(formatter)
Should print out what you want:
Values
0 0.12, 0.00, 0.00
1 0.00, 0.00, 0.00
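df.style is aimed at rendered output such as a notebook. For plain-text output, the same kind of formatter can be passed to DataFrame.to_string through its formatters argument. A small sketch, assuming the question's data; list_formatter is a name made up here, and the brackets are added to match the layout asked for in the question:

```python
import pandas as pd

df = pd.DataFrame(data={'Values': [[0.1231245678, 0, 0], [1e-10, 0, 0]]})

def list_formatter(values):
    # Format every number in the cell with two decimals and restore the brackets.
    return '[' + ', '.join('{:0.2f}'.format(v) for v in values) + ']'

# formatters maps column names to one-argument functions applied per cell.
text = df.to_string(formatters={'Values': list_formatter})
print(text)
```

The underlying values in df are not modified; only the printed text changes.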
You can use Series.apply on the column:
format_numbers = lambda x : [f"{num:.2f}" for num in x]
df['Values'] = df['Values'].apply(format_numbers)
Note that this will convert the numbers to strings, so if you need to perform numerical operations on the data, you may need to convert them back to numbers. You can use np.float64 or float instead at that time:
import numpy as np
format_numbers = lambda x : [np.float64(f"{num:.2f}") for num in x]
df['Values'] = df['Values'].apply(format_numbers)
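Now, if this is a minimal reproducible example and your real code handles more data, it is not a good idea to use apply and then convert back to np.float64: the Python-level loop visits all n * m elements. Alternatively, you can convert each list to a NumPy array and let NumPy format the whole array for each row:
import numpy as np
df['Values'] = df['Values'].apply(np.array)
# format each array with a single NumPy call; the result is one string per cell
df['Values'] = df['Values'].apply(lambda x: np.array2string(x, precision=2, floatmode='fixed', suppress_small=True, separator=', '))
print(df)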
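If every list has the same length, as in the question, another option is to stack the column into one 2-D array so that rounding happens in a single NumPy call. This is a sketch under that equal-length assumption, not a general solution; np.char.mod applies the % format to each element, and the variable and column names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'Values': [[0.1231245678, 0, 0], [1e-10, 0, 0]]})

# Assumes all lists have equal length; ragged lists would not form a 2-D float array.
stacked = np.array(df['Values'].tolist(), dtype=float)

# Numeric result: one rounding call over the whole array.
rounded = np.round(stacked, 2)

# Text result: apply '%.2f' to every element.
formatted = np.char.mod('%.2f', stacked)

# Put one list of strings back into each row, in a separate column.
df['Formatted'] = pd.Series(formatted.tolist(), index=df.index)
```

Keeping rounded as a numeric array avoids the string-to-float round trip when further calculations are needed.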
And if you have a multi-core processor, you can use parallel processing libraries like joblib or dask to speed up the formatting process.
from joblib import Parallel, delayed
def format_numbers(x):
    return [f"{num:.2f}" for num in x]
df['Values'] = Parallel(n_jobs=-1)(delayed(format_numbers)(x) for x in df['Values'])
See joblib.Parallel for more information.
To format the data properly, I use a function that iterates over each list and formats the numbers manually. It is not a very optimised approach, as it takes O(M*N) time, where M is the number of elements in each list and N is the number of rows in the dataframe.
>>> df = pd.DataFrame(data={'Value': [[0.1231245678, 0, 0], [1e-10,0,0]]})
>>> df
Value
0 [0.1231245678, 0, 0]
1 [1e-10, 0, 0]
>>> func = lambda X: [f"{i:.2f}" for i in X]
>>> df = df['Value'].apply(func)
>>> df
0 [0.12, 0.00, 0.00]
1 [0.00, 0.00, 0.00]
Name: Value, dtype: object
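The nested Python loop can also be expressed with pandas operations by flattening the lists first. The following is only a sketch of that idea using Series.explode and a groupby on the original index; it still touches every one of the M*N numbers and is not claimed to be faster than apply. The column name Formatted is made up for the example:

```python
import pandas as pd

df = pd.DataFrame(data={'Value': [[0.1231245678, 0, 0], [1e-10, 0, 0]]})

# One row per list element; the original row label is repeated in the index.
flat = df['Value'].explode().astype(float)

# Format each number, then regroup by the original row label.
formatted = flat.map('{:.2f}'.format).groupby(level=0).agg(list)

# Keep the DataFrame and the raw values; store the text in a separate column.
df['Formatted'] = formatted
```

Writing to a separate column also keeps df a DataFrame, whereas rebinding df to the result of apply leaves only a Series.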