I have an array like this:
a = np.array([[-999, 9, 7, 3],
              [2, 1, -999, 1],
              [1, 5, 4, 6],
              [0, 6, -999, 9],
              [1, -999, -999, 6],
              [8, 4, 4, 8]])
I want to get the 40th percentile of each row of that array, ignoring the entries equal to -999.
If I use np.percentile(a, 40, axis=1)
I get array([ 3.8, 1. , 4.2, 1.2, -799. , 4.8]),
which still includes the -999 values.
The output I want looks like this:
[
6.2, # 3 or 7 also ok
1,
4.2, # 4 or 5 also ok
4.8, # 0 or 6 also ok
1,
4
]
Thank you
You can replace the -999s with NaNs and use nanpercentile.
import numpy as np
a = np.array([[-999, 9, 7, 3],
              [2, 1, -999, 1],
              [1, 5, 4, 6],
              [0, 6, -999, 9],
              [1, -999, -999, 6],
              [8, 4, 4, 8]], dtype=np.float64)
a[a == -999] = np.nan
np.nanpercentile(a, 40, axis=-1, keepdims=True)
# array([[6.2],
# [1. ],
# [4.2],
# [4.8],
# [3. ],
# [4.8]])
# Use the `method` argument if you want a different type of estimate
# `keepdims=True` keeps the result a column, which it looks like you want
You asked for a solution "in NumPy", and that's it. (Unless you want to re-implement percentile, which is not so hard. Or I suppose you could use apply_along_axis on a function that removes the -999s before taking the quantile, but that will just loop in Python over the slices, which can be slow.)
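For illustration, here is a rough sketch of both alternatives mentioned in the parenthetical. The helper names `row_percentile` and `sorted_percentile` are made up for this example, and the re-implementation only covers linear interpolation (NumPy's default `method`) and assumes every row has at least one valid entry.

```python
import numpy as np

SENTINEL = -999  # missing-value marker from the question

a = np.array([[-999, 9, 7, 3],
              [2, 1, -999, 1],
              [1, 5, 4, 6],
              [0, 6, -999, 9],
              [1, -999, -999, 6],
              [8, 4, 4, 8]])

def row_percentile(row, q=40):
    """Percentile of one 1-D slice with the sentinel entries dropped."""
    valid = row[row != SENTINEL]
    # Assumption: a row with no valid entries should give NaN
    return np.percentile(valid, q) if valid.size else np.nan

# Option 1: apply_along_axis calls row_percentile once per row (a Python loop)
looped = np.apply_along_axis(row_percentile, -1, a)

def sorted_percentile(arr, q):
    """Vectorized linear-interpolation percentile along the last axis."""
    valid = arr != SENTINEL
    # Replace sentinels with +inf so they sort to the end of each row
    s = np.sort(np.where(valid, arr, np.inf), axis=-1)
    n = valid.sum(axis=-1)               # number of valid entries per row
    pos = (n - 1) * q / 100.0            # fractional index into the valid part
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)       # never step into the +inf padding
    lo_val = np.take_along_axis(s, lo[..., None], axis=-1)[..., 0]
    hi_val = np.take_along_axis(s, hi[..., None], axis=-1)[..., 0]
    return lo_val + (pos - lo) * (hi_val - lo_val)

# Option 2: no Python-level loop over rows
vectorized = sorted_percentile(a, 40)

print(looped)
print(vectorized)
```

Both should agree with the nanpercentile result; on large arrays the vectorized version avoids the per-row Python call, at the cost of a full sort.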
If you don't want to have to change the dtype and replace with NaNs to perform the operation, you can use NumPy masked arrays with scipy.stats.mstats.mquantiles.
import numpy as np
from scipy import stats
a = np.array([[-999, 9, 7, 3],
              [2, 1, -999, 1],
              [1, 5, 4, 6],
              [0, 6, -999, 9],
              [1, -999, -999, 6],
              [8, 4, 4, 8]])
mask = a == -999
b = np.ma.masked_array(a, mask=mask)
stats.mstats.mquantiles(b, 0.4, alphap=1, betap=1, axis=-1)
# alphap=1, betap=1 are the settings to reproduce the same values produced by NumPy's default `method`.
But beware that mquantiles is on its way out, superseded by new features in the next release.
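If the main concern is leaving the original integer array untouched rather than avoiding NaNs altogether, a possible middle ground that needs only NumPy is to build the NaN-filled copy on the fly with np.where. This is a sketch of that idea, not a replacement for the masked-array approach; it does allocate a temporary float copy.

```python
import numpy as np

a = np.array([[-999, 9, 7, 3],
              [2, 1, -999, 1],
              [1, 5, 4, 6],
              [0, 6, -999, 9],
              [1, -999, -999, 6],
              [8, 4, 4, 8]])

# np.where returns a new float64 array; `a` keeps its integer dtype and its -999s
result = np.nanpercentile(np.where(a == -999, np.nan, a), 40, axis=-1)
print(result)
```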