python - In numpy find a percentile in 2d with some condition - Stack Overflow

I have an array like this:

a = np.array([[-999, 9, 7, 3],
   [2, 1, -999, 1],
   [1, 5, 4, 6],
   [0, 6, -999, 9],
   [1, -999, -999, 6],
   [8, 4, 4, 8]])

I want to get the 40th percentile of each row, excluding the values equal to -999.

If I use np.percentile(a, 40, axis=1), I get array([ 3.8, 1. , 4.2, 1.2, -799. , 4.8]), which still includes the -999 values.

The output I want looks like this:

[
   6.2, # 3 or 7 also ok
   1,
   4.2, # 4 or 5 also ok
   4.8, # 0 or 6 also ok
   1,
   4
]

Thank you

asked Mar 3 at 2:06 by d_frEak

1 Answer

You can replace the -999s with NaNs and use nanpercentile.

import numpy as np
a = np.array([[-999, 9, 7, 3],
   [2, 1, -999, 1],
   [1, 5, 4, 6],
   [0, 6, -999, 9],
   [1, -999, -999, 6],
   [8, 4, 4, 8]], dtype=np.float64)
a[a == -999] = np.nan
np.nanpercentile(a, 40, axis=-1, keepdims=True)
# array([[6.2],
#        [1. ],
#        [4.2],
#        [4.8],
#        [3. ],
#        [4.8]])
# Use the `method` argument if you want a different type of estimate
# `keepdims=True` keeps the result a column, which it looks like you want 
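
As a rough illustration of the `method` argument mentioned in the comments above, the sketch below asks for the nearest actual data value below or above the 40% position instead of an interpolated one. It assumes NumPy 1.22 or later, where the keyword is named `method`; the variable names `lower` and `higher` are just for this example.

```python
import numpy as np

a = np.array([[-999, 9, 7, 3],
              [2, 1, -999, 1],
              [1, 5, 4, 6],
              [0, 6, -999, 9],
              [1, -999, -999, 6],
              [8, 4, 4, 8]], dtype=np.float64)
a[a == -999] = np.nan  # mark the sentinel values as missing

# "lower" picks the data point at or below the 40% position in each row
lower = np.nanpercentile(a, 40, axis=-1, method="lower")
# "higher" picks the data point at or above it
higher = np.nanpercentile(a, 40, axis=-1, method="higher")

print(lower)   # [3. 1. 4. 0. 1. 4.]
print(higher)  # [7. 1. 5. 6. 6. 8.]
```

These correspond to the "3 or 7 also ok" style alternatives in the question.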

You asked for a solution "in NumPy", and that's it. (Unless you want to re-implement percentile, which is not so hard. Or I suppose you could use apply_along_axis on a function that removes the -999s before taking the quantile, but that will just loop in Python over the slices, which can be slow.)
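
For completeness, here is a minimal sketch of the two alternatives mentioned above: looping with apply_along_axis, and a small vectorized re-implementation using linear interpolation. The helper names `percentile_skipping` and `row_percentile_vectorized` are hypothetical, and both assume every row has at least one value that is not -999 and that the data contains no infinities.

```python
import numpy as np

a = np.array([[-999, 9, 7, 3],
              [2, 1, -999, 1],
              [1, 5, 4, 6],
              [0, 6, -999, 9],
              [1, -999, -999, 6],
              [8, 4, 4, 8]])

def percentile_skipping(row, q, sentinel=-999):
    # Hypothetical helper: drop the sentinel values, then take the percentile.
    return np.percentile(row[row != sentinel], q)

# Option 1: apply_along_axis loops over the rows in Python (simple, but slow for many rows).
looped = np.apply_along_axis(percentile_skipping, 1, a, 40)

def row_percentile_vectorized(a, q, sentinel=-999):
    # Hypothetical helper: per-row percentile with linear interpolation, ignoring sentinels.
    valid = a != sentinel
    n = valid.sum(axis=1)                            # number of valid values in each row
    s = np.sort(np.where(valid, a, np.inf), axis=1)  # sentinels become +inf and sort last
    pos = (q / 100) * (n - 1)                        # fractional index into the valid part
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)                   # never index into the +inf padding
    rows = np.arange(a.shape[0])
    return s[rows, lo] + (pos - lo) * (s[rows, hi] - s[rows, lo])

# Option 2: the same estimate without a Python-level loop.
vectorized = row_percentile_vectorized(a, 40)
```

Both `looped` and `vectorized` should agree with the nanpercentile result, up to floating-point rounding.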


If you don't want to have to change the dtype and replace with NaNs to perform the operation, you can use NumPy masked arrays with scipy.stats.mquantiles.

import numpy as np
from scipy import stats
a = np.array([[-999, 9, 7, 3],
   [2, 1, -999, 1],
   [1, 5, 4, 6],
   [0, 6, -999, 9],
   [1, -999, -999, 6],
   [8, 4, 4, 8]])
mask = a == -999
b = np.ma.masked_array(a, mask=mask)
stats.mstats.mquantiles(b, 0.4, alphap=1, betap=1, axis=-1)
# alphap=1, betap=1 are the settings to reproduce the same values produced by NumPy's default `method`.

But beware that mquantiles is on its way out: it is being superseded by new features planned for the next SciPy release.
