I am trying to estimate quantiles for some snow data using the log-Pearson Type III distribution in Python and comparing with R. I do this by reading in the data, log-transforming it, fitting a Pearson Type III distribution, estimating quantiles, then transforming back from log space.
In Python:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import lmoments3 as lm
import lmoments3.distr as ld
data=np.array([[1079],
[ 889],
[ 996],
[1476],
[1567],
[ 897],
[ 991],
[1222],
[1372],
[1450],
[1077],
[1354],
[1029],
[1699],
[ 965],
[1133],
[1951],
[1621],
[1069],
[ 930],
[1039],
[1839]])
return_periods = np.array([2,3,5,10,20,50,100,200,1000])
log_data = np.log(data)
params = stats.pearson3.fit(log_data) #Max likelihood estimation method
quantiles = np.exp(stats.pearson3.ppf(1 - 1 / return_periods, *params))
paramsmm=ld.pe3.lmom_fit(log_data) #lmoments estimation method
paramsmm2=(paramsmm["skew"], paramsmm['loc'], paramsmm['scale'][0])
quantilesmm = np.exp(ld.pe3.ppf(1 - 1 / return_periods, *paramsmm2))
print(quantiles)
print(quantilesmm)
In R:
library(lmom)
library(lmomco)
library(FAdist)
swe_data <- c(1079,
889,
996,
1476,
1567,
897,
991,
1222,
1372,
1450,
1077,
1354,
1029,
1699,
965,
1133,
1951,
1621,
1069,
930,
1039,
1839)
return_periods <- c(2, 3, 5, 10, 20, 50, 100, 200, 1000)
exceedance_probabilities <- 1 / return_periods # P = 1/T
nonexceedance_probabilities <- 1 - exceedance_probabilities # P_nonexceedance = 1 - P_exceedance
log_swe <- log(swe_data)
loglmoments <- lmom.ub(log_swe)
fit_LP3 <- parpe3(loglmoments) #pars estimated using lmoments
LP3_est=exp(quape3(nonexceedance_probabilities, fit_LP3))
print(LP3_est)
The quantiles estimated are the following:
MLE/scipy stats:
params=(2.0246357656236125, 7.10812763271725, 0.32194785836668816) # skew, loc, scale
quantiles=[1105.86050592 1259.46110488 1484.67412496 1857.18767881 2324.18036925 3127.68767927 3916.2007443 4904.15011095 8271.24322709]
Lmoments/python:
params=(-2.2194418726874434, 7.1069179914286424, 0.07535915093401913) # skew, loc, scale
quantiles=[1251.30865382 1276.35189073 1291.29995882 1300.06624583 1303.59129662 1305.31725745 1305.78638777 1305.98555852 1306.11275037]
Lmoments/R:
params= (7.1069180 0.2566677 0.9365001) #mu, sigma, gamma
quantiles=[1173.116 1313.849 1485.109 1721.131 1969.817 2326.812 2623.112 2945.728 3814.692]
I would expect the latter two methods, both using L-moments, to produce the same result. Based on comparisons with other distributions, R seems to be giving the most realistic result. What explains the large differences, and how might I get a similar result in Python?
To my understanding, the main difference between the Python and R results stems from the estimation methods used: Python's scipy.stats fit uses maximum likelihood estimation (MLE) by default, while R's lmom package uses L-moments estimation. These methods can produce quite different parameter estimates, especially for small samples or highly skewed distributions like the log-Pearson Type III. To get a closer result, use the lmoments3 library in Python, which implements the L-moments method, and make sure you extract the parameters from the lmoments3 result correctly.
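For intuition, the sample L-moments that these fits are based on can be computed directly from probability-weighted moments. Below is a minimal NumPy sketch of Hosking's unbiased estimators (for illustration only; the libraries' internals may differ):

```python
import numpy as np

def sample_lmoments(x):
    """Unbiased sample L-moments l1, l2 and L-skewness t3
    via probability-weighted moments b0, b1, b2."""
    x = np.sort(np.asarray(x, dtype=float).ravel())  # must be flat 1-D, sorted ascending
    n = x.size
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    l1 = b0                      # mean
    l2 = 2 * b1 - b0             # L-scale
    l3 = 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2       # (mean, L-scale, L-skewness t3)

log_data = np.log([1079, 889, 996, 1476, 1567, 897, 991, 1222, 1372, 1450,
                   1077, 1354, 1029, 1699, 965, 1133, 1951, 1621, 1069,
                   930, 1039, 1839])
l1, l2, t3 = sample_lmoments(log_data)
print(l1, l2, t3)  # l1 matches the fitted location (~7.1069); t3 is positive
```

Note that the skew of the fitted Pearson III has the same sign as t3, which is positive here — consistent with R's gamma of 0.9365, not with the negative skew from the Python L-moments fit.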
Just double-check how you're pulling the parameters out, especially the 'scale' value: in your code the data is passed as an (n, 1) column array, which is why 'scale' came back as an array needing [0], and likely why the fitted skew has the wrong sign. Passing the data as a flat 1-D array gives scalar parameters. Here's how you can get the Python result much closer to the R result:
import numpy as np
import scipy.stats as stats
import lmoments3 as lm
import lmoments3.distr as ld
# Input data
data = np.array([1079, 889, 996, 1476, 1567, 897, 991, 1222, 1372, 1450,
1077, 1354, 1029, 1699, 965, 1133, 1951, 1621, 1069,
930, 1039, 1839])
# Return periods and probabilities
return_periods = np.array([2, 3, 5, 10, 20, 50, 100, 200, 1000])
non_exceedance_probabilities = 1 - 1 / return_periods
# Log-transform the data
log_data = np.log(data)
# L-moments method (using lmoments3); data is a flat 1-D array,
# so the fitted parameters come back as scalars (no [0] indexing needed)
paramsmm = ld.pe3.lmom_fit(log_data)
quantiles_lmom = np.exp(ld.pe3.ppf(non_exceedance_probabilities,
                                   paramsmm["skew"],
                                   loc=paramsmm["loc"],
                                   scale=paramsmm["scale"]))
print("L-moments method results:")
print(quantiles_lmom)
# Maximum Likelihood Estimation method (using scipy.stats)
params_mle = stats.pearson3.fit(log_data)
quantiles_mle = np.exp(stats.pearson3.ppf(non_exceedance_probabilities, *params_mle))
print("\nMaximum Likelihood Estimation results:")
print(quantiles_mle)
# R results for comparison
r_quantiles = np.array([1173.116, 1313.849, 1485.109, 1721.131, 1969.817,
2326.812, 2623.112, 2945.728, 3814.692])
print("\nR results:")
print(r_quantiles)
# Compare results
print("\nDifference between Python L-moments and R results:")
print(quantiles_lmom - r_quantiles)
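As a sanity check that only the parameter estimates differ (not the quantile functions themselves), you can plug the R parameters straight into scipy: lmomco's (mu, sigma, gamma) corresponds to scipy's pearson3(skew=gamma, loc=mu, scale=sigma), since scipy parameterizes pearson3 by its mean, standard deviation, and skew. A short sketch using the R output quoted above:

```python
import numpy as np
import scipy.stats as stats

# lmomco parameters from the R run: mu (mean), sigma (std dev), gamma (skew)
mu, sigma, gamma = 7.1069180, 0.2566677, 0.9365001

return_periods = np.array([2, 3, 5, 10, 20, 50, 100, 200, 1000])
p = 1 - 1 / return_periods

# scipy's pearson3 takes (skew, loc=mean, scale=std dev)
quantiles = np.exp(stats.pearson3.ppf(p, gamma, loc=mu, scale=sigma))
print(quantiles)  # should reproduce the R quantiles (1173.1, 1313.8, ..., 3814.7)
```

If these match the R output, the remaining discrepancy is entirely in the parameter estimation step, not in the quantile evaluation.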