sql - Why does adding a COUNT(DISTINCT ...) affect the result of a SUM()? - Stack Overflow


I have a simple aggregation query that works as expected:

SELECT report_date, SUM(col1) AS sum_col1
FROM my_table
GROUP BY report_date
ORDER BY report_date;

However, when I add another aggregation like this, the result of sum_col1 changes:

SELECT report_date, SUM(col1) AS sum_col1, COUNT(DISTINCT col2) AS cnt_col2
FROM my_table
GROUP BY report_date
ORDER BY report_date;

For some of the result rows, sum_col1 is now greater, for others it is smaller. (For most, it is the same.) This is reproducible: if I remove the COUNT(DISTINCT), I get the values from before. If I re-add it, I get the changed values.

This happens on a somewhat large dataset (thousands of rows); I haven't been able to reproduce it in a small toy example that I could post here.

AIUI, adding another aggregate function shouldn't change the values returned by the other aggregates. The only potential oddity I can think of is that col1 contains NULL values, but I can't see why that would change the result from one query to the other.
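As a side note (not from the original post), a quick check on made-up toy values shows that SUM() ignores NULLs the same way with or without a DISTINCT aggregate:

-- Toy data (hypothetical values): SUM() skips the NULL either way.
SELECT SUM(x) AS sum_x
FROM (VALUES (1::real, 'a'::text), (NULL::real, 'b'::text), (2::real, 'a'::text)) AS t(x, y);
-- sum_x = 3

SELECT SUM(x) AS sum_x, COUNT(DISTINCT y) AS cnt_y
FROM (VALUES (1::real, 'a'::text), (NULL::real, 'b'::text), (2::real, 'a'::text)) AS t(x, y);
-- sum_x = 3, cnt_y = 2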

Does someone have an explanation for this? Does the COUNT(DISTINCT ...) somehow affect the way the grouping works for all aggregations?

Thanks.

Edit: Here are a few of the things I tried while getting to a minimal table that reproduces the problem.

  • From my original dataset, I copied out two of the problematic groups. Both show the problem, but they still have about 1,700 rows each.
  • The table with these two groups doesn't have any indexes, so there can't be a problem with indexes.
  • Just to be sure, I ran REINDEX TABLE my_table, which rebuilds the table's indexes. The problem persists.
  • Any attempt to copy the data after ordering by col2 (sliced or not) has resulted in a table that doesn't show the problem.
  • Ordering by col1 without slicing still shows the problem, but slicing it is tricky: if I cut away more than a few rows (a couple of hundred from the beginning, fewer than a hundred from the end), the problem no longer occurs. The same happens when I slice without ordering.

(By "slicing" I mean applying OFFSET and/or LIMIT and creating a new table with the result of the query.)

Edit 2: Regarding rounding errors: summing about 1,700 values produces a difference like 17395708 vs. 17395696, i.e. a difference of 12. The column type is real. This got me thinking: could this be a rounding error caused by the values being summed in a different order? What's odd is the exact repeatability: the result switches between exactly these two values in every scenario I have tried.
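A minimal sketch (with made-up values, not from the original post) of how the summation order alone can change a real sum; the aggregate-internal ORDER BY is only used here to force the two accumulation orders:

-- The same three real values summed in two orders give two different results,
-- because 16777216 + 1 cannot be represented exactly as a 32-bit float.
SELECT
    SUM(x ORDER BY x ASC)  AS small_values_first,  -- 1 + 1 + 16777216 = 16777218
    SUM(x ORDER BY x DESC) AS large_value_first    -- 16777216 + 1 + 1 = 16777216
FROM (VALUES (16777216::real), (1::real), (1::real)) AS t(x);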


asked Mar 6 at 0:23 by Gerhard, edited Mar 6 at 16:04
  • Can you provide a minimal reproducible example? – Dale K Commented Mar 6 at 0:26
  • Make successive copies of your table, cutting away as much as you can at each step, until you are left with a minimal table that causes the failure. E.g., if a group fails, it probably fails even if it's the only one in the table. – Walter Tross Commented Mar 6 at 0:33
  • Carefully check again. Different expressions in the SELECT clause should not affect each other. Do note that without an ORDER BY, the two queries can return rows in a different order. – Parfait Commented Mar 6 at 0:36
  • What happens if you dump all the data into a new table with the exact same schema, constraints and indexes? Do you get this as well? If not, I'd suspect corruption in one or more indexes. – Charlieface Commented Mar 6 at 1:18
  • Can you extract the problematic rows along with their surroundings? Are we talking about millionths of a typical col1 value (rounding error), or about the same order of magnitude as col1 (a bizarre swap between two adjacent report_dates)? The former could be explained by the fact that PostgreSQL sums the values as it reads them, in the GROUP BY phase, before the ORDER BY, so if the additional aggregate changes the reading order, the sum can come out slightly differently. – Guillaume Outters Commented Mar 6 at 6:19

1 Answer


Kudos to @GuillaumeOutters for bringing up rounding errors.

It seems that different orderings of the summation produce (exactly?) two different totals, but the difference is something like 17395708 vs. 17395696 -- a difference of 12, which is well within the rounding error to be expected from the real type (roughly 7 significant decimal digits) when accumulating ~1,700 values. I shouldn't have considered this difference significant in the first place.

I confirmed this by casting the values to double precision before summing, and the problem went away.
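Applied to the query from the question, the workaround looks roughly like this (a sketch; the cast makes the accumulation happen in double precision instead of real):

SELECT report_date,
       SUM(col1::double precision) AS sum_col1,
       COUNT(DISTINCT col2)        AS cnt_col2
FROM my_table
GROUP BY report_date
ORDER BY report_date;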

Even though it feels strange that the result switches between these exact two values (for one group; the other groups have different values with similar characteristics), that's likely just a consequence of how the real type behaves.
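As a follow-up check (not part of the original answer), comparing the two query plans can show whether the DISTINCT aggregate makes PostgreSQL read and group the rows differently, which would explain the different accumulation order:

-- Hypothetical check: compare how the two queries are executed.
EXPLAIN (ANALYZE, COSTS OFF)
SELECT report_date, SUM(col1) AS sum_col1
FROM my_table
GROUP BY report_date
ORDER BY report_date;

EXPLAIN (ANALYZE, COSTS OFF)
SELECT report_date, SUM(col1) AS sum_col1, COUNT(DISTINCT col2) AS cnt_col2
FROM my_table
GROUP BY report_date
ORDER BY report_date;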

Thanks to everybody who chimed in (in the comments to the question).

