sql - AVG function returns only 1 or 0, Apache Hive - Stack Overflow

admin2025-04-19  0

Imagine I have a table with purchases, where I see id of customer, id of store where he bought something (I see other cols like prices, dates and others but not that important now). I have second table where I have every customers id and a sign if the customer is VIP. I want to see what is the % of VIP customers in stores. Now, I write a query like this:

select 
 p.store_id,
 count(distinct p.customer_id) as count_customers,
 avg(v.is_vip) as percentage_vip
from schema.purchases p
left join (
    select 
     customer_id,
     is_vip
    from schema.vip_customers
    where data_year = 2024
) v on v.customer_id = p.customer_id
where purchase_date between '2024-1-1' and '2024-12-31'
and p.purchase_ok = 1
group by 1

For each store, only 1 or 0 is returned as AVG, even though I can see there are stores with variablity when I do not group the table or when I group it on the is_vip column. I just cant see why AVG does not work as expected (my expectations, but i guess, when you put some amounts of 0s and different amount of 1s in this function, i expect it to return something else then 1 or 0).

EDIT: Tim added perfect answer and it was really just matter of casting - the column was casted as integer, so it would really return only 1 or 0. When casted to decimal, it works as expected. See answers.

Imagine I have a table with purchases, where I see id of customer, id of store where he bought something (I see other cols like prices, dates and others but not that important now). I have second table where I have every customers id and a sign if the customer is VIP. I want to see what is the % of VIP customers in stores. Now, I write a query like this:

select 
 p.store_id,
 count(distinct p.customer_id) as count_customers,
 avg(v.is_vip) as percentage_vip
from schema.purchases p
left join (
    select 
     customer_id,
     is_vip
    from schema.vip_customers
    where data_year = 2024
) v on v.customer_id = p.customer_id
where purchase_date between '2024-1-1' and '2024-12-31'
and p.purchase_ok = 1
group by 1

For each store, only 1 or 0 is returned as AVG, even though I can see there are stores with variablity when I do not group the table or when I group it on the is_vip column. I just cant see why AVG does not work as expected (my expectations, but i guess, when you put some amounts of 0s and different amount of 1s in this function, i expect it to return something else then 1 or 0).

EDIT: Tim added perfect answer and it was really just matter of casting - the column was casted as integer, so it would really return only 1 or 0. When casted to decimal, it works as expected. See answers.

Share edited Mar 4 at 14:03 quakenstein asked Mar 4 at 8:44 quakensteinquakenstein 641 silver badge7 bronze badges 3
  • What is your database vendor (e.g. Oracle, MySQL, SQL Server) ? – Tim Biegeleisen Commented Mar 4 at 8:46
  • Sorry, i included it in the post - and its Apache Hive (at least I hope, I am not very.. technical:)). – quakenstein Commented Mar 4 at 9:02
  • Please provide a minimal reproducible example with sample data and desired results – Dale K Commented Mar 4 at 9:14
Add a comment  | 

1 Answer 1

Reset to default 1

Assuming the column is_vip be integer, then the average calculation, if also operating in integer mode, would return only 0 or 1 after truncation of the decimal component. One way to get around this might be to cast to some numeric type with a decimal component, e.g.

AVG(CAST(v.is_vip AS DECIMAL(10,2)) AS percentage_vip
转载请注明原文地址:http://conceptsofalgorithm.com/Algorithm/1745054347a282307.html

最新回复(0)