PostgreSQL: Grouping and Aggregating on multiple columns

Question

Problem Statement:

I am working on a simple dataset from Kaggle and have included a snippet with only the required columns in the table below. The dataset lists all IPL (cricket) matches, with the two teams that played each match (team1 and team2) and the winner of that match.

I am trying to get the total number of matches played by each team along with the number of matches each team won; a snippet of the output is shown below the code. The number of matches played can be computed as "occurrences of a team in column team1" + "occurrences of that team in column team2".

While the code gives the correct result, I sense this is not the best approach. I would like to know a better way to do it, along with good practices and naming conventions to follow.

Dataset:

team1	team2	winner
Royal Challengers Bangalore	Kolkata Knight Riders	Kolkata Knight Riders
Kings XI Punjab	Chennai Super Kings	Chennai Super Kings
Delhi Daredevils	Rajasthan Royals	Delhi Daredevils
Mumbai Indians	Royal Challengers Bangalore	Royal Challengers Bangalore
Kolkata Knight Riders	Deccan Chargers	Kolkata Knight Riders
Rajasthan Royals	Kings XI Punjab	Rajasthan Royals

Code:

SELECT t1.team1 AS team,
       c_t1 + c_t2 AS played,
       c_w AS won,
       CAST(c_w AS FLOAT) / (c_t1 + c_t2) * 100 AS won_percentage
FROM (SELECT team1, count(team1) AS c_t1 FROM ipl_m GROUP BY team1) AS t1
JOIN (SELECT team2, count(team2) AS c_t2 FROM ipl_m GROUP BY team2) AS t2
  ON t1.team1 = t2.team2
JOIN (SELECT winner, count(winner) AS c_w FROM ipl_m GROUP BY winner) AS w
  ON t1.team1 = w.winner OR t2.team2 = w.winner
ORDER BY won_percentage DESC;

Resulting Table:

team	played	won	won_percentage
Chennai Super Kings	178	106	59.55056179775281
Mumbai Indians	203	120	59.11330049261084
Delhi Capitals	33	19	57.57575757575758
Sunrisers Hyderabad	124	66	53.2258064516129
Kolkata Knight Riders	192	99	51.5625

Table Definition:

CREATE TABLE ipl_m (
    id integer PRIMARY KEY,
    match_id integer NOT NULL,
    city VARCHAR(20) NOT NULL,
    date DATE NOT NULL,
    player_of_match VARCHAR(50),
    venue VARCHAR(75) NOT NULL,
    neutral_venue BOOLEAN NOT NULL,
    team1 VARCHAR(50) NOT NULL,
    team2 VARCHAR(50) NOT NULL,
    toss_winner VARCHAR(50) NOT NULL,
    toss_decision VARCHAR(5) NOT NULL,
    winner VARCHAR(50),
    result VARCHAR(10),
    result_margin float,
    eliminator CHAR(1) NOT NULL,
    method VARCHAR(3),
    umpire1 VARCHAR(50),
    umpire2 VARCHAR(50)
);

Mast · Accepted Answer · 2021-10-31 11:09:27Z

Each row in the ipl_m table has one winner and one loser. So first extract the winners and set a result field (it will be used for counting) to 1:

SELECT winner AS team, 1 as result FROM ipl_m

Next, extract the losers and set the result field to 0:

SELECT CASE WHEN team1 = winner THEN team2 ELSE team1 END AS team, 0 AS result FROM ipl_m

Combine the two sets with UNION ALL (plain UNION would remove duplicate rows and undercount the matches). Then SELECT from the resulting set, grouping by the team column:

SELECT t.team AS team,
       COUNT(*) AS played,
       SUM(t.result) AS won
FROM (
    SELECT winner AS team, 1 AS result FROM ipl_m
    UNION ALL
    SELECT CASE WHEN team1 = winner THEN team2 ELSE team1 END AS team, 0 AS result FROM ipl_m
) AS t
GROUP BY t.team
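
To try the winners-plus-losers approach on the six sample rows from the question, here is a minimal sketch. The temporary table name ipl_m_sample is made up for illustration and holds only the three columns shown in the question's snippet. The CASE is closed with END and the two sets are combined with UNION ALL so that repeated (team, result) rows are kept. The won_percentage column is added here to match the question's output; it is not part of the answer's query.

```sql
-- Hypothetical scratch table with only the columns from the question's snippet.
CREATE TEMP TABLE ipl_m_sample (
    team1  text NOT NULL,
    team2  text NOT NULL,
    winner text
);

INSERT INTO ipl_m_sample (team1, team2, winner) VALUES
    ('Royal Challengers Bangalore', 'Kolkata Knight Riders',       'Kolkata Knight Riders'),
    ('Kings XI Punjab',             'Chennai Super Kings',         'Chennai Super Kings'),
    ('Delhi Daredevils',            'Rajasthan Royals',            'Delhi Daredevils'),
    ('Mumbai Indians',              'Royal Challengers Bangalore', 'Royal Challengers Bangalore'),
    ('Kolkata Knight Riders',       'Deccan Chargers',             'Kolkata Knight Riders'),
    ('Rajasthan Royals',            'Kings XI Punjab',             'Rajasthan Royals');

SELECT t.team,
       COUNT(*)      AS played,
       SUM(t.result) AS won,
       -- 100.0 forces numeric division instead of integer division
       ROUND(100.0 * SUM(t.result) / COUNT(*), 2) AS won_percentage
FROM (
    -- one row per match for the winning side
    SELECT winner AS team, 1 AS result
    FROM ipl_m_sample
    UNION ALL
    -- one row per match for the losing side
    SELECT CASE WHEN team1 = winner THEN team2 ELSE team1 END AS team, 0 AS result
    FROM ipl_m_sample
) AS t
GROUP BY t.team
ORDER BY t.team;
```

On this sample, Kolkata Knight Riders should come out with played = 2 and won = 2, and Kings XI Punjab with played = 2 and won = 0. Note that the table definition allows winner to be NULL; such rows would produce a NULL team group and count only team1 as a loser, so filtering with WHERE winner IS NOT NULL in both branches may be needed, depending on how no-result matches should be counted.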

Your solution uses four SELECTs and two JOINs. Mine uses three SELECTs and one UNION. Fewer operations usually make a query easier to read and are generally preferred.
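
Counting operators is only a rough guide. One way to check how the two queries actually compare is to look at their execution plans. The sketch below assumes the ipl_m table from the question exists and is populated; it uses PostgreSQL's EXPLAIN with the ANALYZE and BUFFERS options, which runs the query and reports timings and buffer usage per plan node.

```sql
-- Assumes the question's ipl_m table is present and loaded.
-- ANALYZE executes the statement for real, so use it only on read-only queries
-- or inside a transaction that is rolled back.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.team,
       COUNT(*)      AS played,
       SUM(t.result) AS won
FROM (
    SELECT winner AS team, 1 AS result
    FROM ipl_m
    UNION ALL
    SELECT CASE WHEN team1 = winner THEN team2 ELSE team1 END AS team, 0 AS result
    FROM ipl_m
) AS t
GROUP BY t.team;
```

Running the same EXPLAIN prefix on the original three-subquery JOIN version gives a plan to compare against; on a table this small the difference is likely to be negligible either way.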

Thanks for the answer. Initially your code didn't work as is; it just needed an END added to that CASE expression. Also, UNION removes duplicates, so UNION ALL should be used instead. After making those changes, things work perfectly! (gautham, commented Nov 2, 2021 at 11:13)
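
The difference the comment describes can be seen in isolation with a few literal rows. This is a small sketch using a made-up CTE named results that stands in for the winner and loser rows: one team has two wins, so its (team, 1) row appears twice.

```sql
WITH results(team, result) AS (
    VALUES ('Kolkata Knight Riders', 1),
           ('Kolkata Knight Riders', 1),   -- second win, identical row
           ('Kings XI Punjab',       0)
)
SELECT 'UNION' AS variant, COUNT(*) AS row_count
FROM (
    SELECT team, result FROM results WHERE result = 1
    UNION          -- drops duplicate rows from the combined result
    SELECT team, result FROM results WHERE result = 0
) AS u
UNION ALL
SELECT 'UNION ALL' AS variant, COUNT(*) AS row_count
FROM (
    SELECT team, result FROM results WHERE result = 1
    UNION ALL      -- keeps every row, including duplicates
    SELECT team, result FROM results WHERE result = 0
) AS a;
```

The UNION variant keeps 2 rows while UNION ALL keeps all 3, which is why COUNT(*) per team would cap out at 2 (one win row, one loss row) if plain UNION were used in the aggregation query.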
