
Problem Statement:

I am working on this simple dataset from Kaggle. I have provided a snippet of the data, with only the required columns, in the table below. The dataset is quite simple: it lists all IPL (cricket) matches, with the two teams that played each match (team1 and team2) and the winner of that match.

Now I am trying to get the total number of matches played by each team, along with the number of matches each team won; again, a snippet of the output is provided below the code. The number of matches played can be computed as "all occurrences of a particular team in column team1" + "all occurrences of that team in column team2".
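The "team1 occurrences + team2 occurrences" idea can also be expressed without joins: stack both columns into a single column with UNION ALL and group once. This is a minimal sketch, assuming Python's built-in sqlite3 with an in-memory ipl_m that keeps only team1, team2 and winner (the real table has more columns, and the question does not say which database it uses). The rows are the six from the dataset snippet in this question.

```python
import sqlite3

# Six sample rows from the question's dataset snippet: (team1, team2, winner).
ROWS = [
    ("Royal Challengers Bangalore", "Kolkata Knight Riders", "Kolkata Knight Riders"),
    ("Kings XI Punjab", "Chennai Super Kings", "Chennai Super Kings"),
    ("Delhi Daredevils", "Rajasthan Royals", "Delhi Daredevils"),
    ("Mumbai Indians", "Royal Challengers Bangalore", "Royal Challengers Bangalore"),
    ("Kolkata Knight Riders", "Deccan Chargers", "Kolkata Knight Riders"),
    ("Rajasthan Royals", "Kings XI Punjab", "Rajasthan Royals"),
]

# Assumption: an in-memory SQLite table stands in for the real ipl_m.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipl_m (team1 TEXT NOT NULL, team2 TEXT NOT NULL, winner TEXT)")
conn.executemany("INSERT INTO ipl_m VALUES (?, ?, ?)", ROWS)

# Every match contributes one row for team1 and one row for team2.
# UNION ALL keeps duplicates, so COUNT(*) equals the number of appearances.
PLAYED_SQL = """
SELECT team, COUNT(*) AS played
FROM (SELECT team1 AS team FROM ipl_m
      UNION ALL
      SELECT team2 FROM ipl_m) AS t
GROUP BY team
"""
played = dict(conn.execute(PLAYED_SQL).fetchall())
print(played["Kolkata Knight Riders"])  # 2
```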

While the code does give the correct result, I sense this is not the best approach. I would like to know a better way to do it, along with good practices and naming conventions to follow.

Dataset:

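| team1                       | team2                       | winner                      |
|-----------------------------|-----------------------------|-----------------------------|
| Royal Challengers Bangalore | Kolkata Knight Riders       | Kolkata Knight Riders       |
| Kings XI Punjab             | Chennai Super Kings         | Chennai Super Kings         |
| Delhi Daredevils            | Rajasthan Royals            | Delhi Daredevils            |
| Mumbai Indians              | Royal Challengers Bangalore | Royal Challengers Bangalore |
| Kolkata Knight Riders       | Deccan Chargers             | Kolkata Knight Riders       |
| Rajasthan Royals            | Kings XI Punjab             | Rajasthan Royals            |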
Rajasthan RoyalsKings XI PunjabRajasthan Royals

Code:

    SELECT t1.team1 AS team,
           c_t1 + c_t2 AS played,
           c_w AS won,
           CAST(c_w AS FLOAT) / (c_t1 + c_t2) * 100 AS won_percentage
    FROM (SELECT team1, count(team1) AS c_t1 FROM ipl_m GROUP BY team1) AS t1
    JOIN (SELECT team2, count(team2) AS c_t2 FROM ipl_m GROUP BY team2) AS t2
      ON t1.team1 = t2.team2
    JOIN (SELECT winner, count(winner) AS c_w FROM ipl_m GROUP BY winner) AS w
      ON t1.team1 = w.winner OR t2.team2 = w.winner
    ORDER BY won_percentage DESC;

Resulting Table:

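| team                  | played | won | won_percentage    |
|-----------------------|--------|-----|-------------------|
| Chennai Super Kings   | 178    | 106 | 59.55056179775281 |
| Mumbai Indians        | 203    | 120 | 59.11330049261084 |
| Delhi Capitals        | 33     | 19  | 57.57575757575758 |
| Sunrisers Hyderabad   | 124    | 66  | 53.2258064516129  |
| Kolkata Knight Riders | 192    | 99  | 51.5625           |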

Table Definition:

    CREATE TABLE ipl_m (
        id              integer PRIMARY KEY,
        match_id        integer NOT NULL,
        city            VARCHAR(20) NOT NULL,
        date            DATE NOT NULL,
        player_of_match VARCHAR(50),
        venue           VARCHAR(75) NOT NULL,
        neutral_venue   BOOLEAN NOT NULL,
        team1           VARCHAR(50) NOT NULL,
        team2           VARCHAR(50) NOT NULL,
        toss_winner     VARCHAR(50) NOT NULL,
        toss_decision   VARCHAR(5) NOT NULL,
        winner          VARCHAR(50),
        result          VARCHAR(10),
        result_margin   float,
        eliminator      CHAR(1) NOT NULL,
        method          VARCHAR(3),
        umpire1         VARCHAR(50),
        umpire2         VARCHAR(50)
    );

    1 Answer


    Each row in the ipl_m table has one winner and one loser. So first extract the winners and set a result field (used later for counting) to 1:

    SELECT winner AS team, 1 as result FROM ipl_m 
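On the six sample rows from the question's dataset snippet, this step yields exactly one row per match, so a team that won twice appears twice. This is a minimal sketch, assuming Python's built-in sqlite3 and an in-memory ipl_m reduced to team1, team2 and winner (the question does not name its database).

```python
import sqlite3

# Six sample rows from the question's dataset snippet: (team1, team2, winner).
ROWS = [
    ("Royal Challengers Bangalore", "Kolkata Knight Riders", "Kolkata Knight Riders"),
    ("Kings XI Punjab", "Chennai Super Kings", "Chennai Super Kings"),
    ("Delhi Daredevils", "Rajasthan Royals", "Delhi Daredevils"),
    ("Mumbai Indians", "Royal Challengers Bangalore", "Royal Challengers Bangalore"),
    ("Kolkata Knight Riders", "Deccan Chargers", "Kolkata Knight Riders"),
    ("Rajasthan Royals", "Kings XI Punjab", "Rajasthan Royals"),
]

# Assumption: an in-memory SQLite table stands in for the real ipl_m.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipl_m (team1 TEXT NOT NULL, team2 TEXT NOT NULL, winner TEXT)")
conn.executemany("INSERT INTO ipl_m VALUES (?, ?, ?)", ROWS)

# Step 1: one row per match for the winning team, tagged with result = 1.
WINNERS_SQL = "SELECT winner AS team, 1 AS result FROM ipl_m"
winners = conn.execute(WINNERS_SQL).fetchall()
print(winners.count(("Kolkata Knight Riders", 1)))  # 2
```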

    Next, extract the losers and set result to 0:

    SELECT CASE WHEN team1 = winner THEN team2 ELSE team1 END AS team, 0 AS result FROM ipl_m
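The CASE expression needs a closing END to be valid SQL. With it, this step produces one loser row per match on the sample data. This is a minimal sketch, assuming Python's built-in sqlite3 and an in-memory ipl_m reduced to team1, team2 and winner.

```python
import sqlite3

# Six sample rows from the question's dataset snippet: (team1, team2, winner).
ROWS = [
    ("Royal Challengers Bangalore", "Kolkata Knight Riders", "Kolkata Knight Riders"),
    ("Kings XI Punjab", "Chennai Super Kings", "Chennai Super Kings"),
    ("Delhi Daredevils", "Rajasthan Royals", "Delhi Daredevils"),
    ("Mumbai Indians", "Royal Challengers Bangalore", "Royal Challengers Bangalore"),
    ("Kolkata Knight Riders", "Deccan Chargers", "Kolkata Knight Riders"),
    ("Rajasthan Royals", "Kings XI Punjab", "Rajasthan Royals"),
]

# Assumption: an in-memory SQLite table stands in for the real ipl_m.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipl_m (team1 TEXT NOT NULL, team2 TEXT NOT NULL, winner TEXT)")
conn.executemany("INSERT INTO ipl_m VALUES (?, ?, ?)", ROWS)

# Step 2: the team that did not win. If team1 won, the loser is team2;
# otherwise the loser is team1. Each match yields exactly one row with result = 0.
LOSERS_SQL = """
SELECT CASE WHEN team1 = winner THEN team2 ELSE team1 END AS team,
       0 AS result
FROM ipl_m
"""
losers = conn.execute(LOSERS_SQL).fetchall()
print(losers.count(("Kings XI Punjab", 0)))  # 2
```

The table definition allows winner to be NULL. For such a row, `team1 = winner` is not true, so the ELSE branch reports team1 as the loser and team2 is not counted at all, while the winners step emits a row with a NULL team. If the data contains matches without a winner, filtering them out with `WHERE winner IS NOT NULL` in both SELECTs may be needed.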

    Combine the two sets with UNION ALL. (Plain UNION removes duplicate rows, so a team that won, or lost, more than one match would be undercounted.) Then SELECT from the combined set, grouping by the team column:

    SELECT t.team        AS team,
           COUNT(*)      AS played,
           SUM(t.result) AS won
    FROM (
        SELECT winner AS team, 1 AS result FROM ipl_m
        UNION ALL
        SELECT CASE WHEN team1 = winner THEN team2 ELSE team1 END AS team, 0 AS result FROM ipl_m
    ) AS t
    GROUP BY t.team
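Putting the steps together, here is a minimal sketch that also adds the question's won_percentage column and ordering, and runs the combined query with both UNION ALL and UNION to show the difference. It assumes Python's built-in sqlite3 and an in-memory ipl_m with the six sample rows; the helper name team_stats is hypothetical.

```python
import sqlite3

# Six sample rows from the question's dataset snippet: (team1, team2, winner).
ROWS = [
    ("Royal Challengers Bangalore", "Kolkata Knight Riders", "Kolkata Knight Riders"),
    ("Kings XI Punjab", "Chennai Super Kings", "Chennai Super Kings"),
    ("Delhi Daredevils", "Rajasthan Royals", "Delhi Daredevils"),
    ("Mumbai Indians", "Royal Challengers Bangalore", "Royal Challengers Bangalore"),
    ("Kolkata Knight Riders", "Deccan Chargers", "Kolkata Knight Riders"),
    ("Rajasthan Royals", "Kings XI Punjab", "Rajasthan Royals"),
]

# Assumption: an in-memory SQLite table stands in for the real ipl_m.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipl_m (team1 TEXT NOT NULL, team2 TEXT NOT NULL, winner TEXT)")
conn.executemany("INSERT INTO ipl_m VALUES (?, ?, ?)", ROWS)

# {union} is replaced with a fixed keyword ("UNION ALL" or "UNION"), never user input.
STATS_SQL = """
SELECT t.team AS team,
       COUNT(*) AS played,
       SUM(t.result) AS won,
       CAST(SUM(t.result) AS FLOAT) / COUNT(*) * 100 AS won_percentage
FROM (SELECT winner AS team, 1 AS result FROM ipl_m
      {union}
      SELECT CASE WHEN team1 = winner THEN team2 ELSE team1 END AS team,
             0 AS result
      FROM ipl_m) AS t
GROUP BY t.team
ORDER BY won_percentage DESC, team
"""

def team_stats(union):
    """Hypothetical helper: return {team: (played, won, won_percentage)}."""
    rows = conn.execute(STATS_SQL.format(union=union)).fetchall()
    return {team: (played, won, pct) for team, played, won, pct in rows}

keep_all = team_stats("UNION ALL")
dedup = team_stats("UNION")
print(keep_all["Kolkata Knight Riders"])  # (2, 2, 100.0)
print(dedup["Kolkata Knight Riders"])     # (1, 1, 100.0)
```

With plain UNION, Kolkata Knight Riders' two identical (team, 1) winner rows collapse into one, so both played and won are undercounted.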

    Your solution uses 4 SELECTs and 2 JOINs; mine uses 3 SELECTs and 1 UNION, and it references ipl_m twice instead of three times. Using fewer operations is usually preferable, since the query is shorter and easier to follow.
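Beyond the operation count, the inner JOINs in the question's query have a side effect: a team is returned only if it appears in team1, in team2 and among the winners. Running the question's query unchanged against the six sample rows shows this. This is a minimal sketch, assuming Python's built-in sqlite3 and an in-memory ipl_m reduced to team1, team2 and winner.

```python
import sqlite3

# Six sample rows from the question's dataset snippet: (team1, team2, winner).
ROWS = [
    ("Royal Challengers Bangalore", "Kolkata Knight Riders", "Kolkata Knight Riders"),
    ("Kings XI Punjab", "Chennai Super Kings", "Chennai Super Kings"),
    ("Delhi Daredevils", "Rajasthan Royals", "Delhi Daredevils"),
    ("Mumbai Indians", "Royal Challengers Bangalore", "Royal Challengers Bangalore"),
    ("Kolkata Knight Riders", "Deccan Chargers", "Kolkata Knight Riders"),
    ("Rajasthan Royals", "Kings XI Punjab", "Rajasthan Royals"),
]

# Assumption: an in-memory SQLite table stands in for the real ipl_m.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipl_m (team1 TEXT NOT NULL, team2 TEXT NOT NULL, winner TEXT)")
conn.executemany("INSERT INTO ipl_m VALUES (?, ?, ?)", ROWS)

# The question's query, unchanged apart from line breaks.
QUESTION_SQL = """
SELECT t1.team1 AS team, c_t1 + c_t2 AS played, c_w AS won,
       CAST(c_w AS FLOAT) / (c_t1 + c_t2) * 100 AS won_percentage
FROM (SELECT team1, count(team1) AS c_t1 FROM ipl_m GROUP BY team1) AS t1
JOIN (SELECT team2, count(team2) AS c_t2 FROM ipl_m GROUP BY team2) AS t2
  ON t1.team1 = t2.team2
JOIN (SELECT winner, count(winner) AS c_w FROM ipl_m GROUP BY winner) AS w
  ON t1.team1 = w.winner OR t2.team2 = w.winner
ORDER BY won_percentage DESC
"""
teams_returned = {row[0] for row in conn.execute(QUESTION_SQL)}
all_teams = {team for row in ROWS for team in row[:2]}
# Teams dropped by the inner joins: missing from team1, team2, or the winners.
print(sorted(all_teams - teams_returned))
```

On the full dataset this may make no visible difference, but the query silently depends on every team having appeared in both columns and having won at least once. For matches with a winner, the UNION ALL approach has no such dependency, because every team that played is either a winner or a loser.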

    • Thanks for the answer. Initially your code didn't work as is; it just needed an END added to that CASE expression. Also, UNION removes duplicates, so UNION ALL should be used instead. After making those changes, things work perfectly!
      – gautham
      Commented Nov 2, 2021 at 11:13
    • CASE ... END - my bad. UNION ALL - you are right!
      – JulStrat
      Commented Nov 2, 2021 at 11:43
