Problem Statement:
I am working on this simple dataset from Kaggle. I have provided a snippet of data with only required columns in below table. Dataset is quite simple, it has all IPL (cricket) matches listed with teams who played each match (team1 and team2) along with winner of that match.
Now I am trying to get total matches played by all teams along with matches won by each team, I have again provided a snippet of output below the code. Same can be performed by "finding all occurrences of a particular team in column team1" + "finding all occurrences of a particular team in column team2".
While the code does give proper result, I can sense this is not the best approach. I would like to know some better way to do it along with good practices and naming conventions to follow.
Dataset:
team1 | team2 | winner |
---|---|---|
Royal Challengers Bangalore | Kolkata Knight Riders | Kolkata Knight Riders |
Kings XI Punjab | Chennai Super Kings | Chennai Super Kings |
Delhi Daredevils | Rajasthan Royals | Delhi Daredevils |
Mumbai Indians | Royal Challengers Bangalore | Royal Challengers Bangalore |
Kolkata Knight Riders | Deccan Chargers | Kolkata Knight Riders |
Rajasthan Royals | Kings XI Punjab | Rajasthan Royals |
Code:
SELECT t1.team1 AS team, c_t1 + c_t2 AS played, c_w AS won, CAST(c_w AS FLOAT) / (c_t1 + c_t2) * 100 AS won_percentage FROM (SELECT team1, count(team1) AS c_t1 FROM ipl_m GROUP BY team1) AS t1 JOIN (SELECT team2, count(team2) AS c_t2 FROM ipl_m GROUP BY team2) AS t2 ON t1.team1 = t2.team2 JOIN (SELECT winner, count(winner) AS c_w FROM ipl_m GROUP BY winner) AS w ON t1.team1 = w.winner OR t2.team2 = w.winner ORDER BY won_percentage DESC;
Resulting Table:
team | played | won | won_percentage |
---|---|---|---|
Chennai Super Kings | 178 | 106 | 59.55056179775281 |
Mumbai Indians | 203 | 120 | 59.11330049261084 |
Delhi Capitals | 33 | 19 | 57.57575757575758 |
Sunrisers Hyderabad | 124 | 66 | 53.2258064516129 |
Kolkata Knight Riders | 192 | 99 | 51.5625 |
Table Definition:
CREATE TABLE ipl_m ( id integer PRIMARY KEY, match_id integer NOT NULL, city VARCHAR(20) NOT NULL, date DATE NOT NULL, player_of_match VARCHAR(50), venue VARCHAR(75) NOT NULL, neutral_venue BOOLEAN NOT NULL, team1 VARCHAR(50) NOT NULL, team2 VARCHAR(50) NOT NULL, toss_winner VARCHAR(50) NOT NULL, toss_decision VARCHAR(5) NOT NULL, winner VARCHAR(50), result VARCHAR(10), result_margin float, eliminator CHAR(1) NOT NULL, method VARCHAR(3), umpire1 VARCHAR(50), umpire2 VARCHAR(50) );