Union / Concat Tables

The concat function mimics the functionality of UNION in SQL queries and replaces the Merge functionality in Excel. It is a powerful function for dataframes in Pandas, that is mostly missing from Excel.

For this example, we will transform a table with game scores into a calculated standing table.

Open In Studio Lab

import pandas as pd

Loading raw table data

For this example, we will take an example of score table from Premier League in the UK, and calculate the standing table of that league using the league rules for points and rankings.

url = 'https://www.worldfootball.net/schedule/eng-premier-league-2020-2021-spieltag/1/'

import requests
from io import StringIO

response = requests.get(url)
response
<Response [200]>
scores = (
    pd
    .read_html(
        StringIO(
            response.text
        )
    )
)

As you can see in the web site, the format of the table is simple, and we can parse it with:

  • Take the second (index of 1) HTML table on the web site

  • Keep only the columns 2, 4, 5

  • Rename the columns to reflect the names of the home and away teams and the score

  • Parse the score with the first (index of 0) character is the number of goals of the home team

  • and the third (index of 2) character is the number of goals of the away team

(
    scores[1]
    [[2,4,5]]
    .rename(columns={
        2: 'Home',
        4: 'Away',
        5: 'Score'
    })
    .assign(home_goals = lambda r: r.Score.str[0])
    .assign(away_goals = lambda r: r.Score.str[2])
)
Home Away Score home_goals away_goals
0 Fulham FC Arsenal FC 0:3 (0:1) 0 3
1 Crystal Palace Southampton FC 1:0 (1:0) 1 0
2 Liverpool FC Leeds United 4:3 (3:2) 4 3
3 West Ham United Newcastle United 0:2 (0:0) 0 2
4 West Bromwich Albion Leicester City 0:3 (0:0) 0 3
5 Tottenham Hotspur Everton FC 0:1 (0:0) 0 1
6 Sheffield United Wolverhampton Wanderers 0:2 (0:2) 0 2
7 Brighton & Hove Albion Chelsea FC 1:3 (0:1) 1 3
8 Burnley FC Manchester United 0:1 (0:0) 0 1
9 Manchester City Aston Villa 2:0 (0:0) 2 0

Now we have the scores of the first round with the hosting team (Home), the visiting team (Away) and the number of goals that each team scored.

Concatinate multuple tables

We want to concatinate the scores of multiple rounds. Using a for-loop we will load the results of each round and add them to a list of rounds.

round_tables = []
for round in range(1,5):
    url = f'https://www.worldfootball.net/schedule/eng-premier-league-2020-2021-spieltag/{round}/'
    response = requests.get(url)
    round_table = (
        pd
        .read_html(
            StringIO(
                response.text
            )
        )
        [1]
        [[2,4,5]]
        .rename(columns={
            2: 'Home',
            4: 'Away',
            5: 'Score'
        })
        .assign(home_goals = lambda r: r.Score.str[0].astype(int))
        .assign(away_goals = lambda r: r.Score.str[2].astype(int))
    )
    round_tables.append(round_table)
    

Concatinate the rounds scores together

In Excel it is not easy to stitch multiple tables together into a single larger table. Nevertheless, with Pandas it is easy, as we can see in the following example:

  • Concatinate the list of round scores

  • Calculate the number of points of the home team (3 for a win, 1 for a draw)

  • Calculate the number of points of the away team

import numpy as np

scores_table = (
    pd
    .concat(round_tables)
    .assign(home_points = lambda r : np.select(
            [
                r.home_goals > r.away_goals, 
                r.home_goals == r.away_goals, 
            ], 
            [
                3, 
                1
            ], 
            default=0
        )
    )
        .assign(away_points = lambda r : np.select(
            [
                r.home_goals < r.away_goals, 
                r.home_goals == r.away_goals, 
            ], 
            [
                3, 
                1
            ], 
            default=0
        )
    )

)

scores_table
Home Away Score home_goals away_goals home_points away_points
0 Fulham FC Arsenal FC 0:3 (0:1) 0 3 0 3
1 Crystal Palace Southampton FC 1:0 (1:0) 1 0 3 0
2 Liverpool FC Leeds United 4:3 (3:2) 4 3 3 0
3 West Ham United Newcastle United 0:2 (0:0) 0 2 0 3
4 West Bromwich Albion Leicester City 0:3 (0:0) 0 3 0 3
5 Tottenham Hotspur Everton FC 0:1 (0:0) 0 1 0 3
6 Sheffield United Wolverhampton Wanderers 0:2 (0:2) 0 2 0 3
7 Brighton & Hove Albion Chelsea FC 1:3 (0:1) 1 3 0 3
8 Burnley FC Manchester United 0:1 (0:0) 0 1 0 3
9 Manchester City Aston Villa 2:0 (0:0) 2 0 3 0
0 Everton FC West Bromwich Albion 5:2 (2:1) 5 2 3 0
1 Leeds United Fulham FC 4:3 (2:1) 4 3 3 0
2 Manchester United Crystal Palace 1:3 (0:1) 1 3 0 3
3 Arsenal FC West Ham United 2:1 (1:1) 2 1 3 0
4 Southampton FC Tottenham Hotspur 2:5 (1:1) 2 5 0 3
5 Newcastle United Brighton & Hove Albion 0:3 (0:2) 0 3 0 3
6 Chelsea FC Liverpool FC 0:2 (0:0) 0 2 0 3
7 Leicester City Burnley FC 4:2 (1:1) 4 2 3 0
8 Aston Villa Sheffield United 1:0 (0:0) 1 0 3 0
9 Wolverhampton Wanderers Manchester City 1:3 (0:2) 1 3 0 3
0 Brighton & Hove Albion Manchester United 2:3 (1:1) 2 3 0 3
1 Crystal Palace Everton FC 1:2 (1:2) 1 2 0 3
2 West Bromwich Albion Chelsea FC 3:3 (3:0) 3 3 1 1
3 Burnley FC Southampton FC 0:1 (0:1) 0 1 0 3
4 Sheffield United Leeds United 0:1 (0:0) 0 1 0 3
5 Tottenham Hotspur Newcastle United 1:1 (1:0) 1 1 1 1
6 Manchester City Leicester City 2:5 (1:1) 2 5 0 3
7 West Ham United Wolverhampton Wanderers 4:0 (1:0) 4 0 3 0
8 Fulham FC Aston Villa 0:3 (0:2) 0 3 0 3
9 Liverpool FC Arsenal FC 3:1 (2:1) 3 1 3 0
0 Chelsea FC Crystal Palace 4:0 (0:0) 4 0 3 0
1 Everton FC Brighton & Hove Albion 4:2 (2:1) 4 2 3 0
2 Leeds United Manchester City 1:1 (0:1) 1 1 1 1
3 Newcastle United Burnley FC 3:1 (1:0) 3 1 3 0
4 Leicester City West Ham United 0:3 (0:2) 0 3 0 3
5 Southampton FC West Bromwich Albion 2:0 (1:0) 2 0 3 0
6 Arsenal FC Sheffield United 2:1 (0:0) 2 1 3 0
7 Wolverhampton Wanderers Fulham FC 1:0 (0:0) 1 0 3 0
8 Manchester United Tottenham Hotspur 1:6 (1:4) 1 6 0 3
9 Aston Villa Liverpool FC 7:2 (4:1) 7 2 3 0

Building Standing Table

We will concat again on the socres and calculate the standing table based on these scores.

  • We will concatinate the following two tables

  • First table, we will calculate the games, points, and goals for the home teams

  • Second table, we will repeat the process for the away teams

  • Next, we will sum the values from the two tables group-by each team

  • Calculate the goal difference that is used as tie breaker in case of equal points

  • Sort by the number of points and the goal difference

(
    pd
    .concat(
        [
            scores_table
            .rename(columns={'Home':'Team'})
            .groupby('Team')
            .agg(
                games = ('home_goals', 'count'), 
                goals = ('home_goals', 'sum'), 
                goals_against = ('away_goals', 'sum'), 
                points = ('home_points', 'sum')
            )
            ,
            scores_table
            .rename(columns={'Away':'Team'})
            .groupby('Team')
            .agg(
                games = ('away_goals', 'count'), 
                goals = ('away_goals', 'sum'), 
                goals_against = ('home_goals', 'sum'), 
                points = ('away_points', 'sum')
            )   
        ]
    )
    .groupby('Team')
    .sum()
    .assign(diff = lambda r : r.goals - r.goals_against)
    .sort_values(by=['points','diff'], ascending=False)
)
games goals goals_against points diff
Team
Everton FC 4 12 5 12 7
Aston Villa 4 11 4 9 7
Leicester City 4 12 7 9 5
Arsenal FC 4 8 5 9 3
Liverpool FC 4 11 11 9 0
Tottenham Hotspur 4 12 5 7 7
Chelsea FC 4 10 6 7 4
Leeds United 4 9 8 7 1
Manchester City 4 8 7 7 1
Newcastle United 4 6 5 7 1
West Ham United 4 8 4 6 4
Southampton FC 4 5 6 6 -1
Crystal Palace 4 5 7 6 -2
Wolverhampton Wanderers 4 4 7 6 -3
Manchester United 4 6 11 6 -5
Brighton & Hove Albion 4 8 10 3 -2
West Bromwich Albion 4 5 13 1 -8
Sheffield United 4 1 6 0 -5
Burnley FC 4 3 9 0 -6
Fulham FC 4 3 11 0 -8