Linear Programming (Optimization)

In this example, we will use an common analysis that is used to optimize decision based on constrains, and it is based on the great blog post How to Solve Optimization Problems with Python

Open In Studio Lab

import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

Loading the data

We will get the data from HTML table that are available from Rotoguru, that is gather stats data from fantasy league. We will focus on the data of the NBA players. To keep the consistency of the notebook, we will check the data for a specific week (‘2021-03-02’)

year = 2021
month = 3
day = 2
data_url = f'http://rotoguru1.com/cgi-bin/hyday.pl?mon={month}&day={day}&year={year}&game=fd'
dfs = (
    pd
    .read_html(data_url)
)

There are a few HTML table in the page, and after some scorlling we can see that the table with the players data is the 6th one (index=5):

  • Start with the dataframe at index 5 of the list of table from the HTML page above

  • Rename the column to be: Position, Name,FD Points, Salary, Team, Opp., Score, Min, Stats

  • Filter out rows with Position string longer than 2 characters (header lines)

  • Add a column called Points with the float value of the points to be used as the target of the optimization

  • Remove the dollar sign and commas ([$,]) from the Salary column and convert it to its integer value

all_players = (
    dfs[5]
    .rename(
        columns={
            0:'Position',
            1:'Name',
            2:'FD Points',
            3:'Salary',
            4:'Team',
            5:'Opp.',
            6:'Score',
            7:'Min',
            8:'Stats'
            }
    )
    .assign(position_length = lambda x : x.Position.str.len())
    .query('position_length <= 2')
    .assign(Points = lambda x : x['FD Points'].astype(float))
    .assign(Salary = lambda x : x.Salary.str.replace('[$,]','', regex=True).astype(int))
)
all_players
Position Name FD Points Salary Team Opp. Score Min Stats position_length Points
2 PG Morant, Ja^ 55 7100 mem @ was 125-111 33:02 35pt 5rb 10as 1st 4to 2trey 11-18fg 11-14ft 2 55.0
3 PG Westbrook, Russell^ 53.7 9600 was v mem 111-125 35:31 23pt 6rb 15as 3st 8to 3trey 8-16fg 4-9ft 2 53.7
4 SG George, Paul^ 45 7900 lac @ bos 112-117 38:39 32pt 5rb 4as 1st 2to 5trey 12-26fg 3-3ft 2 45.0
5 SG DeRozan, DeMar^ 43.3 8200 sas v nyk 119-93 29:44 10pt 4rb 11as 4st 2-5fg 6-6ft 2 43.3
6 PG Murray, Jamal^ 41.6 8500 den @ mil 128-97 37:59 24pt 3rb 6as 2st 1to 1trey 10-17fg 3-3ft 2 41.6
... ... ... ... ... ... ... ... ... ... ... ...
199 C Robinson, Mitchell 0 4500 nyk @ sas 93-119 NaN NaN 1 0.0
200 C Bryant, Thomas 0 6200 was v mem 111-125 NaN NaN 1 0.0
201 C Leonard, Meyers 0 3500 mia v atl 80-94 NaN NaN 1 0.0
202 C Gasol, Marc 0 4800 lal v pho 104-114 NaN NaN 1 0.0
203 C Oturu, Daniel 0 3500 lac @ bos 112-117 NaN NaN 1 0.0

198 rows × 11 columns

Installing the LP Library

Many times you can find a python library that is implementing many of the fucntions that you need. This is another major benefit of Python over close tools such as Excel.

pip install pulp
Requirement already satisfied: pulp in /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages (2.6.0)
WARNING: There was an error checking the latest version of pip.

Note: you may need to restart the kernel to use updated packages.

Defining the problem variables

The most complicated part of the analysis is to understand how to map it into the domain of the algorithm.

The current problem that we want to solve using Linear Programming (LP) is what is the best roaster of NBA players we want to choose for our fantasy league. The output is this case is the list of the players names.

We will create a list of all the players and assign them either 1 (selected to the team) or 0 (not selected)

  • Create a Linear Programming variables

  • from a dictionary

  • with no prefix to the names

  • and index of \(x_i\) as the player names

  • and the value is an Integer

  • that is either 0 (not selected)

  • or 1 (selected)

from pulp import *

player_vars = (
    LpVariable
    .dicts(
        '', 
        list(all_players['Name']), 
        cat='Integer',
        lowBound=0, 
        upBound=1 
    )
)

Problem target

In the fantasy league games, we want to maximize the number of points that the player that we choose will score in the next game.

Let’s start with defining the LP problem as a maximization problem:

total_score = LpProblem('Fantasy Points Problem', LpMaximize)

Variable Dictionaries

The library is expecting to get a set of dictionaries each with the name of the player as the common index and the numeric value for each. We will do that for the Points (the target to maximize), the Positions (that we need to pick a couple of each position), and the Salaries (that we need to cap to a specific limit).

points = (
    all_players
    .set_index('Name')
    .to_dict()
    ['Points']
)
salaries = (
    all_players
    .set_index('Name')
    .to_dict()
    ['Salary']
)

Linear Programming (LP) Definition

The LP problem is defined as a set of linear expressions of the form:

\(𝑎_1𝑥_1+𝑎_2𝑥_2+𝑎_3𝑥_3+...𝑎_𝑛𝑥_𝑛 {<=,=,>=} 𝑏\)

Where \(x_1, x_2 ... x_n\) are the decision variables we want to choose (the player names), and \(a_1, a_2, ... a_n\) are the objective and constrains variables we have in our data.

First, we add the LP sum of the points where \(a_i\) is the points each player scored, and \(x_i\) is the player:

total score = \(\sum p_i*x_i\)

  • \(p_i\) - points scored by player\(_i\)

  • \( x_i = \begin{cases} 1 & \text{if player in team} \\ 0 & \text{if player not in team} \end{cases} \)

total_score += (
    lpSum(
        [
            points[player] * player_vars[player] 
            for player in player_vars
        ]
    )
)

In a real application we will want to have the projected number of points and not the sum of past points, and we will calculated a forecasting model that can identify trends or recent injuries. But for this simple example, we will take that sum as a good estimator for their expected points in the next game.

Second, we add the constrain the salaries where \(a_i\) is the salary of each player, and \(x_i\) is the player, and the constrain value for the sum of all the salaries of the players we choose not to exceed $60,000:

\(\sum s_i*x_i <= 60,000\)

  • \(s_i\) - salary of player\(_i\)

  • \( x_i = \begin{cases} 1 & \text{if player in team} \\ 0 & \text{if player not in team} \end{cases} \)

total_score += (
    lpSum(
        [
            salaries[player] * player_vars[player] 
            for player in player_vars
        ]
    ) <= 60_000
)

Last, we want to make sure that we have the right number of players in each position in the fantasy team. We will create a list of players for each of the positions: Point Guard (PG), Shooting Guard (SG), Small Forward (SF), Power Forward (PF), and Center (C)

pg = (
    all_players
    .query("Position == 'PG'")
    ['Name']
    .to_list()
)
sg = (
    all_players
    .query("Position == 'SG'")
    ['Name']
    .to_list()
)
sf = (
    all_players
    .query("Position == 'SF'")
    ['Name']
    .to_list()
)
pf = (
    all_players
    .query("Position == 'PF'")
    ['Name']
    .to_list()
)
c = (
    all_players
    .query("Position == 'C'")
    ['Name']
    .to_list()
)

And add the constrains to have exactly two PG, two SG, two SF, two PF and one C:

\(\sum x_i = 2 , \text{if player is in PG, SG, SF, PF} \\ \sum x_i = 1 , \text{if player is in C} \)

  • \( x_i = \begin{cases} 1 & \text{if player in team} \\ 0 & \text{if player not in team} \end{cases} \)

# Set Constraints
total_score += lpSum([player_vars[player] for player in pg]) == 2
total_score += lpSum([player_vars[player] for player in sg]) == 2
total_score += lpSum([player_vars[player] for player in sf]) == 2
total_score += lpSum([player_vars[player] for player in pf]) == 2
total_score += lpSum([player_vars[player] for player in c]) == 1

LP Solution

Finally, we can solve the set of linear expressions:

total_score.solve()
Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

command line - /opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/pulp/apis/../solverdir/cbc/linux/64/cbc /tmp/b3c16ae7c75f40999454a5bc4373562a-pulp.mps max timeMode elapsed branch printingOptions all solution /tmp/b3c16ae7c75f40999454a5bc4373562a-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 11 COLUMNS
At line 927 RHS
At line 934 BOUNDS
At line 1133 ENDATA
Problem MODEL has 6 rows, 198 columns and 396 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 406.74 - 0.00 seconds
Cgl0003I 0 fixed, 2 tightened bounds, 0 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 1 tightened bounds, 0 strengthened rows, 0 substitutions
Cgl0004I processed model has 6 rows, 145 columns (145 integer (139 of which binary)) and 290 elements
Cutoff increment increased from 1e-05 to 0.0999
Cbc0038I Initial state - 2 integers unsatisfied sum - 0.2
Cbc0038I Solution found of -405.2
Cbc0038I Cleaned solution of -405.2
Cbc0038I Before mini branch and bound, 143 integers at bound fixed and 0 continuous
Cbc0038I Full problem 6 rows 145 columns, reduced to 0 rows 0 columns
Cbc0038I Mini branch and bound did not improve solution (0.00 seconds)
Cbc0038I Round again with cutoff of -405.5
Cbc0038I Reduced cost fixing fixed 138 variables on major pass 2
Cbc0038I Pass   1: suminf.    0.03896 (2) obj. -405.5 iterations 1
Cbc0038I Pass   2: suminf.    0.20000 (2) obj. -406.74 iterations 4
Cbc0038I Pass   3: suminf.    0.20000 (2) obj. -406.74 iterations 1
Cbc0038I Pass   4: suminf.    0.20000 (2) obj. -406.74 iterations 1
Cbc0038I Pass   5: suminf.    0.20000 (2) obj. -406.74 iterations 1
Cbc0038I Pass   6: suminf.    0.03896 (2) obj. -405.5 iterations 1
Cbc0038I Pass   7: suminf.    0.20000 (2) obj. -406.74 iterations 1
Cbc0038I Pass   8: suminf.    0.03896 (2) obj. -405.5 iterations 1
Cbc0038I Pass   9: suminf.    0.03896 (2) obj. -405.5 iterations 0
Cbc0038I Pass  10: suminf.    0.20000 (2) obj. -406.74 iterations 2
Cbc0038I Pass  11: suminf.    0.20000 (2) obj. -406.74 iterations 1
Cbc0038I Pass  12: suminf.    0.03896 (2) obj. -405.5 iterations 1
Cbc0038I Pass  13: suminf.    0.03896 (2) obj. -405.5 iterations 0
Cbc0038I Pass  14: suminf.    0.03896 (2) obj. -405.5 iterations 0
Cbc0038I Pass  15: suminf.    0.03896 (2) obj. -405.5 iterations 0
Cbc0038I Pass  16: suminf.    0.03896 (2) obj. -405.5 iterations 0
Cbc0038I Pass  17: suminf.    0.20000 (2) obj. -406.74 iterations 2
Cbc0038I Pass  18: suminf.    0.03896 (2) obj. -405.5 iterations 2
Cbc0038I Pass  19: suminf.    0.20000 (2) obj. -406.74 iterations 1
Cbc0038I Pass  20: suminf.    0.20000 (2) obj. -406.74 iterations 1
Cbc0038I Pass  21: suminf.    0.03896 (2) obj. -405.5 iterations 2
Cbc0038I Pass  22: suminf.    0.20000 (2) obj. -406.74 iterations 1
Cbc0038I Pass  23: suminf.    0.03896 (2) obj. -405.5 iterations 2
Cbc0038I Pass  24: suminf.    0.03896 (2) obj. -405.5 iterations 1
Cbc0038I Pass  25: suminf.    0.03896 (2) obj. -405.5 iterations 0
Cbc0038I Pass  26: suminf.    0.03896 (2) obj. -405.5 iterations 0
Cbc0038I Pass  27: suminf.    0.03896 (2) obj. -405.5 iterations 0
Cbc0038I Pass  28: suminf.    0.03896 (2) obj. -405.5 iterations 0
Cbc0038I Pass  29: suminf.    0.03896 (2) obj. -405.5 iterations 0
Cbc0038I Pass  30: suminf.    0.03896 (2) obj. -405.5 iterations 0
Cbc0038I No solution found this major pass
Cbc0038I Before mini branch and bound, 143 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.01 seconds)
Cbc0038I After 0.01 seconds - Feasibility pump exiting with objective of -405.2 - took 0.00 seconds
Cbc0012I Integer solution of -405.2 found by feasibility pump after 0 iterations and 0 nodes (0.01 seconds)
Cbc0001I Search completed - best objective -405.2, took 0 iterations and 0 nodes (0.01 seconds)
Cbc0035I Maximum depth 0, 131 variables fixed on reduced cost
Cuts at root node changed objective from -406.74 to -405.2
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
ZeroHalf was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value:                405.20000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.00
Time (Wallclock seconds):       0.01

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.00   (Wallclock seconds):       0.01
1

And filter the list of the possible players to the ones that were selected (their variable value is 1):

optimized_roster = ( 
    list(
        filter(
            lambda x : x.varValue == 1, 
            total_score.variables()
        )
    )
)
optimized_roster
[_Barton,_Will^,
 _George,_Paul^,
 _James,_LeBron^,
 _Jokic,_Nikola^,
 _Lyles,_Trey^,
 _Melton,_De'Anthony,
 _Morant,_Ja^,
 _Quickley,_Immanuel,
 _Saric,_Dario]

We will extract the name of the player from the variables of the linear optimizer, and clean up the “strange” characters.

optimized_roster_names = [
    name
    .name
    .replace('_',' ')
    .strip() 
    for name in optimized_roster
]

The list of the players above is changing every week based on their recent performance, and we can execute this analysis every time before we define our fantasy team for the next round.

We can also get the table that we used as the input for the optmization process and filter it show more details on each of our players. We will also calculate the sum of the salaries to verify that we are below the maximum budget of 60,000.

(
    all_players
    .query('Name in @optimized_roster_names')
    [['Position','Name','FD Points','Salary']]
    .pivot_table(
        index=['Name','Position','FD Points'],
        margins=True,
        margins_name='Total', 
        aggfunc=sum
    )
)
Salary
Name Position FD Points
Barton, Will^ SF 35.1 4900
George, Paul^ SG 45 7900
James, LeBron^ SF 59 10800
Jokic, Nikola^ C 69.5 11000
Lyles, Trey^ PF 33 4000
Melton, De'Anthony PG 38.3 4600
Morant, Ja^ PG 55 7100
Quickley, Immanuel SG 37.8 5100
Saric, Dario PF 32.5 4100
Total 59500