Making Data Driven Decisions in Football Manager — How I use Python and PostgreSQL to Manage My Team

mewteebee_
15 min readMay 14, 2023

--

So here I sit. At the end of my 37th season in charge of Real Madrid (2058/59). Suffices to say. This season did not quite go as planned.

In the league, we finished with 100 points… 100 points. One-zero-zero. ONE (1). HUNDRED (00). Points. But in the cruellest twist of fate our Catalan rivals still managed to beat us to the title with 105.

LaLiga Table 2058/59

To add insult to injury, we were leading the charge for almost 2/3rds of the season and the bad news doesn’t even stop there either. All things being equal this 100 point, 118 goals scored, 26 goal conceded season would still qualify as a resounding success in my book — if — it had come along with some silverware on the side.

But alas.

Real Madrid Cup Competition Performance 58/59 season

Suffices to say we all but sh#t the bed on that front too.

Disappointed as I may be at this point in time. This season was not a complete failure. I was at least able to pick up a few decent signings in the winter transfer window that will prove to be very good acquisitions long-term. However in the short-term, our primary goal is to close the gap we allowed to grow between ourselves and Barca. In order to ensure we’re able to improve upon the season just past. I have decided to outsource the responsibility of our data analysis to well… erm… me. Except instead of carrying out all of our data analysis in-house (within the game). I will be using the print-screen feature to output my players’ season stats to an Excel spreadsheet which I will then convert into a CSV file that can be read and stored within a PostgreSql database. The purpose of this is to create an out-of-game data centre which I can then access through Jupyter Notebooks and thus use the multitude of data analysis and evaluation tools the Python language has to offer. In doing things, I should be able to gain a better understanding of my players, their abilities and the best ways I can put their talents to use.

So we begin!

Step one is to educate myself on the necessary steps to get this done.

To ChatGPT we go!

How to access your PostgreSql database with Python for data evaluation!

According to ChatGPT the process for accessing data stored within a PostgreSql database in Jupyter notebooks is relatively straightforward. However before we even do that. We must import our CSV to Postgres and create a table that we could later access and play around with in Python.

The CSV in question — I’m so deep into this save that all of the players are ‘newgens’

The DB manager we’ll be using to achieve this is PgAdmin, it’s my personal favourite DB manager and one I’d hugely recommend to anyone looking to work with data management and querying for the first time.

PgAdmin v 4.6 — please ignore the time of day I’m doing all of this at

Following the directions of our favourite Natural Language Processing model. First course of action is to connect to your server — you may be required to create a server connection on your first opening of PgAdmin, however I’ve had one up and running for a minute now so I’m gonna go straight to the table creation/CSV import step.

Table creation

The naming and database creation step turned out to be far more involved than I imagined it would be. I imagined it would simply be a case of importing the CSV into PgAdmin and voila the behind the scenes magic would handle the rest for me. However, unfortunately in order to create a database — even one you plan on inserting ready made data to from a CSV file — you must create appropriate tables with appropriate columns to be populated later. Furthermore you must match your original data to fit the SQL compliant data types. I won’t go into detail regarding SQL data types here because it isn’t the most riveting of conversation topics but here is a link that should allow you to familiarise yourself with them should you feel the need to: PostgreSQL: Documentation: 15: Chapter 8. Data Types.

Long and tedious process but small price to pay — left is the naming window, right is the full SQL command for the table creation.

Now that we’ve created the table, we’re ready to populate the columns with the data we’ve drawn from FM. This is fairly intuitive, just right-click your table and click import/export. Select the CSV file’s path and then map each column in the newly created table to its corresponding column in the CSV.

It is extremely important that the number of columns you’re importing matches the number of columns in your CSV file

Assuming you’ve imported the file correctly, you should now be able to run SQL queries on your FM data through PgAdmin’s built-in query tool.

From FM to SQL in just a few simple steps :)

Ok, now that I’ve successfully imported my data into PostgreSql, the next step is accessing this data through Jupyter Notebooks. In order to do this I’ll need to import the psycopg2 library into my Jupyter Notebooks instance using the pip installer.

The steps as detailed by ChatGPT

Once connected, all that’s left is to create a cursor that will allow us to navigate the data stored within our database. With this cursor we will be able to execute SQL queries from within Jupyter Notebooks.

Using a ‘for-loop’ to iterate over all of the rows within the selected query

Et fin. You now have total access to your PostgreSql data through Jupyter Notebooks!

BUT WAIT! HOW HAS THIS IMPROVED YOUR TEAM!?

Excellent question attentive audience member. This by itself won’t improve my team. To improve my team, first course of action should be to see where we fell short in our bid to secure the LaLiga title.

LaLiga defensive records for the 58/59 season

The first and most obvious difference between ourselves and our title competitors is our respective ability to protect our goals. Over the course of 38 games FC Barcelona only conceded a frankly staggering 15 goals compared to our 26. Under normal circumstances, 26 goals conceded in 38 games would be a very very good defensive season. However, in this case, it puts us closer to 5th placed Valencia, who conceded 33, than to 1st placed Barca who conceded 11 fewer goals than we did.

The reason for this, I believe, is the formation I chose to deploy for this season. In a bid to be innovative, I chose to forgo deploying any defensive roles in my formation as I believed by adding more attacking bodies into our formation we would be able to win the title by simply outscoring every team we came across. However, injuries to key players throughout the season meant that often times the attack did not function at full capacity. In these games we would concede without being able to nullify our opponents by relentlessly attacking their goal.

To rectify this, for the upcoming season, I will be returning to a four-at-the-back formation. The objective being to ensure we’re able to maintain a strong defence even in games where our attack isn’t firing on all cylinders.

Left: Last season, Right: Next season

Hopefully the addition of full backs to the lineup, with the same three-man midfield spine that carried me through the previous season, along with the delegation of some defensive responsibility to my goalkeeper will help bring us closer to Barca’s defensive record next season.

Now, with my new shape in place, it’s time to decide who will play and where. The goalkeeper position essentially selects itself, between our two goalkeepers (Marcelo Pacheco and Abdoul Aziz Guira) the latter managed to maintain an almost identical average performance rating cross the entire season despite his far larger sample size.

Using the MatPlotLib library to generate some basic visualisations to aid our decisions 🤓

Along with our goalkeepers, we need four solid defenders to ensure we can keep our opponents out of our area. How can we decide on who deserves these spots? Well given I like to play in a very attacking, high-press possession-dominant style of football. The areas I would need my defenders to excel in would be their ability to win the ball in the air, their ability to play the ball to their teammates with their feet, their ability to cover a lot of ground and of course their ability to win the ball back from opponents on the ground whether by way of a tackle or by way of an interception.

So let’s have a look!

A data frame produced by the ‘Pandas’ library

Isolating our defensive personnel as we have here allows us to contrast and compare them according to the criteria I mentioned before and see which back four combination would best be able to protect our goal both in the air and on the ground as well as cover a lot of ground per game and of course facilitate our play out from the back with their passing.

In this graphic, right off the bat we can see Mihael Yakin is levels above his peers with regards to his ability in the air averaging 11.65 successful headers per game. On top of this, the Swiss also completes an impressive 70 passes per game, covers 11 km and manages almost 4 interceptions. With these statistics I don’t believe it’s far fetched to say he may be the key to closing the gap between ourselves and Barca next season.

Alongside Yakin, our next best aerial duel-er is Besmir Hoxha. A young Kosovar centerback who spent last season on loan with BSC Young Boys. While his ability in the air — especially at his age — is incredibly impressive, he falls short with his passing completing only 50 passes a game and covering only 10 km per 90 as well. Thus we’ll be going with Alfredo Pfeiffer (9.8 headers, 11.6 km, 66 passes and 3.6 interceptions per 90) next to his Swiss compatriot in the centerback position instead.

For the fullback positions our aim is to find players who possess the same ball winning qualities as our main central pairing but with greater emphasis on their ability to cover ground. Unlike centerbacks, fullbacks are required to marshal the entire flank they occupy. Both defensively as well as supplying support to attacking players when in possession. For this reason the two most ideal candidates for this role as per the last graphic would be Julien Blanc and Luca Vangheluwe. Between them the two average a combined 24 km per 90 of ground coverage with 12.1 and 12.2 km covered respectively. In addition to their incredible endurance the pair also average the highest rate of interceptions of all defensive players in our squad with 4 and 5 per 90 respectively proving their reliability in ground duels. Furthermore both also exhibit considerable strength in aerial situations as well with each boasting just under 10 aerial duels won per game.

Our back five for the 59/60 season

With all this taken into account, this is how our backline will be lining up come opening night of the 2059/60 season. With our back four and goalkeeper established we can now move onto our defensive midfield.

For defensive midfield, the criteria we are looking for is vaguely similar to that of our defensive line. These players will also need to be able to cover a lot of ground and reliably win the ball back from our opponents in the middle of the park — albeit primarily with their feet rather than in the air. However in addition to their defensive duties defensive midfielders must also be very useful with the ball at their feet. So we’ll need a couple of high-volume passers who are capable of making things happen in possession. As it happens, just by looking at the previous chart we can identify Federico and Uros Duranovic as ideal candidates for this role.

Federico and Uros statistics as compared to our other defensive personnel

If we are to take a closer look at their statistics we can see that the pair of them boast impressive 70 pass per 90 averages along with a strong 2.08 and 2.39 tackles per 90 average respectively. Not only does this mean that they both check the box for being high-volume passers as 70 passes per 90 places them in the top 3 for that category, but the fact that they boast such high tackle numbers in tandem with relatively weak aerial duel numbers indicates that they would be far better suited to the defensive midfield role where the majority of their duels will be ground based rather than aerial. Pairing this with the fact that the two of them cover 13 km of ground per game on average it is safe to say that their credentials — at least on paper — suit our midfield vacancies down to the ground. However as stated before, in addition to being high-volume passers, defensive midfielders must also be able to make things happen in games. It’s all well and good being able to lay off the ball to your teammates 70 odd times a game but when we’re under pressure and in need of some inspiration will they be able to deliver? To gain insight on whether or not this is the case, there is one other category we must look at before making a final decision.

The graph above illustrates the number of passes each of our defensive players completes per game against the number of key passes (a pass that directly leads to a shot/attempt on goal) they complete. The goal here being to select players who are capable of both dictating the team’s play from the midfield whilst also providing incisive passing from deep. In the top left of the graph we can see Shogo Nagai soaring high above everyone else on the graph with his just over 1.4 key passes per game. Whilst this indicates the Japanese youngster is a highly creative deep midfield player capable of generating chances with his passing on a regular basis. His 55 passes per game clip unfortunately disqualifies him from potentially holding a starting position within our lineup. As given our style of play we need DMs capable of producing tens to hundreds of passes per game on any given matchday.

As such, taking both passes per 90 and key passes per 90 into account. The best options to start for us in the middle of the park come next season are indeed Federico and Uros Duranovic.

Now that we’ve laid our foundation, it’s time to fill out our attack for the new season.

Our attacking line consists of four parts, a playmaking winger on the right, an inside forward on the left, a traditional striker up top and a roaming central midfielder to connect the defence to the attack. To begin with, we’ll start by profiling the players we want in our central midfield and winger positions as they’ll be the our primary contributors in terms of ball progression and creativity.

Real Madrid Attacking Players Dribbles vs Key Passes

In the graphic above we can see our attacking player’s dribble numbers as compared to their key pass numbers. Right off the bat we can see two significant outliers in the dribble department. The first we have the Spaniard Ximo with the second being the Cameroonian Pierre Moukoko. The pair of them boast a very impressive 3.48 dribbles per 90. The ability to carry the ball past opposing defenders with frequency and efficiency is an extremely valuable skill for attacking wide players as they frequently find themselves in situations where they’re isolated against opposing fullbacks where the ability to beat them one-on-one can disrupt opposition defences and thus open up a plethora of goal scoring opportunities. In addition to their dribbling ability, Ximo and Moukoko also display considerable creative chops. Ximo the lesser of the two lays on 2.1 key passes per game on average while his counterpart lays on a massively outlying 6.7.

With this discrepancy in their creative abilities it seems logical to deploy Ximo in the left-sided inside forward role where he’ll primarily be responsible for cutting inside from the flank, taking on opponents with his dribbling ability and linking up with the central striker. Moukoko on the other hand will take up the right-sided advanced playmaker role, where he’ll be tasked with not only cutting inside from the right but also dropping deeper to collect the ball from the midfield then looking to create scoring opportunities for either the striker or his opposite winger with his creative passing.

For the central midfield role, I’ve narrowed our options down to two.

Simone Hamde and Bogdan Cenkov performance in attacking statistics

The options are Simone Hamde and Bogdak Cenkov who — as can be seen in the table above — profile very similarly in their stats. Both possess great ability to create from their passing, both exhibit goal threat, neither are particularly strong in the dribbling department and both are relatively high volume passers who complete their passes at very high rates. With profiles this similar, the decision comes purely down to preference in this case. I’ve gone with Simone Hamde purely because he takes more shots per game than his peer, illustrating a far greater tendency towards directness in his attacking play.

Finally, we’re left with only the striker position to fill. For sake of transparency, I was always just going to choose whichever player scored the most goals last season to fill this position. BUT, as we’re here already and my PgAdmin and Jupyter Notebook are both already open. I’ll humour myself with a little bit of statistical analysis for statistical analysis’ sake.

Striker statistics

Above we have our three primary goal-getters from last season along with some of their key attacking statistics. Suffices to say we were blessed with a dearth of attacking talent. However, where last season saw us deploy a two-man frontline allowing for the burden of responsibility to be shared between two strikers. Next season we’re looking to concentrate all of our chance creation towards one focal point.

Radar Chart featuring our three strikers — margins finer than expected

With our strikers statistics compiled onto a chart like this we can see that although there is considerable difference between their season totals with regards to goals, assists and overall output. The per 90 (per game) averages are surprisingly similar. As far as goal ouput goes, the three of them are separated by only 0.1 goal per 90. However it must be noted that despite this minute difference in goal scoring, Bruno Alves is the striker that sustained his performance over the largest sample size. Furthermore, Bruno is also the striker producing the highest number of shots per game with the lowest number of key passes per game. Unlike Caraballo and Moussima who enjoy engaging in linkup play and chance creation for their contemporaries. Bruno Alves is a pure gunman whose number one priority is evidently putting the ball in the back of the net. For this reason, I believe he is our best option to lead the line into the coming season.

Real Madrid Lineup 2059/2060 Season

With that, we have put together the best possible lineup — statistically speaking — to head into the new season with. All thanks to Python, Postgres and a long weekend with no plans.

--

--

mewteebee_
mewteebee_

Written by mewteebee_

Navigating the world of virtual football through data analysis - Arsenal / Real Madrid

Responses (1)