
Expected Finish Position - Explained

Inspiration


Have you ever talked to a fan and heard them say, "My driver just can't catch a break this year"? Or read a post-race interview where a driver says, "We didn't finish where we should have," or "We had speed, just didn't get the finish"? Is that true? Did you really deserve a better finish? Where exactly should you have finished? What about everyone else? I would bet that most drivers feel they under-performed after a race, while only an honest few would say they over-performed. And if I'm being honest, there really is no way to know who should finish where. NASCAR races are complex and full of variables, most of which are completely random (the 2012 Daytona 500, 2004 Tropicana 400 qualifying, and whatever this and this is should be proof enough). But that doesn't mean we can't try to figure it out, right? I have made a very simplistic model that analyzes data from throughout the race and determines where I would expect a driver to finish, aptly named Expected Finish Position (EFP). This post is all about explaining how it works [hopefully], along with its interesting advantages and unfortunate inaccuracies.



Background


If I've learned anything in engineering school, it's that you always, always state your assumptions at the beginning of each problem. The first constraint is that I only looked at green flag laps, because who really cares how fast they go under caution? I also threw away the lap on which each caution was initiated, because where a car was on track at the time of the yellow can drastically affect its lap time. Conversely, I did not get rid of any laps during green flag pit stops, since it would be unfair to throw out everyone's lap data because one car pitted.
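Those assumptions can be written down as a small filter. This is only a sketch: the `Lap` record, the `filter_laps` function, and the two sets of lap numbers are hypothetical names made up for illustration, and the real timing data may be shaped quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lap:
    """One timed lap for one driver (hypothetical record layout)."""
    driver: str
    lap: int
    seconds: float


def filter_laps(laps, caution_laps, caution_start_laps):
    """Keep green flag laps only, and also drop the lap each caution began on.

    Laps run during green flag pit stops are deliberately kept.
    """
    excluded = set(caution_laps) | set(caution_start_laps)
    return [entry for entry in laps if entry.lap not in excluded]


# Made-up example: caution comes out on lap 3, laps 4-5 run under yellow.
laps = [Lap("A", n, 30.0) for n in range(1, 7)]
kept = filter_laps(laps, caution_laps={4, 5}, caution_start_laps={3})
print([entry.lap for entry in kept])
```

Only laps 1, 2, and 6 survive in this example, since lap 3 is the lap the yellow came out on.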



Procedure


Alright, here's the good stuff... First I took every driver's lap times and ranked how each driver did compared to everyone else on the same lap, so that the result is not affected by track temperature, grip level, etc. Then each driver's rankings are averaged over all the laps they ran. This was going to be my stopping point, but there were a few things I did not like about it. The main issue is that unless someone runs the fastest lap on every lap, there is no way their average will be around 1 or 2. In fact, even on a day where one driver dominates and leads every lap, they will still probably have an average rank of 4-5. That means there would never be a driver who would be "expected" to finish first or second... What kind of race doesn't have a winner? The same happens at the other end, where no one will run the slowest lap on every lap and therefore no one will be "expected" to finish last. The first workaround was to rank everyone's average rank and call that the EFP. However, that created a model that did not consider how large the gaps between averages were, which is illustrated later, and I didn't like that either.
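The per-lap ranking, the averaging, and the first workaround (ranking the averages) might look something like the sketch below. The function names and the nested-dictionary layout are assumptions for illustration, the lap times are made up, and ties are simply broken by the order drivers appear in the data.

```python
from collections import defaultdict


def average_lap_ranks(lap_times):
    """lap_times: {lap_number: {driver: seconds}} for the laps that were kept.

    Returns {driver: mean rank}, where rank 1 is the fastest car on a lap.
    A driver only contributes on laps they actually completed.
    """
    ranks = defaultdict(list)
    for times in lap_times.values():
        fastest_first = sorted(times, key=times.get)
        for position, driver in enumerate(fastest_first, start=1):
            ranks[driver].append(position)
    return {driver: sum(r) / len(r) for driver, r in ranks.items()}


def rank_of_averages(averages):
    """The simple workaround: rank the averages from smallest to largest."""
    fastest_first = sorted(averages, key=averages.get)
    return {driver: i for i, driver in enumerate(fastest_first, start=1)}


# Made-up lap times; driver C drops out after lap 3.
lap_times = {
    1: {"A": 30.0, "B": 30.1, "C": 31.0},
    2: {"A": 30.2, "B": 30.1, "C": 31.2},
    3: {"A": 30.0, "B": 30.3, "C": 30.9},
    4: {"A": 29.9, "B": 30.0},
}
averages = average_lap_ranks(lap_times)
print(averages)
print(rank_of_averages(averages))
```

Even in this tiny example the fastest driver averages 1.25 rather than 1, which is the problem described above, and ranking the averages hides how close A and B were compared to C.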


Finally, by taking each driver's average rank, along with the minimum and maximum averages across all drivers, a simple linear interpolation can be done between that range and the real field size, and the result becomes their Expected Finish Position. For example, say someone's average is 12. An average of 12 might not be that bad if the minimum average of all drivers is 10 - that driver would be pretty fast. But 12 would be a lot different if there were 20 other drivers with averages below 12. What this method does is normalize each driver's average against all the other drivers' averages, so it shows the variation as much as the underlying data does. The simpler ranking method does not capture that variation. I may have lost some of you there, and frankly, reading this back to myself, I'm not sure that it fully makes sense to me, so let's look at some numbers.
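One way to read that interpolation is as a straight-line mapping where the smallest average lands on 1 and the largest lands on the field size. The sketch below assumes exactly that. The function name `expected_finish_positions` is made up, the four averages are the ones quoted in the Analysis section, and the 40-car field is an assumption, since the post does not state the field size for this race.

```python
def expected_finish_positions(averages, field_size):
    """Linearly map average ranks onto the range 1..field_size.

    Assumes the smallest average maps to 1 and the largest to field_size.
    """
    lowest = min(averages.values())
    highest = max(averages.values())
    if highest == lowest:
        raise ValueError("all averages are identical; nothing to interpolate")
    scale = (field_size - 1) / (highest - lowest)
    return {driver: 1 + (avg - lowest) * scale for driver, avg in averages.items()}


# Averages quoted in the Analysis section; field_size=40 is an assumption.
averages = {"Harvick": 4.74, "Truex": 4.88, "Logano": 6.29, "Hill": 35.88}
efp = expected_finish_positions(averages, field_size=40)
for driver, value in efp.items():
    print(f"{driver}: {value:.1f}")
```

Under that assumed field size, Harvick, Truex, and Logano come out at 1.0, 1.2, and 2.9 after rounding to one decimal, which lines up with the EFP values discussed in the Analysis section.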



Analysis



[Table image: each driver's Average, Rank Avg, EFP, and percent difference between Rank Avg and EFP]

The "Average" column is the calculated average of each driver's rank for the entire race. You can see that Harvick has the lowest at 4.74, and Hill has the highest at 35.88. This means Harvick was, on average, the 5th fastest car on each lap. As described earlier, that obviously wouldn't make sense as his "expected" position, since he would then be expected to finish 5th with nobody expected in 1st-4th. The "Rank Avg" column is simply those averages ranked smallest to largest, and the "EFP" column is the Expected Finish Position found through the interpolation.


The Rank Avg does a good job of capturing the overall trend, but misses the fine details of what the data is trying to show. Look at the difference between Harvick, Truex, and Logano. Their Rank Avg values are 1, 2, and 3, which makes it look like Logano was one position behind Truex, who was one position behind Harvick. In reality, the gap between Harvick's and Truex's averages (4.74 and 4.88) is far smaller than the gap between Truex's and Logano's (4.88 and 6.29). Truex and Harvick were basically even, but both were consistently faster than Logano.


Now looking at those drivers' EFPs, Harvick and Truex are at 1.0 and 1.2, which again are nearly the same, while Logano is a distant 2.9. Really, the race should have been a toss-up between Harvick and Truex since they ran very similar speeds. Another example can be seen in the middle of the pack. Ragan (23.7), Menard (23.9), Wallace (24.1), and Allmendinger (24.2) are all within 0.5 EFP of each other, which means they basically all ran the same speed all race - whoever finished best was either lucky or faster at the right time. This is much more useful than saying Ragan should be 24th, Menard 25th, Wallace 26th, and Allmendinger 27th (as the Rank Avg does), since it illustrates the importance of the variation. The last column shows the percent difference between the "Rank Avg" and "EFP," which can be as high as 41%!
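Spotting those clusters of evenly matched cars can be automated. The sketch below is not part of the EFP model itself: `group_close_drivers` is a hypothetical helper, and treating 0.5 EFP as the cutoff for "basically the same speed" is an assumption borrowed from the example above. The EFP values are the ones quoted in this section.

```python
def group_close_drivers(efp, window=0.5):
    """Group drivers whose EFP is within `window` of the fastest car in the group."""
    groups = []
    group_start = None
    for driver, value in sorted(efp.items(), key=lambda item: item[1]):
        # Tiny tolerance so float rounding does not split 23.7 and 24.2.
        if group_start is not None and value - group_start <= window + 1e-9:
            groups[-1].append(driver)
        else:
            groups.append([driver])
            group_start = value
    return groups


# EFP values quoted in the text above.
efp = {
    "Harvick": 1.0, "Truex": 1.2, "Logano": 2.9,
    "Ragan": 23.7, "Menard": 23.9, "Wallace": 24.1, "Allmendinger": 24.2,
}
groups = group_close_drivers(efp)
for group in groups:
    print(group)
```

With these numbers, Harvick and Truex land in one group, Logano is on his own, and the four mid-pack drivers form a single group, which is the same picture described above.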


Another advantage is that this only looks at laps completed by each driver, so a driver still gets an "expected" value even if they do not finish the race. A driver could run really fast lap times and then blow a motor and be done for the day. They would not get a good finishing result, which obviously does not reflect how fast that car truly was. This method looks at the laps they did complete, and can show overall speed throughout a season much better than finishing position or average running position can.



Issues


The first [non] issue is that there are decimals. A driver can't finish 1.2th or 19.6th. Sure, we could round up or down, but we would lose the variation. This isn't really an issue since we want the variation, but it can be kind of confusing. The more impactful issue is that the model tends to rank drivers faster than they are. Through 60+ race analyses, I've noticed that the slowest cars are way slower than the norm. Since the interpolation needs the highest and lowest values to work properly, a drastically high or low value will cause everyone's rankings to be skewed. This is really only an issue for the bottom 5 or so cars, since they are very, very slow. I will normally throw out the bottom numbers until I get a nice linear curve, the average of all the values is close to the average finishing position (about 18.5 for a 36-car field), and there isn't an obvious trend of many drivers over- or under-performing. The same thing happens if a driver is way faster than everyone else: everyone else will be skewed and look like they under-performed. That effect is much smaller in scale than the similar effect from the slower cars, and it also shows how dominant the leader was. Of course, no math model or calculation is ever 100% accurate, and this is no exception.
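The skew from one very slow car is easy to see with a quick check of the mean EFP against the average finishing position, which is (N + 1) / 2 for N cars. The sketch below assumes the interpolation maps the smallest average to 1 and the largest to N. The function names and the five-car averages are made up for illustration, and it only measures the skew; it does not attempt to reproduce the manual trimming described above.

```python
def interpolate_efp(averages, field_size):
    """Map average ranks onto 1..field_size (assumed form of the interpolation)."""
    lowest = min(averages.values())
    highest = max(averages.values())
    scale = (field_size - 1) / (highest - lowest)
    return {driver: 1 + (avg - lowest) * scale for driver, avg in averages.items()}


def mean_efp_gap(averages, field_size):
    """Mean EFP minus the average finishing position, (field_size + 1) / 2.

    A negative gap means the field as a whole looks faster than it can finish.
    """
    efp = interpolate_efp(averages, field_size)
    mean_efp = sum(efp.values()) / len(efp)
    return mean_efp - (field_size + 1) / 2


# Made-up five-car fields: evenly spread vs. one very slow backmarker.
even_field = {"A": 5, "B": 10, "C": 15, "D": 20, "E": 25}
slow_backmarker = {"A": 5, "B": 10, "C": 15, "D": 20, "E": 45}
gap_even = mean_efp_gap(even_field, field_size=5)
gap_skewed = mean_efp_gap(slow_backmarker, field_size=5)
print(gap_even, gap_skewed)
```

The evenly spread field has no gap, while the single slow car drags the mean EFP about 0.6 positions below the average finishing position, so every car ahead of it looks faster than it should.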


The goal is to provide a single number that is easy to compare, while still showing the relative variation in each driver's speed, and I think this does that. I do monitor these values after races and check how the model is working for all types of tracks, series, and situations, so any changes in the workflow will be posted here as well.


© 2023 by The Real Speed Blog
