In my previous blog post, “C# to F#: My Initial Experience and Reflections,” I wrote about learning F# and converting a C# formula model into an F# formula model. When I wrote that post, the jury was still out on performance. I am very happy to say that I now have some very quantifiable results, and I’m ecstatic to announce that F# took C# to school!
The formula model we created can be found here. The structure is essentially as follows: the Model contains many Leagues, a League contains many Divisions, a Division contains many Teams, and a Team plays at every Stadium, which creates many StadiumTeamData objects. Each Stadium contains its own details.

In the Excel file you’ll find 2 Team sheets, a LeagueSummary sheet, a Stadiums sheet, and a Stadium Schedules sheet. The Stadiums sheet is only a list of Stadiums and their details, and the Stadium Schedules sheet contains the schedule for each of those Stadiums. Each Team sheet contains StadiumTeamData (a row of data), which is the lowest level of calculation in this model. The LeagueSummary sheet sums the 2 Team sheets and calculates 10 years of data, which could be used to create a chart. Our sample apps do not chart, because our test was about performance, not prettiness. The Excel model is very simple; it was used only to prove that the calculations were being performed correctly.

In the source code included at the end of this article, you will notice 2 data providers: MatchesModelNoMathDataProvider and PerformanceTestNoMathDataProvider. The matches model provider mirrors the Excel scenario with 1 League, 1 Division, 2 Teams, 2 Stadiums, and a single mandatory Theoretical Stadium. The Los Angeles Stadium is ignored in code. The performance model, however, has 2 Leagues. Each League has 9 Divisions, each Division has 10 Teams, and each Team references 68 Stadiums. There is also a single mandatory Theoretical Stadium. This gives a grand total of 12,240 (2 × 9 × 10 × 68) StadiumTeamData instances. These instances represent the bulk of the formula work and, in the case of PDFx, the bulk of the property dependency registrations.
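To make the shape of the performance model concrete, here is a minimal Python sketch (not the post’s C# or F# code) of the Model → League → Division → Team → StadiumTeamData hierarchy. The field names and the build_performance_model and count_stadium_team_data helpers are hypothetical, and the Theoretical Stadium is left out; the sketch only shows how 2 × 9 × 10 × 68 multiplies out to 12,240 leaf instances.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical, simplified entities; the real classes carry many more fields.
@dataclass(frozen=True)
class StadiumTeamData:
    stadium: str
    input_value: float = 0.0

@dataclass(frozen=True)
class Team:
    name: str
    stadium_data: Tuple[StadiumTeamData, ...]  # one entry per referenced Stadium

@dataclass(frozen=True)
class Division:
    name: str
    teams: Tuple[Team, ...]

@dataclass(frozen=True)
class League:
    name: str
    divisions: Tuple[Division, ...]

@dataclass(frozen=True)
class Model:
    leagues: Tuple[League, ...]

def build_performance_model(leagues=2, divisions=9, teams=10, stadiums=68) -> Model:
    """Build a model with the same shape as the performance data provider (hypothetical helper)."""
    stadium_names = [f"Stadium {s}" for s in range(stadiums)]
    return Model(tuple(
        League(f"League {i}", tuple(
            Division(f"Division {j}", tuple(
                Team(f"Team {k}", tuple(StadiumTeamData(n) for n in stadium_names))
                for k in range(teams)))
            for j in range(divisions)))
        for i in range(leagues)))

def count_stadium_team_data(model: Model) -> int:
    """Count the leaf-level StadiumTeamData instances across the whole tree."""
    return sum(len(team.stadium_data)
               for league in model.leagues
               for division in league.divisions
               for team in division.teams)

print(count_stadium_team_data(build_performance_model()))  # 12240
```

Calling build_performance_model(1, 1, 2, 2) gives the small matches-model shape (again without the Theoretical Stadium), which is handy for checking results against a spreadsheet.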
C# and PDFx
The first implementation we created was in C# and uses PDFx (the Property Dependency Framework). It follows the pattern we have used for client implementations over the last year. Because of that familiarity, it took only about 16-24 hours to implement, which is pretty fast. This is why we really like PDFx: it helps simplify implementation in C#. Because PDFx takes a pull-based approach, no custom events are required; the PropertyChanged event triggers everything under the hood.

There is a catch, though: each property in a chain of dependent properties raises its own PropertyChanged event. With our 12,240 StadiumTeamData instances, PropertyChanged is raised roughly 500,000 times on the first calculation of the top-level data alone. Across all of the properties in the model, properties are accessed 2,487,431 times, and 1,176,126 of those accesses do work to set up the required property dependency registrations.

So at the end of the day, the C# and PDFx implementation takes about 55 seconds to load the object model and another 24 seconds to run the first calculation, for a grand total of 79 seconds to load the application. Another real bummer is that PDFx is currently not thread safe, so it must run on the UI thread, which means that for about 1:20 the application looks like it’s not doing anything. Very bad, very very bad! On top of that, each time we change a value on a single StadiumTeamData via a slider, it takes about 6 seconds to finish calculating, again blocking the UI thread. A very important detail: when a single StadiumTeamData has an input value change, only the objects that depend on that StadiumTeamData, the objects that depend on those objects, and so on, are recalculated. This means that out of 12,240 StadiumTeamData instances only 1 is recalculated, along with only 1 team, 1 division, 1 league, and the top-level values of the formula model.
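PDFx’s internals are not shown in this post, so what follows is only a toy Python model of the pull-based behavior described above, not PDFx’s actual API. Each derived property registers the properties it reads, a change to one input raises a change notification for every property in its dependent chain, and only the invalidated properties are recomputed on the next read. The DependencyGraph class, its methods, and the property names are all hypothetical.

```python
from collections import defaultdict

class DependencyGraph:
    """Toy pull-based dependency tracker (hypothetical; not the PDFx API).

    Derived properties register the properties they read. Changing an input
    invalidates every transitively dependent property and raises one change
    notification per link in the chain; values are recomputed lazily on read.
    """

    def __init__(self):
        self._dependents = defaultdict(set)  # source -> properties that read it
        self._formulas = {}                  # derived property -> (sources, formula)
        self._values = {}                    # inputs and cached derived values
        self.notifications = []              # stands in for PropertyChanged events
        self.evaluations = 0                 # how many formulas actually ran

    def set_input(self, name, value):
        self._values[name] = value
        self._raise_changed(name)

    def define(self, name, sources, formula):
        # Dependency registration: per-property setup work done before any calculation.
        for source in sources:
            self._dependents[source].add(name)
        self._formulas[name] = (sources, formula)

    def get(self, name):
        # Pull: compute on demand, then cache until a source invalidates it.
        if name not in self._values:
            sources, formula = self._formulas[name]
            self._values[name] = formula(*(self.get(s) for s in sources))
            self.evaluations += 1
        return self._values[name]

    def _raise_changed(self, name):
        self.notifications.append(name)
        for dependent in self._dependents[name]:
            self._values.pop(dependent, None)  # drop the stale cached value
            self._raise_changed(dependent)     # cascade down the dependent chain


# Two stadium rows feeding one team total, which feeds the model total.
g = DependencyGraph()
g.set_input("std1.input", 2.0)
g.set_input("std2.input", 3.0)
g.define("std1.total", ["std1.input"], lambda x: x * 10)
g.define("std2.total", ["std2.input"], lambda x: x * 10)
g.define("team.total", ["std1.total", "std2.total"], lambda a, b: a + b)
g.define("model.total", ["team.total"], lambda t: t)
print(g.get("model.total"))  # 50.0

g.notifications.clear()
g.evaluations = 0
g.set_input("std1.input", 4.0)
print(g.notifications)       # ['std1.input', 'std1.total', 'team.total', 'model.total']
print(g.get("model.total"))  # 70.0
print(g.evaluations)         # 3: std2.total is still cached and not recomputed
```

Even in this toy, every link in the chain produces its own notification, which is why the event count grows so quickly once thousands of StadiumTeamData instances feed the totals.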
We have been trying to improve PDFx performance for some time now, and we have a few more tricks up our sleeves, but most of them target load time, not calculation time.
F#
After listening to a ton of .NET Rocks episodes recently, I’ve learned a lot about F#. I was so intrigued that I set out to create an F# implementation of the same formula model we had created in C# and PDFx. The implementation took about 32 hours, but that includes a ton of research. By the end, I think I could have written the entire thing in less than 16 hours, which would be less time than the C# and PDFx implementation took.

I learned that functional programming lends itself to parallelization more readily than object-oriented programming does. Because functional programming discourages modifying values, and everything in F# is immutable by default, the F# implementation can also be run on a background thread. The cool part is that, in theory, many sub-calculations can run at the same time, and their outputs can then be aggregated to run a final answer calculation. Our current formula model is perfect for this approach. And because we no longer depend on PDFx to know when a property changes, the PropertyChanged event is raised only once to trigger all calculations, and then only once for each property updated by the output of the calculations, so that the UI can respond.

The object model takes a bit more than 1 second to load, and the first calculation finishes in another 2.5 seconds, for a total load time of about 3.5 seconds. Compared to 79 seconds, that’s about 95% less load time in F#. Each subsequent calculation, when a value on a StadiumTeamData changes via slider, takes about 1.2 seconds. Compared to 6 seconds, that’s about 80% less time per calculation in F#. Unlike the C# and PDFx implementation, I have not yet optimized the F# formula model to recalculate only the part of the object tree that changed; instead, all 12,240 StadiumTeamData instances are recalculated each time any value in the object model changes.
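The “run many sub-calculations at once, then aggregate” idea can be sketched as a parallel map over immutable inputs followed by a reduce. This is a Python illustration under simplifying assumptions, not the post’s F# code: the input_value and rate fields and the calculate and calculate_model functions are hypothetical stand-ins for the real StadiumTeamData formulas.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical inputs/outputs; the real StadiumTeamData formulas are far richer.
@dataclass(frozen=True)
class StadiumTeamData:
    input_value: float
    rate: float

@dataclass(frozen=True)
class StadiumResult:
    output: float

def calculate(data: StadiumTeamData) -> StadiumResult:
    """Pure sub-calculation: reads immutable inputs and returns a new result."""
    return StadiumResult(output=data.input_value * data.rate)

def calculate_model(all_data, max_workers=8) -> float:
    # Map: every StadiumTeamData is independent, so the sub-calculations can run
    # concurrently without locks, because nothing shared is ever mutated.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(calculate, all_data))
    # Reduce: aggregate the independent outputs into the final answer.
    return sum(r.output for r in results)


data = [StadiumTeamData(input_value=float(i), rate=2.0) for i in range(12_240)]
print(calculate_model(data))  # 149805360.0
```

In CPython, threads will not speed up pure-Python CPU-bound work because of the global interpreter lock; ProcessPoolExecutor offers the same map interface and is the usual swap. In F#, the analogous building block is Array.Parallel.map. Either way, immutability is what makes the map step safe to run off the UI thread.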
So there is still room to become more performant: we could recalculate only the single StadiumTeamData that changed and its related team, division, and league, and then the top-level values of the formula model.
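One way that optimization could look, sketched in Python under simplifying assumptions (each leaf is a single number standing in for a StadiumTeamData result, and Node, leaf, branch, and update are hypothetical names), is an immutable tree with cached totals where an update rebuilds only the path from the changed leaf up to the root and shares every other subtree unchanged.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    """Immutable tree node with a cached total (leaves stand in for StadiumTeamData)."""
    total: float
    children: Tuple["Node", ...] = ()

def leaf(value: float) -> Node:
    return Node(total=value)

def branch(children) -> Node:
    children = tuple(children)
    return Node(total=sum(c.total for c in children), children=children)

def update(node: Node, path: Tuple[int, ...], new_value: float) -> Node:
    """Return a new tree with the leaf at `path` replaced by `new_value`.

    Only the nodes along the path (league -> division -> team -> stadium data)
    are rebuilt and re-summed; every other subtree is shared with the old tree
    and keeps its cached total.
    """
    if not path:
        return leaf(new_value)
    i, rest = path[0], path[1:]
    new_child = update(node.children[i], rest, new_value)
    children = node.children[:i] + (new_child,) + node.children[i + 1:]
    return Node(total=sum(c.total for c in children), children=children)


# Same shape as the performance model: 2 leagues x 9 divisions x 10 teams x 68 stadiums.
model = branch(
    branch(
        branch(
            branch(leaf(1.0) for _ in range(68))
            for _ in range(10))
        for _ in range(9))
    for _ in range(2))

updated = update(model, (0, 3, 5, 10), 11.0)
print(model.total, updated.total)                # 12240.0 12250.0
print(updated.children[1] is model.children[1])  # True: the untouched league is reused
```

Re-summing each rebuilt node’s direct children touches at most 2 + 9 + 10 + 68 = 89 cached totals per change instead of recalculating all 12,240 leaves, and because the old tree is never mutated, it stays valid for any reader still holding it.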
A complete breakdown of my comparisons can be found in this Excel file. To wrap things up, I want to call out a few of the most important results here.
Readability
I used to think that C# and PDFx was very readable. And while it is for very simple models, it can get unwieldy. F#, however, is the clear winner here. I cut hundreds of lines of code. I can see an entire formula in one file that is compact enough to fit on my screen at once. C# and PDFx, by contrast, spreads a formula across multiple files because of its multiple classes, and it forces me to do a lot of scrolling because of how many lines a single property takes up. This seriously improves maintainability.
Performance
When it comes to performance, C# and PDFx were blown out of the water. Application load time dropped by about 95%, and calculation time dropped by about 80%. This is serious business here!
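For reference, these percentages are reductions in elapsed time, computed from the timings reported earlier in this post (about 3.5 s versus 79 s to load, and about 1.2 s versus 6 s per recalculation). Expressed as speedup factors, they correspond to roughly 22.6× faster loading and 5× faster recalculation:

```latex
\begin{aligned}
\text{load:}\quad & 1 - \frac{3.5\,\text{s}}{79\,\text{s}} \approx 1 - 0.044 = 0.956,
  && \frac{79}{3.5} \approx 22.6\times \\
\text{recalculation:}\quad & 1 - \frac{1.2\,\text{s}}{6\,\text{s}} = 1 - 0.2 = 0.80,
  && \frac{6}{1.2} = 5\times
\end{aligned}
```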
Time to Implement
This comparison is slightly skewed by experience. I was impressed that F#, a language brand new to me, took only 32 hours, compared to 16-24 hours for C# and PDFx, which I already knew well. I am convinced that on future projects I can write F# faster than C# using PDFx.
I will be diligently searching for opportunities to use F# in production client code. It’s a no-brainer to me. I agree with what many .NET Rocks guests have said when talking about F# and functional programming: “Every software engineer should learn F#!” It just makes sense!
Source Code: Formula Implementation Proving Ground