Review: Dave Fancher’s “The Book of F#”

This book is fantastic! I had little F# experience going in, and I found the basics of the language easy to understand and fun to read. Dave does an excellent job explaining F#, from syntax, types, and its functional nature all the way through complex topics like quoted expressions and asynchronous programming. The occasional Doctor Who reference is sure to catch the eye in his code samples. As a C# developer I really appreciated the chapter that compares F#’s APIs to C#’s. I feel confident adding F# to any of my existing C# projects right away. One thing I found very intriguing was Dave’s focus on using F# for regular application development, not just math! F# is not just a niche language for the scientific world; it is a way to develop extremely testable and reliable code in everyday applications.

I highly recommend Dave Fancher’s “The Book of F#” to every .NET developer. Even if the developer does not adopt F#, the lessons taught by the language are invaluable and will make them a better .NET developer in whichever language they use.

C# to F#: I’m a Convert

In my previous blog post, C# to F#: My Initial Experience and Reflections, I wrote about learning F# and converting a C# formula model into an F# formula model. When I wrote that post the jury was still out on performance. I am very happy to say that I now have quantifiable results, and I’m ecstatic to announce that F# took C# to school!

Formula Model

The formula model we created can be found here. The structure is essentially: Model contains many Leagues. A League contains many Divisions. A Division contains many Teams. A Team plays at every Stadium, thus creating many StadiumTeamData objects. Each Stadium contains details.

In the Excel file you’ll find 2 Team sheets, a LeagueSummary sheet, a Stadiums sheet, and a Stadium Schedules sheet. The Stadiums sheet is only a list of Stadiums and their details, and the Stadium Schedules sheet contains the schedule for each Stadium in that list. Each Team sheet contains StadiumTeamData (a row of data), which is the lowest level of calculation in this model. The LeagueSummary sheet sums the 2 Team sheets and calculates 10 years of data, which can be used to create a chart. Our sample apps do not chart, as our test was not about prettiness but about performance. The Excel model is a very simple model. It was used only to prove the calculations were being performed correctly.

In the source code included at the end of this article you will notice 2 data providers: MatchesModelNoMathDataProvider and PerformanceTestNoMathDataProvider. The matches model provider matches the Excel scenario with 1 League, 1 Division, 2 Teams, 2 Stadiums, and a single mandatory Theoretical Stadium. The Los Angeles Stadium is ignored in code. The performance model, however, has 2 Leagues. Each League has 9 Divisions. Each Division has 10 Teams. Each Team references 68 Stadiums. There is also a single mandatory Theoretical Stadium. This gives a grand total of 12,240 StadiumTeamData instances (2 × 9 × 10 × 68). These instances represent the bulk of the formula work and, in the case of PDFx, the bulk of the property dependency registrations.
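To make the sizing concrete, here is a minimal sketch of that arithmetic in Python, used only as a neutral illustration language. The `ModelShape` class and its field names are hypothetical and are not taken from the sample source code; only the counts come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical description of how wide each level of the model is."""
    leagues: int
    divisions_per_league: int
    teams_per_division: int
    stadiums_per_team: int

    def team_count(self) -> int:
        # Every league has the same number of divisions and teams.
        return self.leagues * self.divisions_per_league * self.teams_per_division

    def stadium_team_data_count(self) -> int:
        # One StadiumTeamData row per (team, stadium) pair.
        return self.team_count() * self.stadiums_per_team


# Counts from the performance model described above.
performance_model = ModelShape(
    leagues=2,
    divisions_per_league=9,
    teams_per_division=10,
    stadiums_per_team=68,
)

print(performance_model.team_count())
print(performance_model.stadium_team_data_count())
```

With these counts the model has 180 teams, and 180 teams times 68 stadiums gives the 12,240 rows quoted above.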

Implementations

C# and PDFx

The first implementation we created was in C# and uses PDFx (the Property Dependency Framework). This implementation represents the pattern we have used for the last year for client implementations. Due to familiarity, it took about 16-24 hours to implement, which is pretty fast. This is why we really like PDFx: it helps to simplify implementation in C#. Because PDFx is a pull-based approach, no custom events are required. The PropertyChanged event triggers everything under the hood for PDFx.

There is a catch though. Each property in a chain of dependent properties will raise the PropertyChanged event. In our example of 12,240 StadiumTeamData instances this means that PropertyChanged is raised roughly 500,000 times just on the first calculation of top-level data. Across all of the properties in existence, properties are accessed 2,487,431 times, and of those accesses 1,176,126 are doing work to set up the required property dependency registrations. At the end of the day, the C# with PDFx implementation takes about 55 seconds to load the object model and another 24 seconds to run the first calculation, for a grand total of 79 seconds to load the application.

Another real bummer is that PDFx is currently not thread safe, so it must run on the UI thread, which means that for about 1:20 the application looks like it’s not doing anything. Very bad, very very bad! On top of that, each time we change a value via a slider on a single StadiumTeamData it takes about 6 seconds to finish calculating, again blocking the UI thread. A very important detail to note is that when an input value changes on a single StadiumTeamData, only the objects that depend on that StadiumTeamData, the objects that depend on those objects, and so on are recalculated. This means that out of 12,240 StadiumTeamData instances only 1 is being recalculated, along with only 1 team, 1 division, 1 league, and the top-level values of the formula model.

We have been trying to improve PDFx performance for some time now, and we have a few more tricks up our sleeves, but most of the tricks are around load time, not calculation time.
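The cascade of change notifications described above can be sketched with a toy dependency chain. This is not the PDFx API and not C#; it is a deliberately simplified Python illustration, and the `Property` class, the `raised` log, and the attendance/price/revenue names are all assumptions made up for the example.

```python
raised = []  # names of properties that raised a change notification, in order


class Property:
    """A toy property that may be computed from other properties (hypothetical)."""

    def __init__(self, name, value=0.0, formula=None, sources=()):
        self.name = name
        self._value = value
        self._formula = formula
        self._sources = tuple(sources)
        self._dependents = []
        # Registering dependencies is the per-property setup work.
        for source in self._sources:
            source._dependents.append(self)

    @property
    def value(self):
        # Pull-based: a computed property asks its sources when it is read.
        if self._formula is None:
            return self._value
        return self._formula(*(source.value for source in self._sources))

    def set(self, value):
        self._value = value
        self._raise_changed()

    def _raise_changed(self):
        # Every property in the dependent chain raises its own notification.
        raised.append(self.name)
        for dependent in self._dependents:
            dependent._raise_changed()


# Two rows feeding one team total.
attendance_1 = Property("attendance_1", 100)
price_1 = Property("price_1", 2.0)
revenue_1 = Property("revenue_1", formula=lambda a, p: a * p, sources=(attendance_1, price_1))
attendance_2 = Property("attendance_2", 50)
price_2 = Property("price_2", 4.0)
revenue_2 = Property("revenue_2", formula=lambda a, p: a * p, sources=(attendance_2, price_2))
team_total = Property("team_total", formula=lambda x, y: x + y, sources=(revenue_1, revenue_2))

attendance_1.set(150)
print(raised)            # only the chain above attendance_1 is notified
print(team_total.value)
```

Changing one input notifies only its own chain (the input, its row, and the team total), which mirrors the targeted recalculation described above. The cost is that every link in the chain raises its own notification, and that adds up quickly across thousands of rows.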

F#

After listening to a ton of .NET Rocks recently I’ve learned a lot about F#. I was so intrigued that I set out to create an F# implementation of the same formula model we created in C# and PDFx. The implementation took about 32 hours, but that includes a ton of research. By the end I think I could have written the entire thing in less than 16 hours, which would be less time than the C# and PDFx implementation.

I learned that functional programming lends itself to parallelization more than object-oriented programming does. Because values are immutable by default, functional programming encourages an approach of not modifying state, so the F# implementation can also be run on a background thread. The cool part about all of this is that, in theory, many sub-calculations can run at the same time and their outputs can then be aggregated to run a final calculation. Our current formula model is perfect for this approach. Because we no longer depend on PDFx to know when a property changes, the PropertyChanged event is raised only once to trigger all calculations, and then only once for each property that is updated by the output of the calculations, so the UI is able to respond.

The object model takes a bit more than 1 second to load and the first calculation is done in another 2.5 seconds. The total load time is about 3.5 seconds. Compared to 79 seconds, that’s a reduction of about 95% in F# just for load. Each subsequent calculation, when a value changes via a slider on a StadiumTeamData, takes about 1.2 seconds. Compared to 6 seconds, F# is about 80% faster for each calculation. Unlike the C# and PDFx implementation, I have not optimized the F# formula model to calculate only the part of the object tree that changed. Instead, all 12,240 StadiumTeamData instances are recalculated each time any value changes anywhere in the object model.

So we could still become more performant by only calculating the single StadiumTeamData that changed and the related team, division, league, and then the top-level values of the formula model.
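The "run sub-calculations independently, then aggregate" idea can be sketched as follows. This is a Python illustration of the shape of the approach, not the F# code from the sample. The `StadiumTeamInput` record, its fields, and the revenue formula are assumptions invented for the example. Note also that threads in CPython will not speed up CPU-bound math, so this shows the structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class StadiumTeamInput:
    """Immutable inputs for one row (hypothetical fields)."""
    attendance: int
    ticket_price: float


def row_revenue(row: StadiumTeamInput) -> float:
    # Pure function: the result depends only on the inputs, nothing is mutated,
    # so rows can be evaluated in any order or at the same time.
    return row.attendance * row.ticket_price


def total_revenue(rows, workers: int = 4) -> float:
    # Map the pure sub-calculation over every row, then aggregate the outputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(row_revenue, rows))
    return sum(partials)


# One made-up row per StadiumTeamData instance in the performance model.
rows = tuple(
    StadiumTeamInput(attendance=1000 + i, ticket_price=2.0)
    for i in range(12_240)
)

total = total_revenue(rows)
print(total)
```

Because `row_revenue` touches no shared state, there is nothing to lock, and the same function can run on a background thread without any coordination with the UI.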

Results

A complete breakdown of my comparisons can be found in this Excel file. I wanted to call out a few very important results in this post to wrap things up.

Readability

I used to think that C# and PDFx was very readable. And while it is for very simple models, it can get unwieldy. F#, however, is the clear winner here. I reduced lines of code by the hundreds. I can see one entire formula in one file, compact enough to fit on my screen at one time. The C# and PDFx version takes up multiple files due to multiple classes, and it requires a lot of scrolling because of the number of lines a single property takes up. The compact F# version is seriously easier to maintain.

Performance

When it comes to performance C# and PDFx were blown out of the water. Application load time was improved by 95% and calculation time was improved 80%. This is serious business here!
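For anyone who wants to check the percentages, here is the arithmetic as a small Python sketch. The timings are the ones measured above, and the helper name `percent_reduction` is just something I made up for the example.

```python
def percent_reduction(before_seconds: float, after_seconds: float) -> float:
    """Share of the original time that was eliminated, as a percentage."""
    return (before_seconds - after_seconds) / before_seconds * 100


# Load: 55 s object model + 24 s first calculation (C# and PDFx) vs. about 3.5 s (F#).
load_improvement = percent_reduction(55 + 24, 3.5)

# Recalculation after a slider change: about 6 s vs. about 1.2 s.
recalc_improvement = percent_reduction(6, 1.2)

print(round(load_improvement, 1), round(recalc_improvement, 1))
```

The load figure comes out a little above 95% and the recalculation figure at 80%, which matches the rounded numbers above.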

Time to Implement

This is a slightly skewed comparison due to experience. I was impressed by the fact that C# and PDFx took 16-24 hours and F#, a brand new language, took only 32 hours. I am convinced that I can write F# faster than C# using PDFx on future projects.

Next Steps

I will be diligently searching for opportunities to use F# in production client code. It is a no-brainer to me. I agree with the statement from many .NET Rocks podcast guests talking about F# and functional programming: “Every software engineer should learn F#!” It just makes sense!

Resources

Source Code: Formula Implementation Proving Ground

C# and PDFx Executable

F# Executable

Formula Excel Workbook

F# vs. C# Comparison Excel File

C# to F#: My Initial Experience and Reflections

When someone tells you, “You’re doin’ it wrong!” there is often an urge to push back and get defensive. I’ve learned this is a worthless response. So when I was told that formulas should be written in F#, not C#, I took it to heart and decided to give F# and functional programming a whirl. While the jury is still out on performance, only because I haven’t completely finished my test app, F# has proven to be WAY more readable as far as formulas are concerned, and load time compared to my C# counterexample is AWESOME! In this article I will discuss some high-level comparisons and my approach so far.

Motivation

Here at InterKnowlogy we pride ourselves on our ability to solve problems in the best way possible for our customers while still preserving readable, reusable, and elegant code. One problem that has come up over the last few years is how to handle formulas. For example, if I have an Excel workbook with multiple sheets, each with a set of complex formulas, how do we translate that workbook into code? My former co-worker and friend Kevin Stumpf, in collaboration with the IK team, developed what we call PDFx, or the Property Dependency Framework. PDFx is a great, succinct way to allow properties to depend on other properties by leveraging the INotifyPropertyChanged interface and a LINQ-esque API. This was revolutionary for us. Instead of needing a completely separate formula calculator, we could build our formulas directly into objects. We could create a class for each sheet in an Excel workbook and have a property for each cell. This solution is elegant and easy to maintain, which our previous implementations were not. It also gave us the amazing capability to validate our formula output directly against cells in Excel. Our productivity has increased and the code is more reliable because of these things.

So what’s the problem? Performance! It’s not a matter of how complex the formula is; C# is lightning fast at math as far as we are concerned. However, each property that depends on one or more other properties requires a property registration full of lambdas. Lambdas are slow, although you wouldn’t know it unless you were trying to call them about 2 million times. The depth and width of our object model is really the problem here. The sheer number of instances, each with N properties, causes load time to be slow. Once loaded, PDFx is awesome! Formulas run very quickly and we have few gripes about performance after load, but load time, when accessing properties 2 million times and running lambdas along the way, can take about a minute.

Is there a better way? MAYBE! F# is our next attempt.

Why F# and Functional?

Honestly, I’ve only started diligently listening to .NET Rocks in the last month. Shame on me, I know. BUT! I backtracked to some sessions on F# and functional programming. BEST… IDEA… EVER!!! I started looking into the capabilities of F# and felt strongly that I needed to get smart on the topic. We’ve been trying to optimize PDFx for some time now and have a few more tricks up our sleeves, but that doesn’t mean that it’s THE right way to do formulas. It’s certainly a really good one, but best? To be determined. Functional programming seems like a very intuitive way to handle formulas. After listening to how easy it is to parallelize work in F# I was sold. Functional programming forces you to think differently. Everything is immutable by default in F#, and therefore the old C# ways of looping and mutating are over. This helps preserve the integrity of your formulas and data, which sounds great. Plus, there would be no need for property registrations or lambdas, and therefore load time, in theory, would only be bottlenecked by instantiating C# objects from the database. The question is runtime performance, which is still to be determined.

Approach – C# vs. F# Formula Comparison App

Since PDFx is open source, we wanted to find a way to show it off in all its glory. We created a sample formula model in Excel and implemented it in C# using PDFx. Creating the workbook took longer than the 2 days it took me to implement the software solution. With this example we have been able to recreate the performance conditions that one of our clients has, which is really great news, because it means this is the perfect sample to implement in F# to see if it is more performant than its C# cousin. So far F# has taken me longer to implement, but only because it’s a new language. In fact, I already feel confident that I would be WAY faster in F# than in C# when dealing with formulas like this sample. I’m really excited to see the final results, and of course the sample app and Excel workbook will be released as proving points. So look forward to more on this subject.

One major difference between F# and C# with PDFx is that PDFx allows the formula model and the UI model to be the same. In F# this is not true. You need separation between your formula model, which is really the formula inputs and functions, and the UI model. For F#, the UI model in our sample app has two mechanisms: GetFormulaInputs() and ApplyFormulaOutputs(). The F# formula model has 2 main calculations that are called easily from C# with the inputs provided by GetFormulaInputs(). The F# formula model spits out outputs, which are then applied in C# via the ApplyFormulaOutputs() method. I’m very close to having this working, but wanted to get my initial thoughts out on the matter first.
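The split between the UI model and the formula model might look roughly like the sketch below. It is written in Python purely for illustration and is not the sample app's code: only the two method names mirror GetFormulaInputs() and ApplyFormulaOutputs(), while the record types, fields, and the revenue formula are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormulaInputs:
    """Immutable snapshot handed to the formula model (hypothetical fields)."""
    attendance: int
    ticket_price: float


@dataclass(frozen=True)
class FormulaOutputs:
    """Immutable result returned by the formula model (hypothetical fields)."""
    revenue: float


def calculate(inputs: FormulaInputs) -> FormulaOutputs:
    # The formula model: pure functions over inputs, no knowledge of the UI.
    return FormulaOutputs(revenue=inputs.attendance * inputs.ticket_price)


class StadiumTeamViewModel:
    """The UI model: mutable, and the only place change notifications happen."""

    def __init__(self, attendance: int, ticket_price: float):
        self.attendance = attendance
        self.ticket_price = ticket_price
        self.revenue = 0.0
        self.changed = []  # stand-in for raising PropertyChanged

    def get_formula_inputs(self) -> FormulaInputs:
        return FormulaInputs(self.attendance, self.ticket_price)

    def apply_formula_outputs(self, outputs: FormulaOutputs) -> None:
        # Notify once per property whose value actually changed.
        if outputs.revenue != self.revenue:
            self.revenue = outputs.revenue
            self.changed.append("revenue")


view_model = StadiumTeamViewModel(attendance=100, ticket_price=2.5)
view_model.apply_formula_outputs(calculate(view_model.get_formula_inputs()))
print(view_model.revenue, view_model.changed)
```

The design choice here is that the formula side only ever sees immutable snapshots, so it can run anywhere, and the UI side decides which notifications to raise when the outputs come back.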

Stay Tuned

I’ve been super impressed with F# so far and the community using it. I haven’t struggled too terribly yet, but getting into a functional thought process and thinking in F#, a totally new language to me, has been tough. I have high hopes for the possibilities. The F# Language Reference has become my new best friend!