Channel: valerio.net

Towards a Workable INumeric, Part 1

Over the past couple of years (ok, decade? Since the days of .NET 1.1, I suppose) I’ve increasingly found myself wanting to write high-performance numerical code in C#. For example, while I was getting my master’s in electrical engineering at Ohio State I was working on computational electromagnetics software, most of which was written in C or FORTRAN, but I was drawn to the productivity boost of managed languages like C# and F#.

In scenarios like this, performance is king – the codes typically took days or even weeks to run, so a 10% performance improvement could save quite a bit of time. It was worth the effort to micro-optimize. Algorithms that were originally written for double-precision numbers could be sped up by using single-precision under certain circumstances. Some pieces of code could be improved by using integer arithmetic. Inevitably, common code like sums and averages then needs to be maintained separately for each numeric type.

For example, just for two types (int and float) and one method (Sum), the code starts to add up:

    public static class Utilities
    {

        public static int Sum(int[] items)
        {
            int sum = 0;
            foreach (int item in items)
            {
                sum += item;
            }
            return sum;
        }

        public static float Sum(float[] items)
        {
            float sum = 0;
            foreach (float item in items)
            {
                sum += item;
            }
            return sum;
        }
    }

If the algorithm is more complicated, it starts to become difficult to keep things in sync. You have to remember to make changes in multiple places at the same time. Typically when you encounter code like this it’s a good indicator that some refactoring is necessary, but because the code is performance-critical that’s not possible.

Once .NET 2.0 came out and I discovered generics, I thought that would certainly solve the problem – I could just write the method once in terms of a type T, and then use it for int, float, double, etc. Wrong. :)

This is some code that I wish I could have written, but alas, that is not the case. Warning, unrealistic code ahead.

        public static T Sum<T>(T[] items)
            where T : int, float, double
        {
            T sum = 0;
            foreach (T item in items)
            {
                sum += item;
            }
            return sum;
        }

Generic type constraints are not allowed to name individual numeric types (int, float, double). A constraint can say that the type T must be a struct (value type) or a class (reference type), among a few other things, but not “one of these numeric types”. Even if a “where T : struct” clause is used, it cannot be assumed that 0 (an int) can be assigned to type T. “default(T)” helps, but doesn’t necessarily express the concept of “zero” for every value type (including user-defined ones) that could be specified. Even after that, not all value types have the “+” operator defined – int, float, and double are special-cased by the C# compiler.
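To make the default(T) point concrete, here is a small sketch. The LogDomain struct and the DefaultDemo helper are hypothetical names invented for this illustration (neither is in the BCL). LogDomain stores the natural logarithm of a number, so its all-zero default represents 1 rather than 0.

```csharp
using System;

// Hypothetical value type (an assumption for illustration): stores ln(value).
public struct LogDomain
{
    private readonly double _log;

    private LogDomain(double log) { _log = log; }

    public static LogDomain FromValue(double value)
    {
        return new LogDomain(Math.Log(value));
    }

    // The number this instance represents.
    public double Value
    {
        get { return Math.Exp(_log); }
    }
}

public static class DefaultDemo
{
    // Compiles for any struct, but "default" only means "all bits zero",
    // which is not the same thing as the numeric concept of zero.
    public static T Default<T>() where T : struct
    {
        return default(T);
    }
}
```

DefaultDemo.Default<int>() is the expected 0, but DefaultDemo.Default<LogDomain>().Value is exp(0), which is 1. A generic Sum seeded with default(T) would be off by one for a type like this.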

Generic type constraints can specify that the type T must implement a specific interface, but there is no interface that the built-in numeric value types (int, float, double, uint, etc.) all implement and that has methods for the Add, Subtract, Multiply, and Divide operations. At the time of writing, the BCL doesn’t contain such an interface. INumeric would be a good name for such a thing.

That seemed to be the end of the road to that idea. I reluctantly chose to continue maintaining multiple sets of the same code for different data types.

Since then I’ve had the nagging feeling that there has to be a better way, and tried a few different avenues to get past the roadblock: F# inline functions, the C# dynamic keyword, and a custom INumeric implementation (the topic of this post and a few more).

F# inline functions

A few years after this I stumbled on F# when it was just an early beta that installed into Visual Studio 2005. I was just trying to find an interactive scripting environment like MATLAB that had good interop with .NET, not necessarily trying to find something geared towards science/engineering. I’d never heard of functional programming but picked it up quickly since F# was just so amazing. 

F# took a much more elegant approach to this problem: inline functions, where the type inference engine can infer constraints on individual static members (and since F# is a functional language where operators are just functions, the +, -, *, / operators are included in this). For example:

let inline Sum (items : 'a array) (zero : 'a) =
    items |> Array.fold (fun acc item -> acc + item) zero

The type signature of Sum (from F# interactive) is:

> 
val inline Sum :
   ^a array ->  ^a ->  ^a when  ^a : (static member ( + ) :  ^a *  ^a ->  ^a)

That’s great, and I love F#, but there were some practical reasons why I couldn’t just migrate all of my code into F#. C#/F# interop is fine, but sometimes leads to strange APIs, not to mention that all of the numerical code would need to be refactored out into a separate project/DLL. It’s a good solution, but switching languages kinda evades the original problem of “high-performance numerical code in C# without having to maintain multiple copies”.

(Note: This Sum method is just for illustration and parity with the above examples in C#. Obviously there is a built-in Sum function that is part of the core libraries that you should use instead.)

C# dynamic keyword

Another interesting development happened more recently with .NET 4 and C# 4’s ‘dynamic’ keyword.

Luca Bolognese has a good post about the approach.

The downside to this is that the method binding happens at runtime and has quite a bit of overhead, so that really shoots down the “high performance” part. I didn’t explore this option much more.
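For reference, a minimal sketch of what a dynamic-based Sum could look like. This is my own illustration, not code from Luca’s post, and DynamicUtilities is a made-up name. The “+” is resolved by the runtime binder on every iteration, which is where the overhead comes from.

```csharp
using System;

public static class DynamicUtilities
{
    // A sketch only: works for any struct whose runtime type supports "+",
    // and throws a RuntimeBinderException at run time for types that don't.
    public static T Sum<T>(T[] items) where T : struct
    {
        dynamic sum = default(T);
        foreach (T item in items)
        {
            // Bound at run time, not compile time.
            sum = sum + (dynamic)item;
        }
        return (T)sum;
    }
}
```

Calling DynamicUtilities.Sum(new[] { 1, 2, 3 }) or DynamicUtilities.Sum(new[] { 1.5f, 2.5f }) compiles against the same method, so the duplication problem goes away. The cost just moves from maintenance to run time.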

Custom INumeric implementation

And that sets the stage enough to talk about the next idea, one that I’ve been pondering for a while and the topic of this post.

The main idea here is that:

  • C# allows user-defined structs
  • Typically a user-defined struct contains more than one private value-type field, but it’s still possible to create a struct with only one value-type field
  • If a user-defined struct has only one value-type field, and since structs are stored inline rather than behind a reference (on the stack for locals, directly in the array for array elements), from a memory/bytes/bits perspective there is no difference between an instance of the user-defined struct and an instance of the underlying private value type
  • The user-defined struct is, however, a completely different C# type whose definition we control (unlike the built-in numerical value types), so it can implement whatever interfaces we need
  • We can define an INumeric interface to represent a numeric value type
  • We can just create “numeric wrapper types” for each numeric value type that implement this INumeric interface
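The “no difference in memory” claim is easy to sanity-check. The sketch below uses a hypothetical WrappedInt struct (deliberately not one of the types defined later in this post) and compares sizes with Marshal.SizeOf. Note the assumption: Marshal.SizeOf reports the marshaled size, which for a simple single-field struct like this I’d expect to match the managed layout, but it is not a guarantee about what the JIT does.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical single-field wrapper, used only for this size check.
public struct WrappedInt
{
    private int _value;

    public WrappedInt(int value) { _value = value; }

    public int Value
    {
        get { return _value; }
    }
}

public static class LayoutDemo
{
    // Size in bytes of a plain int.
    public static int SizeOfInt()
    {
        return Marshal.SizeOf(typeof(int));
    }

    // Size in bytes of the wrapper around a single int field.
    public static int SizeOfWrappedInt()
    {
        return Marshal.SizeOf(typeof(WrappedInt));
    }
}
```

Both methods report 4 bytes, so an array of WrappedInt should take no more room than an int[] of the same length.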

Ok, I know this seems pretty bizarre, so let me explain with some code. (Caveat – this is just a first pass at the code to get the idea across. There are some fairly large usability problems with it at the moment, but hopefully they can be fine-tuned later.)

Here’s INumeric:

    public interface INumeric<T>
        where T : struct
    {
        T Add(T item);
        T Subtract(T item);
        T Multiply(T item);
        T Divide(T item);
    }

And here are some implementations for int and float:

    public struct IntNumeric : INumeric<IntNumeric>
    {
        private int _value;

        public IntNumeric(int value) { _value = value; }

        public IntNumeric Add(IntNumeric item) 
        {
            return new IntNumeric(this._value + item._value);
        }

        public IntNumeric Subtract(IntNumeric item)
        {
            return new IntNumeric(this._value - item._value);
        }

        public IntNumeric Multiply(IntNumeric item)
        {
            return new IntNumeric(this._value * item._value);
        }

        public IntNumeric Divide(IntNumeric item)
        {
            return new IntNumeric(this._value / item._value);
        }

        // ...
    }

    public struct FloatNumeric : INumeric<FloatNumeric>
    {
        private float _value;

        public FloatNumeric(float value)
        {
            _value = value;
        }

        public FloatNumeric Zero()
        {
            return new FloatNumeric(0.0f);
        }

        public FloatNumeric Add(FloatNumeric item)
        {
            return new FloatNumeric(this._value + item._value);
        }

        public FloatNumeric Subtract(FloatNumeric item)
        {
            return new FloatNumeric(this._value - item._value);
        }

        public FloatNumeric Multiply(FloatNumeric item)
        {
            return new FloatNumeric(this._value * item._value);
        }

        public FloatNumeric Divide(FloatNumeric item)
        {
            return new FloatNumeric(this._value / item._value);
        }


        // ...
    }

IntNumeric and FloatNumeric basically just provide method wrappers around the operators +,-,*,/.

One thing that always comes up with numerical computations is the need for array and matrix data structures. It would be nice to have a generic Array<T> or Matrix<T> where T is an INumeric. This allows common algorithms to be written once without having to maintain completely separate codebases for MatrixOfInt, MatrixOfFloat, etc.

Here’s a very simple implementation of a NumericArray<T>, just as an example:

    public class NumericArray<T>
        where T : struct, INumeric<T>
    {
        private T[] _array;

        public NumericArray(params T[] items)
        {            
            int size = items.Length;
            _array = new T[size];
            Array.Copy(items, _array, size);
        }

        public T this[int index]
        {
            get { return _array[index]; }
            set { _array[index] = value; }
        }

        public T Sum()
        {
            T sum = new T();
            foreach (T item in _array)
            {
                sum = sum.Add(item);
            }
            return sum;
        }
    }

Note that the Sum method is written once, in terms of T. The call to the default constructor “new T()” is understood (an implicit interface requirement, I suppose) to create a numeric value type of value 0. Instead of using a + operator, the .Add method is used on the INumeric type T.
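To tie this back to the Utilities.Sum overloads from the beginning of the post, here is a self-contained sketch of the same idea as a static generic method. INumeric, IntNumeric and FloatNumeric are trimmed-down copies of the types above (only Add is kept), the Value property is an accessor I’m assuming for the sake of reading the result back, and NumericUtilities is a made-up name.

```csharp
using System;

// Trimmed copy of the interface above: only Add, to keep the sketch short.
public interface INumeric<T> where T : struct
{
    T Add(T item);
}

public struct IntNumeric : INumeric<IntNumeric>
{
    private int _value;

    public IntNumeric(int value) { _value = value; }

    // Assumed accessor, not part of the original listing.
    public int Value { get { return _value; } }

    public IntNumeric Add(IntNumeric item)
    {
        return new IntNumeric(_value + item._value);
    }
}

public struct FloatNumeric : INumeric<FloatNumeric>
{
    private float _value;

    public FloatNumeric(float value) { _value = value; }

    // Assumed accessor, not part of the original listing.
    public float Value { get { return _value; } }

    public FloatNumeric Add(FloatNumeric item)
    {
        return new FloatNumeric(_value + item._value);
    }
}

public static class NumericUtilities
{
    // One Sum for every wrapper type; new T() is the zero-initialized struct.
    public static T Sum<T>(T[] items) where T : struct, INumeric<T>
    {
        T sum = new T();
        foreach (T item in items)
        {
            sum = sum.Add(item);
        }
        return sum;
    }
}
```

With this in place, NumericUtilities.Sum(new[] { new IntNumeric(1), new IntNumeric(2), new IntNumeric(3) }).Value gives 6, and the same method handles FloatNumeric arrays without a second copy of the loop.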

It certainly seems like a lot of work since there will need to be conversions from regular value types to “wrapped” value types. Hopefully that can be addressed.
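One possible way to soften the conversion burden (an idea I’m sketching here, not something settled) is to give each wrapper implicit conversion operators. DoubleNumeric below is a hypothetical wrapper for double that follows the same pattern as FloatNumeric, and ConversionDemo is just a made-up holder for the example.

```csharp
using System;

// Hypothetical wrapper for double, following the FloatNumeric pattern.
public struct DoubleNumeric
{
    private double _value;

    public DoubleNumeric(double value) { _value = value; }

    public DoubleNumeric Add(DoubleNumeric item)
    {
        return new DoubleNumeric(_value + item._value);
    }

    // Assumption: implicit conversions in both directions are acceptable.
    public static implicit operator DoubleNumeric(double value)
    {
        return new DoubleNumeric(value);
    }

    public static implicit operator double(DoubleNumeric item)
    {
        return item._value;
    }
}

public static class ConversionDemo
{
    public static double AddViaWrapper(double a, double b)
    {
        DoubleNumeric x = a;   // double -> DoubleNumeric, no explicit "new"
        DoubleNumeric y = b;
        return x.Add(y);       // DoubleNumeric -> double on the way out
    }
}
```

This only helps with individual values. A double[] still doesn’t convert to a DoubleNumeric[], so arrays would need some other answer.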

We also don’t know what the performance implications of this are – is there overhead for the .Add method invocation? As it turns out, there isn’t! The CLR JITter is incredibly smart when it comes to generating x86 code for these wrapped value types – it just treats IntNumeric the same as an int. It simply inlines the value-type appropriate opcode (add for int, faddp for float). I’ll dig into this in much more detail in an upcoming post. So, I think that the performance of this approach is very promising. :)

This is just a first pass at the approach. I’m still trying to perfect it and iron out some of the wrinkles that make it hard to use – for example, having to convert every float to a FloatNumeric, etc. To be fully-featured numeric data types in .NET, they also need to implement all of the interfaces that the built-in numeric data types implement (IEquatable, IComparable, etc.). I think it’s worth investigating, and I’ll try to get to the bottom of its feasibility in the next few posts.
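As a rough idea of what the extra interfaces involve, here is a sketch with a hypothetical LongNumeric wrapper (not one of the types above) that forwards IEquatable<T> and IComparable<T> to the underlying long. The INumeric methods are left out to keep it short.

```csharp
using System;

// Hypothetical wrapper for long; only the equality/comparison plumbing is shown.
public struct LongNumeric : IEquatable<LongNumeric>, IComparable<LongNumeric>
{
    private long _value;

    public LongNumeric(long value) { _value = value; }

    // Strongly typed equality avoids boxing in generic collections.
    public bool Equals(LongNumeric other)
    {
        return _value == other._value;
    }

    // Ordering is delegated to the wrapped long.
    public int CompareTo(LongNumeric other)
    {
        return _value.CompareTo(other._value);
    }

    public override bool Equals(object obj)
    {
        return obj is LongNumeric && Equals((LongNumeric)obj);
    }

    public override int GetHashCode()
    {
        return _value.GetHashCode();
    }

    public override string ToString()
    {
        return _value.ToString();
    }
}
```

With these in place, things like Array.Sort on a LongNumeric[] and lookups in a Dictionary keyed by LongNumeric behave the way they would for plain long values.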

Keep in mind that a lot of this is dependent on particular implementation details of the JITter in the .NET CLR, so things could work now and change in the future. Also keep in mind that there are actually 4 different CLRs – one for each combination of {.NET 2.0, .NET 4.0} and {x86, x64}. (Ok, actually 5, since there is also .NET 1.1, which is x86 only, but it’s ancient.)

Also, one more note to put credit where credit is due – this approach for INumeric stems from a number of conversations I had with two friends (and all-around brilliant guys), Tom Jackson and Stuart Bowers.

Stay tuned!

