Three new languages, not one
Visual Studio.NET actually includes three new languages rather than one. C# was built from the ground up as a new language. The two existing Visual Studio languages, C++ and Visual Basic, have been given extensive makeovers. (This is the last you'll hear about Visual Basic in this article.)
The idea behind these extensions is that all of the languages in Visual Studio will
use the "Common Language Runtime" (CLR). There are two components to this: the IL ("Intermediate Language") bytecode
and the ".NET Framework". The first is a bytecode language akin to the Java bytecode that runs on the JVM. The second is an API
for controlling the Windows operating system and user interface. The goal is to give programmers in
any supported language access to the same APIs. These two things are essentially independent, or
at least you can use the IL bytecode without using the .NET Framework.
Getting a C++ program to use the IL bytecode is fairly easy. Most C++ programs can
simply be compiled to use it with the /clr option. Getting C++ to
use the .NET Framework is a bit more difficult in that it requires that you
use "Managed C++". "Managed C++" is a garbage-collected, safety-checked version of C++
that is not backward compatible with standard ("unmanaged") C++. Managed C++
code cannot use many standard C++ features, the most important of which are
templates and the standard libraries. Making it all a bit confusing
is the fact that you can mix unmanaged and managed C++ code.
You can call managed code from unmanaged code, but not vice-versa.
In order to compare these various languages I took a program written for
the old "Programming Fun Challenge 4" contest that I had written in C++,
and rewrote it three times. First, in C#, then in Java (because C# is also
supposed to be a "Java killer") and finally in "Managed C++". My goal was
to get a sense of both the syntax and the performance differences of the
four languages. (I say four, because as you'll see, "Managed C++" ends up
being practically an entirely new language in its own right.)
The program
Here is a description of the PFC4.
The PFC4 program is not a perfect test, but I think it is a decent one. It
ignores the GUI entirely, which is good as the differences between the
different UI APIs would make a comparison near impossible. It does,
however, use the collection classes extensively, so it isn't just a test of
the raw language, but of the APIs as well.
All of these programs use an identical algorithm, differing only as much as
the language required. The four programs can be found here:
The C++ version
The C# version
The Java version
The Managed C++ version
I have attempted to use all of the same method and property names in order to simplify comparisons. Also, my naming conventions are all taken from the original C++ version and therefore may not match the normal conventions for C# or Java. This is intentional as it also simplifies comparison.
Caveat: My experience in both Java and C# is light. It is very possible that my implementations in those languages are substandard. Also, this program does not hit every portion of each language, so this article is only a view of the languages as used to attack one particular sort of problem.
The Implementations
Here is the implementation of one of the simpler methods in each of the languages:
// C++
WORDHDL AddHandle(const string& aWord)
{
    int rc = GetHandle(aWord);
    if( rc != -1 )
        return rc;
    myWordsLookup[aWord] = myWords.size();
    myWords.push_back(aWord);
    return myWords.size()-1;
}
// Java
public int AddHandle(String aWord)
{
    int rc = GetHandle(aWord);
    if( rc != -1 )
        return rc;
    myWordsLookup.put(aWord,new Integer(myWords.size()));
    myWords.add(aWord);
    return myWords.size()-1;
}
// C#
public int AddHandle(string aWord)
{
    int rc = GetHandle(aWord);
    if( rc != -1 )
        return rc;
    myWordsLookup[aWord] = myWords.Count;
    myWords.Add(aWord);
    return myWords.Count - 1;
}
// Managed C++
WORDHDL AddHandle(String *aWord)
{
    WORDHDL rc = GetHandle(aWord);
    if( rc != -1 )
        return rc;
    myWordsLookup->set_Item(aWord, __box(myWords->get_Count()));
    myWords->Add(aWord);
    return myWords->get_Count()-1;
}
Some things to notice:
Java doesn't have anything like typedef, i.e., something to make a quick-and-dirty alias for a type. This is a shame, because as you can see here, this feature can be used for code clarity. For the Java version, I have to remember that certain methods that return int are returning a handle to a word string.
Java requires you to explicitly wrap base types when you put them in a container. Managed C++ requires you to "box" a base type when putting it in a container. (This is essentially the same thing.) C# implicitly does this for you. C++'s containers are not object-based, so no casting or wrapping is needed.
Both C++ and C# make use of operator overloading to simplify the syntax of container inserts and fetches. Note that while you can in theory do this in Managed C++, the .NET Framework does not do this, making inserts and fetches a bit uglier. Foo[key] = data; becomes Foo->set_Item(key, __box(data));.
Note the way C# muddies the difference between a property and a method. The Count property in the C# version is the same as the size() method in the Java version. In other words, although it reads like a field access, it could involve executing arbitrary amounts of code.
Thoughts about C#
Despite its name, C# is much more of an extended Java than an extended C++. The similarities are extensive, going as far as using identical names for identical methods. Comparing the two languages themselves, I'd have to say that C# is just slightly easier to work with in that where it doesn't mimic Java exactly, it extends it in a useful way. The two major extensions that I ran into were the ability to define properties with code (which I found to be useful syntactic sugar, though I know others object to the idea) and the simpler syntax due to more implicit casting.
Thoughts about Managed C++
It should be clear to anyone looking at the Managed C++ version that it is not a viable option for new development. It is by far the longest of the programs, and also, in my opinion, the most syntactically ugly of the four. And indeed, Microsoft explicitly discourages its use for new development, promoting it mainly as a way to pull in old C++ code. And as you'll see below, you don't even get the performance benefits you'd expect from using C++, so there seems to be little reason to ever use it as anything other than an upgrade path.
Collection containers
The C++ containers are different from the containers in both Java and the .NET Framework in that they are generic containers rather than object-oriented containers. In other words, for C++, you say "I want a container to put things of type Foo in", whereas in the other languages you say "I want a container to put things in".
None of the languages provided all of the containers I wanted for this application. C++ is missing a good hashtable type. (std::map is a balanced binary tree, typically a red-black tree.) C# is missing any concept of a set (i.e. a container where the key and the data are the same). Neither C# nor Java has containers with non-unique keys. (Though they are pretty easy to fake with containers of containers.)
The SGI implementation of the C++ STL has a nonstandard "hash_map" extension. This version is very common in the C++ community and is rumored to be slated for inclusion in the next C++ standard. It has an interesting effect on performance, so I included support for it in the C++ version.
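The container shapes discussed above can be sketched in C++ as follows. This is an illustration only, not code from the PFC4 program: the helper names (InsertOnce, AddValue, MultiIndex, RunContainerExamples) are made up, and std::unordered_map (the hash table standardized in C++11) is used as a stand-in for the SGI hash_map, so it needs a C++11 or later compiler.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// A set: the key and the data are the same thing.
// Returns false if aWord was already present.
inline bool InsertOnce(std::set<std::string>& aSeen, const std::string& aWord)
{
    return aSeen.insert(aWord).second;
}

// Non-unique keys faked with a container of containers:
// each key owns a vector holding all of its values.
typedef std::unordered_map<std::string, std::vector<int> > MultiIndex;

inline void AddValue(MultiIndex& aIndex, const std::string& aKey, int aValue)
{
    aIndex[aKey].push_back(aValue);
}

// Exercises the helpers above; the words and numbers are made up.
inline void RunContainerExamples()
{
    std::set<std::string> seen;
    const bool firstInsert = InsertOnce(seen, "cat");
    const bool secondInsert = InsertOnce(seen, "cat");
    assert(firstInsert && !secondInsert);

    // Non-unique keys, directly: std::multimap keeps every (key, value) pair.
    std::multimap<std::string, int> direct;
    direct.insert(std::make_pair(std::string("cat"), 1));
    direct.insert(std::make_pair(std::string("cat"), 2));
    assert(direct.count("cat") == 2);

    MultiIndex faked;
    AddValue(faked, "cat", 1);
    AddValue(faked, "cat", 2);
    assert(faked["cat"].size() == 2);
}
```

The multimap keeps its keys ordered, while the hashed map-of-vectors trades ordering for hashed lookups; which one fits better depends on the access pattern.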
Objective comparisons
The lengths of the different programs are interesting:
Table One
C++: 375 lines
C#: 425 lines
Java: 431 lines
Managed C++: 512 lines
I was a bit taken aback by the fact that the C++ version came in shorter, given that C# and Java are supposedly "higher level" languages. Part of this is due to the STL and the fact that it has a quick-and-dirty tuple class (std::pair) that I could use in containers to avoid having to create a special class for the same purpose. Also influencing this is that neither Java nor C# is as good at console IO as C++ is. This is not surprising given their GUI orientation and C++'s console heritage.
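As an illustration of the std::pair point, the sketch below ranks words by count using a vector of pairs instead of a purpose-built class. The names CountedWord, ByCountDescending and RankWords are hypothetical and do not come from the PFC4 program.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (count, word): std::pair stands in for a small hand-written class.
typedef std::pair<int, std::string> CountedWord;

// Orders by descending count, breaking ties alphabetically.
static bool ByCountDescending(const CountedWord& aLeft, const CountedWord& aRight)
{
    if (aLeft.first != aRight.first)
        return aLeft.first > aRight.first;
    return aLeft.second < aRight.second;
}

// Flattens a word -> count map into a ranked list of pairs.
std::vector<CountedWord> RankWords(const std::map<std::string, int>& aCounts)
{
    std::vector<CountedWord> ranked;
    for (std::map<std::string, int>::const_iterator it = aCounts.begin();
         it != aCounts.end(); ++it)
    {
        ranked.push_back(std::make_pair(it->second, it->first));
    }
    std::sort(ranked.begin(), ranked.end(), ByCountDescending);
    assert(ranked.size() == aCounts.size());
    return ranked;
}
```

In Java or C# of the same era, the equivalent would typically need a small class with two fields and a comparer, which is part of where the extra lines come from.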
The timing differences are more interesting. First, a note on the many C++ versions. C++ can be compiled to use the CLR or not to use the CLR, even if it is not managed. This has timing implications. There are also different versions using the various STL implementations and, for kicks, versions compiled with gcc rather than Microsoft Visual Studio.NET.
All timings were obtained on an 800 MHz Pentium IV using this input data. All times are in seconds.
Table Two
Standard C++: 27.99
Standard C++ + SGI STL: 11.15
Standard C++ + SGI STL and hash_map: 6.04
g++ C++: 17.28
g++ C++ + SGI STL: 14.93
g++ C++ + SGI STL and hash_map: 7.29
Standard C++ compiled /clr: 34.36
Standard C++ + SGI STL compiled /clr: 25.09
Standard C++ + SGI STL and hash_map compiled /clr: 12.98
Managed C++: 111.59
C#: 93.08
Java: 65.57
As you can see, the differences can be substantial. The fastest and the slowest systems differed from each other by more than a factor of eighteen (6.04 versus 111.59 seconds). In general, the C++ versions outperformed the others, with one very important exception: the very slowest of the systems was the Managed C++ version.
Why? There are lots of possibilities. The most obvious would be the bytecode, but it is clear from the C++ versions compiled to use bytecode that the performance hit here is at most about a factor of two. This alone does not explain the performance differences between standard C++ and the other three languages.
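The bytecode penalty can be checked directly against Table Two by dividing each /clr timing by its native counterpart. The helper name ClrSlowdown is made up for this sketch; the numbers in the comments are the table's own timings.

```cpp
#include <cassert>

// Slowdown factor of a /clr build relative to the matching native build.
inline double ClrSlowdown(double aNativeSeconds, double aClrSeconds)
{
    assert(aNativeSeconds > 0.0);
    return aClrSeconds / aNativeSeconds;
}

// Applied to the Visual Studio rows of Table Two:
//   ClrSlowdown(27.99, 34.36)  is about 1.23  (default STL)
//   ClrSlowdown(11.15, 25.09)  is about 2.25  (SGI STL)
//   ClrSlowdown( 6.04, 12.98)  is about 2.15  (SGI STL and hash_map)
```

By the same arithmetic, Managed C++ at 111.59 seconds is roughly 4 times slower than the slowest native build (27.99 seconds), which is well beyond what the bytecode alone accounts for.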
An obvious culprit is the .NET Framework itself, as the two differences between the Managed C++ version and the standard C++ version are the .NET Framework and garbage collection. And indeed, after doing a little bit of profiling, it appeared that the .NET Framework classes performed very differently than the STL collection classes. In particular, the C++ classes were actually slower on inserts than the .NET Framework collection classes, but were far, far faster on fetches. (The C++ version that runs in six seconds spends almost a second and a half of its time loading the dictionary, something the Managed C++ version does in less than a second.)
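One way to separate insert cost from fetch cost, as in the profiling described above, is to time the two phases independently. The following is a rough sketch and not the profiling method actually used for the article: PhaseTimes, TimePhases and MakeWords are hypothetical names, the keys are synthetic, and std::unordered_map (C++11) again stands in for the SGI hash_map.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

struct PhaseTimes
{
    double insertSeconds;   // time spent filling the container
    double fetchSeconds;    // time spent looking keys up again
    std::size_t hits;       // successful lookups, as a sanity check
};

// Times inserts and fetches separately for any map-like Container
// that supports operator[], find() and end().
template <typename Container>
PhaseTimes TimePhases(const std::vector<std::string>& aWords, int aFetchRounds)
{
    typedef std::chrono::steady_clock Clock;
    assert(aFetchRounds >= 0);
    Container table;
    PhaseTimes result = {0.0, 0.0, 0};

    Clock::time_point start = Clock::now();
    for (std::size_t i = 0; i < aWords.size(); ++i)
        table[aWords[i]] = static_cast<int>(i);
    result.insertSeconds = std::chrono::duration<double>(Clock::now() - start).count();

    start = Clock::now();
    for (int round = 0; round < aFetchRounds; ++round)
        for (std::size_t i = 0; i < aWords.size(); ++i)
            if (table.find(aWords[i]) != table.end())
                ++result.hits;
    result.fetchSeconds = std::chrono::duration<double>(Clock::now() - start).count();
    return result;
}

// Generates aCount distinct synthetic keys: "word0", "word1", ...
std::vector<std::string> MakeWords(std::size_t aCount)
{
    std::vector<std::string> words;
    words.reserve(aCount);
    for (std::size_t i = 0; i < aCount; ++i)
        words.push_back("word" + std::to_string(i));
    return words;
}
```

Calling TimePhases<std::map<std::string, int> > and TimePhases<std::unordered_map<std::string, int> > on the same word list gives a per-phase comparison; the actual numbers depend heavily on the compiler, library and input, so none are quoted here.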
It is interesting that the Java version, while significantly faster than those using the .NET Framework, is still significantly slower than the C++ versions. This could, in theory, be the garbage collection, but I suspect not: playing around with other aspects of C# programming, most notably UI development, I've found it generally to run at only about half the speed of C++.
I suspect the big difference is that the C# collections need to store full objects whereas the C++ collections
can store base types.
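The difference between storing base types and storing full objects can be imitated in C++ itself. In the sketch below, a vector of int holds its values inline, while a vector of std::unique_ptr<int> holds one separately allocated object per element. This is only an analogy for boxed collections, not a description of how the .NET or Java collections are implemented; all function names are made up, and std::unique_ptr needs C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Value storage: the ints live contiguously inside the vector.
std::vector<int> MakeValues(std::size_t aCount)
{
    std::vector<int> values;
    values.reserve(aCount);
    for (std::size_t i = 0; i < aCount; ++i)
        values.push_back(static_cast<int>(i));
    return values;
}

// "Boxed" storage: one heap allocation per element, reached via a pointer.
std::vector<std::unique_ptr<int> > MakeBoxed(std::size_t aCount)
{
    std::vector<std::unique_ptr<int> > boxed;
    boxed.reserve(aCount);
    for (std::size_t i = 0; i < aCount; ++i)
        boxed.push_back(std::unique_ptr<int>(new int(static_cast<int>(i))));
    return boxed;
}

// Reads each value directly from the vector's own storage.
long SumValues(const std::vector<int>& aValues)
{
    long total = 0;
    for (std::size_t i = 0; i < aValues.size(); ++i)
        total += aValues[i];
    return total;
}

// Follows a pointer for every element before reading the value.
long SumBoxed(const std::vector<std::unique_ptr<int> >& aBoxed)
{
    long total = 0;
    for (std::size_t i = 0; i < aBoxed.size(); ++i)
    {
        assert(aBoxed[i] != nullptr);
        total += *aBoxed[i];
    }
    return total;
}
```

Both sums produce the same result; the boxed version simply pays for an allocation on every insert and an extra indirection on every fetch, which is the kind of overhead suspected here.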
Final Thoughts
One thing I've found when comparing C# to C++, both with this program and others, is that the biggest
differences in the ease of programming come not from the languages themselves but from the APIs. The
collection classes in C# (and Java) are substantially easier to use than the STL. The new .NET Framework
is substantially easier to use than the older Win32 API for Windows programming. (And also quite a bit
easier to use than Gtk.) These differences seem to swamp out a lot of differences in ease of use in the
languages themselves. But it still must be said that programming in C# is certainly easier than programming
in C++. There's a lot of fairly arcane stuff you must think about if you are going to get the most out of
C++ whereas most of these decisions are made for you under the covers with C# (or, indeed, most languages).
It is also important to see that the performance differences between the various systems are very
application-dependent. The timings here make C# (and also Java) look very slow compared to C++, and while I chose the
task essentially at random, in retrospect I think it was a bit unfair in that I suspect other tasks might not
show such a performance gap. Certainly a benchmark that did something like calculate Fibonacci numbers, and
that avoided the collection classes, would make C# look much better. It is also interesting to note the impact
that simply using the SGI hash_map had on the C++ versions. It shows how important choosing
the right data structure for the task is.