No string type
C has no string type. Huh? Most sane programming languages have a string type
which allows one to just say "this is a string" and let the compiler take
care of the rest. Not so with C. It's so stubborn and dumb that it only has
three types of variable; everything is either a number, a bigger number,
a pointer or a combination of those three.
Thus, we don't have proper strings but "arrays of unsigned
integers". "char" is basically only a really small number. And now we have
to start using unsigned ints to represent multibyte characters.
What. A. Crock. An ugly hack.
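For the record, here's what passes for a 'string' in C; a minimal sketch:

#include <stdio.h>

int main(void)
{
    /* This is as close as C gets to a string: an array of char with a
       terminating '\0' that you have to remember yourself. */
    char greeting[6] = { 'H', 'e', 'l', 'l', 'o', '\0' };

    /* The string literal form hides exactly the same thing: */
    char also_greeting[] = "Hello";

    printf("%s %s\n", greeting, also_greeting);
    return 0;
}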
Functions for insignificant operations
Copying one string to another requires including <string.h> in your
source code, and there are two functions for copying a string: strcpy()
and strncpy(). One could even conceivably copy strings using other functions (if
one wanted to, though I can't imagine why). Why does any normal language
need two functions just for copying a string? Why can't we just use the
assignment operator ('=') like for the other types? Oh, I forgot. There's
no such thing as strings in C; just a big contiguous chunk of memory. Great!
Better still, there's no syntax for:
- string concatenation
- string comparison
- substrings
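Every one of those turns into a library call; here's roughly what the
simplest string handling looks like:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[32] = "Hello";
    char b[32];

    /* b = a;  -- won't compile: arrays can't be assigned. */
    strcpy(b, a);              /* copying needs a library function...        */
    strcat(b, ", world");      /* ...as does concatenation...                */

    if (strcmp(a, b) != 0)     /* ...and comparison ('==' compares pointers) */
        puts(b);
    return 0;
}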
Ditto for converting numbers to strings, or vice versa. You have to use
something like atol(), or strtod(), or a variant on printf(). Three families
of functions for variable type conversion. Hello? Flexible casting? Hello?
And don't even get me started on the lack of exponentiation operators.
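The three families in action, plus the workaround for the missing
exponentiation operator; a small sketch:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
    long n   = atol("42");               /* family one                   */
    double d = strtod("3.14", NULL);     /* family two                   */
    char buf[32];

    sprintf(buf, "%ld", n);              /* family three, in reverse     */

    /* And no '**' or '^' operator either; you may need -lm to link pow(): */
    printf("%s %f %f\n", buf, d, pow(2.0, 10.0));
    return 0;
}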
No string type: the redux
Because there's no real string type, we have two options: arrays or pointers.
Array sizes can only be constants. This means we run the risk of buffer
overflow since we have to try (in vain) to guess in advance how many
characters we need. Pathetic. The only alternative is to use malloc(),
which is just filled with pitfalls. The whole concept of pointers is an
accident waiting to happen. You can't free the same pointer twice. You have
to always check the return value of malloc() and you mustn't cast it. There's
no built-in way of telling if a spot of memory is in use, or if a pointer's
been freed, and so on and so forth. Having to resort to low-level memory
operations just to be able to store a line of text is asking for...
The encouragement of buffer overflows
Buffer overflows abound in virtually any substantial piece of C code. This is
caused by programmers accidentally putting too much data in one space or
leaving a pointer pointing somewhere because a returning function ballsed up
somewhere along the line. C includes no way of telling when the end of an
array or allocated block of memory is overrun. The only way of telling is
to run, test, and wait for a segfault. Or a spectacular crash. Or a
slow,
steady leakage of memory from a program, agonisingly 'bleeding' it to death.
Functions which encourage buffer overflows
- gets()
- strcat()
- strcpy()
- sprintf()
- vsprintf()
- bcopy()
- scanf()
- fscanf()
- sscanf()
- getwd()
- getopt()
- realpath()
- getpass()
The list goes on and on and on. Need I say more? Well, yes I do.
You see, even if you're not writing any memory you can still access memory
you're not supposed to. C can't be bothered to keep track of the ends of
strings; the end of a string is indicated by a null '\0' character. All
fine, right? Well, some functions in your C library, such as strlen(),
perhaps, will just run off the end of a 'string' if it doesn't have a null
in it. What if you're
using a binary string?
Careless programming this may be, but we all make mistakes and so
the language authors have to take some responsibility for being so
intolerant.
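Both failure modes in one tiny program; a sketch you shouldn't run anywhere
you care about:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[8];
    char raw[4] = { 'a', 'b', 'c', 'd' };   /* note: no terminating '\0' */

    /* strcpy() neither knows nor cares that buf is only 8 bytes long; this
       compiles cleanly and silently tramples whatever follows buf. */
    strcpy(buf, "this string is far too long for that buffer");

    /* strlen() will happily walk off the end of anything unterminated; the
       result (and the program's continued existence) is undefined. */
    printf("%lu\n", (unsigned long)strlen(raw));

    return 0;
}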
No built-in Boolean type
If you don't believe me, just watch:
$ cat > test.c
int main(void)
{
    bool b;
    return 0;
}
$ gcc -ansi -pedantic -Wall -W test.c
test.c: In function 'main':
test.c:3: 'bool' undeclared (first use in this function)
Not until the 1999 ISO C standard were we finally able to use 'bool' as a
data type. But guess what? It's implemented as a macro, and one actually has
to include a header file (<stdbool.h>) to be able to use it!
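And this is the great C99 concession, for what it's worth:

#include <stdio.h>
#include <stdbool.h>   /* without this header, 'bool' still doesn't exist */

int main(void)
{
    bool b = true;      /* 'bool', 'true' and 'false' are all macros here */
    printf("%d\n", b);  /* underneath, it's just the integer type _Bool   */
    return 0;
}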
High-level or low-level?
On the one hand, we have the fact that there is no string type, and
direct memory management, implying a low-level language.
On the other hand, we have a mass of library
functions, a preprocessor and a plethora of other things which imply a
high-level language.
C tries to be both, and as a result spreads itself too thinly.
The great thing about this is that when C is lacking a genuinely useful
feature, such as reasonably strong data typing, the excuse "C's a low-level
language" can always be used, functioning as a perfect 'reason' for C to
remain unhelpfully and fatally sparse.
The original intention for C was for it to be a portable assembly language
for writing UNIX. Unfortunately, from its very inception C has had extra
things packed into it which make it fail as an assembly language. Its
kludgy strings are a good example. If it were at least portable these
failings might be forgivable, but C is not portable.
Integer overflow without warning
Self-explanatory. One minute you have a fifteen-digit number, then you try to
double or triple it and boom! Its value is suddenly -234891234890892 or
something similar. Stupid, stupid, stupid. How hard would it have been
to give a warning or overflow error or even reset the variable to zero?
This is widely known as bad practice. Most competent developers acknowledge
that silently ignoring an error is a bad attitude to have; this is
especially true for such a commonly used language as C.
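A tiny demonstration; the exact result is formally undefined for signed
types, which is rather the point:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    int n = INT_MAX;     /* 2147483647 on most platforms                */
    n = n + 1;           /* overflows silently: no warning, no error    */
    printf("%d\n", n);   /* typically prints -2147483648, but who knows */
    return 0;
}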
Portability?!
Please. There are at least four official specifications of C I could name
off the top of my head, and no compiler has properly implemented all of them.
They conflict, and they grow and grow. The problem isn't subsiding; it's
increasing each day. New compilers and libraries keep being developed, each
with its own proprietary extensions.
GNU C isn't the same as ANSI C isn't the same as K&R C isn't the same as
Microsoft C isn't the same as POSIX C.
C isn't portable; all kinds of machine architectures are totally different,
and C can't properly adapt because it's so muttonheaded. It's trapped in The
Unix Paradigm.
If it weren't for the C preprocessor, then it would be virtually impossible
to get C to run on multiple families of processor hardware, or even just
slightly differing operating systems. A programming language should not
require a preprocessor just so that it can run on FreeBSD, Linux and
Windows without failing to compile.
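This is the kind of contortion in question; a sketch only, since which macros
are predefined depends entirely on your compiler and platform:

#if defined(_WIN32)
#  include <windows.h>
#  define SLEEP_SECONDS(n) Sleep((n) * 1000)
#elif defined(__unix__) || defined(__APPLE__)
#  include <unistd.h>
#  define SLEEP_SECONDS(n) sleep(n)
#else
#  error "no idea how to sleep on this platform"
#endif

int main(void)
{
    SLEEP_SECONDS(1);    /* one 'portable' second */
    return 0;
}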
C is unable to adapt to new conditions for the sake of
"backward compatibility", throwing away the opportunity to get rid of
stupid, utterly useless and downright dangerous functions for a nonexistent
goal. And yet C is growing new tentacles and unnecessary
features because of idiots who think adding seven new functions to their C
library will make life easier. It does not.
Even the C89 and C99 standards conflict with each other in ridiculous
ways. Can you use the long long type or can't you? Is a certain constant
defined by a preprocessor macro hidden deep, deep inside my C library? Is
using a function in this particular way going to be undefined, or acceptable?
What do you mean, getch() isn't a proper function but getchar() is?
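The long long question alone is a nice illustration; the warning below is
roughly what GCC says, though the exact wording varies between compilers and
versions:

long long counter;   /* perfectly fine under -std=c99                     */
                     /* under gcc -ansi -pedantic:                        */
                     /*   warning: ISO C90 does not support 'long long'   */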
The implications of this false 'portability'
Because C pretends to be portable, even professional C programmers can be
caught out by hardware and an unforgiving programming language; almost
anything (comparisons, character assignments, arithmetic, string output) can
blow up spectacularly for no apparent reason, because of endianness, or
because your particular processor treats all chars as unsigned, or some other
silly, subtle, deadly trap like that.
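Here's one of those traps sketched out; whether plain char is signed or
unsigned is implementation-defined, so this genuinely behaves differently on
different machines:

#include <stdio.h>

int main(void)
{
    char c = (char)0xFF;

    if (c == 0xFF)
        puts("chars are unsigned here");  /* c promoted to 255             */
    else
        puts("chars are signed here");    /* c became -1, comparison fails */
    return 0;
}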
Archaic, unexplained conventions
In addition to the aforementioned problems, C also has various
idiosyncrasies (invariably unreported) which not even some teachers of C
are aware of: "Don't use fflush(stdin), gets() is evil, main() must return
an integer, main() can only take one of three sets of arguments, you
mustn't cast the return value of malloc(), fileno() isn't an ANSI compliant
function..." all these unnecessary and
unmentioned quirks mean buggy code. Death by a thousand cuts. Ironic, when
you consider that Kernighan criticised Pascal in much the same way, even
though C has just as many little gotchas that bleed you to death gradually
and painfully.
Blaming The Programmer
Because C is pretty difficult to learn, and even harder to actually use
without breaking something in a subtle yet horrific way, it's
assumed that anything which goes wrong is the programmer's fault. If your
program segfaults, it's your fault. If it crashes, mysteriously returning 184
with no error message, it's your fault. When the one condition you just
happened to forget about whilst coding screws up, it's your fault.
Obviously the programmer has to shoulder most of the responsibility for a
broken program. But as we've already seen, C positively
tries to make the programmer fail. This increases the
failure rate and yet for some reason we don't blame the language when yet
another buffer overflow is discovered. C programmers try to
cover up C's
inconsistencies and inadequacies by creating a culture of 'tua culpa'; if
something's wrong, it's your fault, not that of the compiler,
linker, assembler, specification, documentation, or hardware.
Compilers have to take some of the blame. Two reasons. The first is that
most compilers have proprietary extensions built into them. Let me remind
you that half of the point of using C is that it should be portable and
compile anywhere. Adding extensions violates the original spirit of C and
removes one of its advantages (albeit an already diminished advantage).
The other (and perhaps more pressing) reason is the lack of anything beyond
minimal error checking
which C compilers do. For every ten types of errors your
compiler catches, another fifty will slip through. Beyond variable type and
syntax checking the compiler does not look for anything else. All it can do
is give warnings on unusual behaviour, though these warnings are often
spurious.
On the other hand, a single error can cause a ridiculous cascade, or make the
compiler fall over and die because of a misplaced semicolon, or, more
accurately and incriminatingly, a badly constructed parser and grammar.
And yet, despite this, it's your fault.
To quote The Unix Haters' Handbook:
"If you make even a small omission, like a single semicolon, a C compiler
tends to get so confused and annoyed that it bursts into tears and complains
that it just can't compile the rest of the file since one missing semicolon
has thrown it off so much."
So C compilers may well give literally hundreds of errors stating that half of your code is wrong if you miss out a single semicolon. Can it get worse?
Of course it can! This is C!
You see, a compiler will often not deluge you with error information when
compiling. Sometimes it will give you no warning whatsoever even if
you write totally foolish code like this:
#include <stdio.h>

int main()
{
    char *p;    /* never initialised */
    puts(p);    /* hands a garbage pointer straight to puts() */
    return 0;
}
When we compile this with our 'trusty' compiler gcc, we get no errors
or warnings at all. Even when using the '-W' and '-Wall' flags to make it
watch out for dangerous code it says nothing.
In fact, no warning is ever given unless you try to optimise the program with
a '-O' flag. But what if you never optimise your program? Well, you now have
a dangerous program. And unless you check the code again you may well never
notice that error.
What this section (and this entire document) is really about is the sheer
unfriendliness of C, and how it seems to take great pains to be as difficult
to use as possible.
It is flexible in the wrong way; it can do many, many different things,
but this makes it impossible to do any single thing easily.
Trapped in the 1970s
C is over thirty years old, and it shows. It lacks features that modern
languages have such as exception handling, many useful data types, function overloading, optional function arguments and garbage collection.
This is hardly surprising considering that it was constructed from an
assembler language with just one data type on a computer from 1970.
C was designed for the computer and programmer of the 1970s, sacrificing
stability and programmer time for the sake of memory. Despite the fact that
the most recent standard is just half a decade old, C has not been updated
to take advantage of increased memory and processor power to implement
such things as automatic memory management. What for? The illusion of
backward compatibility and portability.
Yet more missing data types
Hash tables. Why would these have been so difficult to implement? C is intended for the
programming of things like kernels and system utilities, which frequently
use hash tables. And yet it didn't occur to C's creators that maybe including
hash tables as a type of array might be a good idea when writing UNIX?
Perl has them. PHP has them. With C you have to fake hash tables, and
even then they don't really work well at all.
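What 'faking it' tends to look like; a sketch only (fixed size, naive hash,
linear probing), and nothing like it comes with the language:

#include <stdio.h>
#include <string.h>

#define TABLE_SIZE 64

struct entry {
    const char *key;
    int         value;
    int         used;
};

static struct entry table[TABLE_SIZE];

static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

static void put(const char *key, int value)
{
    unsigned i = hash(key);
    while (table[i].used && strcmp(table[i].key, key) != 0)
        i = (i + 1) % TABLE_SIZE;   /* linear probing; pray it never fills up */
    table[i].key   = key;
    table[i].value = value;
    table[i].used  = 1;
}

static int get(const char *key)
{
    unsigned i = hash(key);
    while (table[i].used) {
        if (strcmp(table[i].key, key) == 0)
            return table[i].value;
        i = (i + 1) % TABLE_SIZE;
    }
    return -1;                      /* 'not found', and hope -1 is never a value */
}

int main(void)
{
    put("answer", 42);
    printf("%d\n", get("answer"));  /* prints 42 */
    return 0;
}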
Multidimensional arrays. Before you tell me that you can do stuff like
int multiarray[50][50][50]
I think that I should point out
that that's an array of arrays of arrays. Different thing. Especially when
you consider that you can also use it as a bunch of pointers. C programmers
call this "flexibility". Others call it "redundancy", or, more accurately,
"mess".
Complex numbers. They may be in C99, but
how many compilers
support that? It's not exactly difficult to get your head round the
concept of complex numbers, so why weren't they included in the first place?
Were complex numbers not discovered back in 1989?
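For completeness, here's what the C99 version looks like when a compiler
actually supports it (you may also need -lm):

#include <stdio.h>
#include <complex.h>

int main(void)
{
    double complex z = 3.0 + 4.0 * I;
    printf("|z| = %f\n", cabs(z));   /* prints 5.000000 */
    return 0;
}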
Binary strings. It wouldn't have been that hard just to make a compulsory
struct with a mere two members: a char * for the string of bytes and a
size_t for the length of the string.
Binary strings have always been around on Unix, so why wasn't C more
accommodating?
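Something like this, presumably; the name binstr is made up for illustration
and nothing like it is standard:

#include <stddef.h>

struct binstr {
    char  *data;   /* the bytes themselves, possibly containing '\0' */
    size_t len;    /* how many of them there are                     */
};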
Library size
The actual core of C is admirably small, even if some of the syntax isn't
the most efficient or readable (case in point: the combined '? :' operator).
One thing that is bloated is the C library.
The number of functions in a full C library which complies with all
significant standards runs into four digit figures. There's a great deal of
redundancy, and code which really shouldn't be there.
This has knock-on
effects, such as the large number of configuration constants which are
defined by the preprocessor (which shouldn't be necessary), the size of
libraries (the GNU C library almost fills a floppy disk, and its
documentation fills three more) and inconsistently named groups of
functions, in addition to duplication.
For example, a function for converting a string to a long integer is atol().
One can also use strtol() for exactly the same thing. Boom - instant
redundancy. Worse still, both functions are included in the C99, POSIX and
SUSv3 standards!
Can it get worse? Of course it can! This is C!
As a result it's only logical to expect an equivalent pair of atod() and
strtod() functions for converting a string to a double. As you've probably
guessed, this isn't true. They are called atof() and strtod(). This is very
foolish.
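The redundancy in action, in a quick sketch:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Two sanctioned ways of doing exactly the same conversion... */
    long a = atol("12345");
    long b = strtol("12345", NULL, 10);

    /* ...and the inconsistently named pair for doubles. */
    double x = atof("3.14");
    double y = strtod("3.14", NULL);

    printf("%ld %ld %f %f\n", a, b, x, y);
    return 0;
}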
There are yet more examples scattered through the standard C library like
a dog's smelly surprises in a park.
The Single Unix Specification version three specifies 1,123
functions which must be available to the C programmer of the compliant
system. We already know about the redundancies and unnecessary functions,
but across how many header files are these 1,123 functions spread out?
62. That's right, on average a C library header will define approximately
eighteen functions. Even if you only need to use maybe one function
from each of, say, five libraries (a common occurrence)
you may well wind up including 90, 100 or even 150 function definitions you
will never need. Bloat, bloat, bloat. Python has the right idea; its import
statement lets you pull in exactly the functions (and global variables!) you
need from each library, if you prefer. But C? Oh, no.
Specifying structure members
Why does this need two operators? Why do I have to pick between
'.' and '->' for a ridiculous, arbitrary
reason? Oh, I forgot; it's just yet another of C's gotchas.
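The distinction, such as it is, in a quick sketch:

#include <stdio.h>

struct point { int x, y; };

int main(void)
{
    struct point  p  = { 1, 2 };
    struct point *pp = &p;

    /* Same member, two spellings, purely because one access goes
       through a pointer; pp->y is just shorthand for (*pp).y. */
    printf("%d %d\n", p.x, pp->y);
    return 0;
}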
Limited syntax
A couple of examples should illustrate what I mean quite nicely. If
you've ever programmed in
PHP
for a substantial period of time, you're probably aware of
the 'break' keyword. You can use it to break out from nested loops of
arbitrary depth by using it with an integer, such as "break 3"; this would break out of three levels of loops.
There is no way of doing this in C. If you want to break out from a series of
nested for or while loops then you have to use a goto. This is what is
known as a crude hack.
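The crude hack in question, sketched out:

#include <stdio.h>

int main(void)
{
    int i, j;

    /* break only exits one loop level, so escaping both loops at once
       means reaching for goto. */
    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) {
            if (i * j == 42)          /* some condition worth stopping for */
                goto done;
        }
    }
done:
    printf("stopped at i=%d, j=%d\n", i, j);
    return 0;
}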
In addition to this, there is no way to compare any non-numerical data type
using a switch statement. C does not allow you to use switch and case statements for strings. One must
use several variables to iterate through an array of case strings and compare
them to the given string with strcmp(). This reduces performance
and is just yet another hack.
In fact, this is an example of gratuitous library functions running wild
once again. Even comparing one string to another requires use of the
strcmp() function.
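What 'switching on a string' actually looks like; the command names here are
made up for illustration:

#include <stdio.h>
#include <string.h>

static int command_id(const char *cmd)
{
    static const char *commands[] = { "start", "stop", "restart" };
    size_t i;

    for (i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(cmd, commands[i]) == 0)
            return (int)i;
    return -1;    /* no 'default:' label for us */
}

int main(void)
{
    printf("%d\n", command_id("stop"));   /* prints 1 */
    return 0;
}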
Flushing standard I/O
A simple microcosm of the "you can do this, but not that" philosophy of C;
one has to do two different things to flush standard input and standard
output.
To flush the standard output stream, one can use fflush()
(defined by <stdio.h>). One doesn't usually need to do this after
every bit of text is printed, but it's nice to know it's there, right?
Unfortunately, one cannot use fflush() to flush the contents of
standard input. The standards explicitly leave its behaviour on input streams
undefined, but this is so illogical that even textbook authors sometimes
mistakenly use fflush(stdin)
in examples and some compilers
won't bother to warn you about it. One shouldn't even
have to flush standard input; you ask for a character with
getchar(), and the program should just read in the first
character given and disregard the rest. But I digress...
There is no 'real' way to flush standard input up to, say, the end of a line.
Instead one has to use a kludge like so:
/* Needs <stdio.h>, <errno.h> and <string.h> for getchar(), errno,
   strerror() and feof(). */
int c;
do {
    errno = 0;
    c = getchar();
    if (errno) {
        fprintf(stderr,
                "Error flushing standard input buffer: %s\n",
                strerror(errno));
    }
} while ((c != '\n') && (!feof(stdin)));
That's right; you need to use a variable, a looping construct, two library
functions and several lines of exception handling code to flush the
standard input buffer.
Inconsistent error handling
A seasoned C programmer will be able to tell what I'm talking about just by
reading the title of this section. There are many incompatible ways in
which a C library function indicates that an error has occurred:
- Returning zero.
- Returning nonzero.
- Returning a NULL pointer.
- Setting errno.
- Requiring a call to another function.
- Outputting a diagnostic message to the user.
Some functions may actually use up to three of these methods. None of them
are compatible with each other, and error handling does not occur
automatically; every time a C programmer uses a library
function they must check manually for an error. This bloats code
which would otherwise be perfectly readable
without if-blocks for error handling
and variables to keep track of errors. In a large software project one must
write a section of code for error handling hundreds of times. If you forget,
something can go horribly wrong. For example, if you don't check the return
value of malloc() you may accidentally try to use a null pointer.
Oops...
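Three of those conventions side by side, in a quick sketch:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main(void)
{
    char *buf;
    FILE *fp;

    buf = malloc(1024);                      /* failure: returns NULL           */
    if (buf == NULL) {
        fprintf(stderr, "malloc failed\n");
        return 1;
    }

    fp = fopen("no-such-file", "r");         /* failure: NULL, and errno is set */
    if (fp == NULL)
        fprintf(stderr, "fopen: %s\n", strerror(errno));
    else
        fclose(fp);

    if (remove("also-no-such-file") != 0)    /* failure: nonzero return value   */
        perror("remove");

    free(buf);
    return 0;
}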
Commutative array subscripting
"Hey, Thompson, how can I make C's syntax even more
obfuscated and difficult to understand?"
"How about you allow 5[var] to mean the same as var[5]?"
"Wow; unnecessary and confusing syntactic idiocy! Thanks!"
"You're welcome, Dennis."
Variadic anonymous macros
In case you don't understand what variadic anonymous macros are, they're
macros (i.e. pseudofunctions defined by the preprocessor) which can take a
variable number of arguments. Sounds like a simple thing to implement. I
mean, it's all done by the preprocessor, right? And besides, you can define
proper functions with variable numbers of arguments even in the original
K&R C, right?
In that case, why can't I do:
#define error(...) fprintf(stderr, __VA_ARGS__)
without getting a warning from GCC?
warning: anonymous variadic macros were introduced in C99
That's right, folks. Not until late 1999, 30 years after development of the
C programming language began, were we allowed to do such a simple task with
the preprocessor.
The C standards don't make sense
Only one simple quote from the ANSI C standard - nay, a single footnote - is
needed to demonstrate the immense idiocy of the whole thing. Ladies,
gentlemen, and everyone else, I present to you...footnote 82:
All whitespace is equivalent except in certain situations.
I'd make a cutting remark about this, but it'd be too easy.
Too much preprocessor power
Rather foolishly, half of the actual C language is reimplemented in the
preprocessor. (This should be a concern from the start; redundancy usually
indicates an underlying problem.) We can #define fake variables, fake
conditions with #ifdef and #ifndef, and look, there's even #if, #endif and
the rest of the crew! How useful!
Erm, sorry, no.
Preprocessors are a good idea for a language like C. As has already been said, C
is not portable. Preprocessors are vital to bridging the gap between
different computer architectures and libraries and allowing a program to
compile on multiple machines without having to rely on external programs.
The #define statement, in this case, can be used perfectly validly to set
'flags' that can be used by a program to determine all sorts of things: which
C standard is being used, which library, who wrote it, and so on and so
forth.
Now, the situation isn't as bad as for C++. In C++, the preprocessor is so
packed with unnecessary rubbish that one can actually use it to calculate an
arbitrary series of Fibonacci numbers at compile-time. However, C comes
dangerously close; it allows the programmer to define fake global variables
with wacky values which would not otherwise be proper code, and then compare
values of these variables. Why? It's not needed; the C language of the Plan
9 operating system doesn't let you play around with preprocessor definitions
like this. It's all just bloat.
"But what about when we want to use a constant throughout a program? We
don't want to have to go through the program changing the value each time we
want to change the constant!" some may complain.
Well, there are these things called global variables. And there's this
keyword, const. It makes a constant variable. Do you see where I'm going
with this?
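Which is to say, something like this:

/* Instead of #define MAX_USERS 64 and #define PI 3.14159265358979: */
static const int    max_users = 64;
static const double pi        = 3.14159265358979;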
You can do search and replace without the preprocessor, too. In fact, they
were able to do it back in the seventies on the very first versions of Unix.
They called it sed. Need something more like cpp?
Use m4 and stop complaining. It's the Unix way!