# Introduction
While XML is buzzword-friendly, YAML (used to be Yet Another Markup Language, but is now a GNU-ish YAML Ain't Markup Language) is almost a non-buzzword. Mentioning it to people is the surest way to get a blank stare, even from fairly tech-savvy people.
That's a real shame, in my opinion, because both XML and YAML have their uses. XML, a subset of SGML, is designed to be simpler than SGML, and designed to be a standardized way of storing and sharing various types of data. It's good for a number of things; however, for every article I've read that makes me say, "Hey, that's neat!" I can find at least one rebuttal that says, essentially, "That's not what XML was designed to do!"
One wonders sometimes what XML was designed to do if it wasn't designed to do the things that people are using it for. On top of that, I've run into a number of people who have expressed concerns about XML. The biggest complaints I've seen are "too many characters" and "what makes this better than other file formats, exactly?"
My thoughts exactly.
# A look at YAML
One of the first things people try to do (including me) is compare YAML to XML. That's not really a fair comparison. While XML builds on a legacy language, YAML was designed from the start to be a data serialization language that's both powerful and human readable. The similarities end there.
Take a look at the reference card. The reference card is, essentially, a YAML document. For a more complete explanation, take a look at the YAML specification, and have a look at the YAML Cookbook (warning: the YAML Cookbook has a strong Ruby bias.)
# XML, or YAML?
When looking at these examples, bear in mind that I'm not a raving expert on either XML or YAML. Having said that, here's a modified snippet from the XML file I had created for an in-house contacts list:

    <user id="babooey" on="cpu1">
      <firstname>Bob</firstname>
      <lastname>Abooey</lastname>
      <department>adv</department>
      <cell>555-1212</cell>
      <address password="xxxx">babooey@example1.com</address>
      <address password="xxxx">babooey@example2.com</address>
    </user>

Now, contrast this with what I would enter for the YAML file:

    babooey:
      computer: cpu1
      firstname: Bob
      lastname: Abooey
      cell: 555-1212
      addresses:
        - address: babooey@example1.com
          password: xxxx
        - address: babooey@example2.com
          password: xxxx

I don't know what anyone else's thoughts are on the subject, but I find the YAML version to be much more readable. Python-phobes will be shocked to learn that whitespace is significant in YAML.
Another thing to notice is that YAML is designed with scripting languages (such as Python, Perl, PHP, and Ruby, among others) in mind. What does that mean? Well, it means that YAML's building blocks (mappings, sequences, and scalars) translate easily to the structures those languages already have: dictionaries or hashes, lists or arrays, and plain strings and numbers.
What does that mean for you? Well, if I were to read this example YAML structure into Python, assuming that I'd created a complete YAML document containing only this information, the resulting structure would look something like this:

    {'babooey': {'computer': 'cpu1', 'firstname': 'Bob', 'lastname': 'Abooey', 'cell': '555-1212', 'addresses': [{'address': 'babooey@example1.com', 'password': 'xxxx'}, {'address': 'babooey@example2.com', 'password': 'xxxx'}]}}

In other words, there's a dictionary with one key, and a dictionary as a value. That dictionary has a number of keys and values, with the "addresses" key mapping to a list of dictionaries. Crystal clear? Good.
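To make that description concrete, here is a small sketch that writes the same structure out by hand (no YAML parser involved) and walks it. The variable name `contacts` and the helper `addresses_for` are my own illustrative choices, not part of any YAML library.

```python
# The structure described above, written out as a plain Python literal.
contacts = {
    "babooey": {
        "computer": "cpu1",
        "firstname": "Bob",
        "lastname": "Abooey",
        "cell": "555-1212",
        "addresses": [
            {"address": "babooey@example1.com", "password": "xxxx"},
            {"address": "babooey@example2.com", "password": "xxxx"},
        ],
    }
}


def addresses_for(data, user_id):
    """Return the e-mail addresses stored for one user (hypothetical helper)."""
    return [entry["address"] for entry in data[user_id]["addresses"]]


if __name__ == "__main__":
    user = contacts["babooey"]
    # Outer dictionary -> inner dictionary -> plain values
    print(user["firstname"], user["lastname"], "uses", user["computer"])
    # The "addresses" key maps to a list of dictionaries
    for address in addresses_for(contacts, "babooey"):
        print(" ", address)
```

Nothing here is YAML-specific, which is rather the point: once the document is loaded, you are working with ordinary dictionaries and lists.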
And finally, what's the line count and character count on the resulting files?
XML:
Line count: 245
Character count: 10110
YAML:
Line count: 289
Character count: 9447
Not a huge savings (about 660 fewer characters, spread over more lines), but YAML saves itself by being readable.
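For anyone who wants to run the same kind of comparison on their own files, a minimal sketch follows. The function name `count_lines_and_chars` is made up for illustration, and the two sample strings are tiny stand-ins rather than my actual contact files.

```python
def count_lines_and_chars(text):
    """Return (line count, character count) for a piece of text."""
    return len(text.splitlines()), len(text)


# Tiny stand-in samples; substitute the contents of your own files here.
xml_text = '<user id="babooey">\n  <firstname>Bob</firstname>\n</user>\n'
yaml_text = "babooey:\n  firstname: Bob\n"

for label, text in (("XML", xml_text), ("YAML", yaml_text)):
    lines, chars = count_lines_and_chars(text)
    print("%s: %d lines, %d characters" % (label, lines, chars))
```

To measure real files, read each one into a string first (for example with `open(path).read()`) and pass the result to the function.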
Some people might be asking "Why use either standard?" Good question! My own feeling is that I should stick with a standard language simply because I want my data to be readable, but not force someone to write a new parser if they pick up my data and, say, wish to process it in Perl. If you don't have that goal in mind, or don't care, neither XML nor YAML will appeal to you.
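As a rough illustration of leaning on an existing parser rather than writing one, the sketch below reads a copy of the XML contact snippet with Python's standard `xml.etree.ElementTree` module and reshapes it into a dictionary. The function name `user_to_dict` and the choice of which fields to keep are my own assumptions.

```python
import xml.etree.ElementTree as ET

# A copy of the contact entry, embedded as a string so the example is self-contained.
XML_SNIPPET = """\
<user id="babooey" on="cpu1">
  <firstname>Bob</firstname>
  <lastname>Abooey</lastname>
  <department>adv</department>
  <cell>555-1212</cell>
  <address password="xxxx">babooey@example1.com</address>
  <address password="xxxx">babooey@example2.com</address>
</user>
"""


def user_to_dict(element):
    """Convert one <user> element into {id: {...}} (hypothetical helper)."""
    record = {
        "computer": element.get("on"),
        "firstname": element.findtext("firstname"),
        "lastname": element.findtext("lastname"),
        "cell": element.findtext("cell"),
        # Each <address> becomes a small dictionary: text plus its attribute
        "addresses": [
            {"address": node.text, "password": node.get("password")}
            for node in element.findall("address")
        ],
    }
    return {element.get("id"): record}


contacts = user_to_dict(ET.fromstring(XML_SNIPPET))
print(contacts["babooey"]["addresses"][0]["address"])
```

The parsing itself is one call; the reshaping is the part you end up writing by hand, because XML attributes and child elements don't map onto dictionaries and lists on their own.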
# Reading the document into Python
There are two parsers I know of for Python: PyYAML and Syck. Syck is designed to be a fast parser for multiple languages; PyYAML is designed to be a Python parser. After experimenting with both, I had better luck with PyYAML. Unfortunately, neither parser is a complete implementation of the YAML specification; fortunately, PyYAML implements enough for my needs.
Here's what I originally used to pretty-print the data structure resulting from loading contacts.yml:
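
    #!/usr/bin/env python
    import pprint
    import yaml
    # loadFile() (from the PyYAML release I used) returns an iterator over the documents in the file
    datafile = yaml.loadFile("contacts.yml")
    # Pull out the first document as a Python data structure
    dataset = datafile.next()
    # pprint() does its own printing; wrapping it in print would also emit "None"
    pprint.pprint(dataset)
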
And that's it!
What does this do? Well, loadFile() hands back an iterator over the YAML documents in the file; each call to its next() method returns the next document as a data structure, which you can assign to a variable.
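The script above reflects the PyYAML API I had at the time. As a hedged sketch of the same idea, assuming a current PyYAML release in which `yaml.safe_load_all()` yields one Python object per document, the following reads two documents from an in-memory string. The sample data (including the second user, `jdoe`) and the variable names are illustrative only.

```python
import pprint

import yaml  # PyYAML; assumed to be a current release

# Two YAML documents in one stream, separated by '---'.
# This is sample data, not the real contacts file.
SAMPLE = """\
babooey:
  computer: cpu1
  cell: 555-1212
---
jdoe:
  computer: cpu2
"""

# safe_load_all() returns a generator; each item is the structure for one document
documents = yaml.safe_load_all(SAMPLE)
first = next(documents)
second = next(documents)

pprint.pprint(first)
pprint.pprint(second)
```

To read from a file instead, pass an open file object to `safe_load_all()` in place of the string and consume the generator before the file is closed.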
# In closing...
I've only scratched the surface. I've not even covered enclosing multiple documents into a single YAML file, shortcuts, forcing datatypes, blocks, trailing newlines in literals, folding, aliases, and a raftload of handy features I've not even had the pleasure to need. For more on the subject, have a look at the YAML Cookbook for good examples.
I for one know what sort of format I'll be using for my next project, and it won't be XML.
# What about all the cool stuff that's been done with XML? Can YAML compete?
So what? Anything that can be done in XML can be done without XML. Do you need XML for XML-RPC? Well, yeah, but is it necessary to use XML for RPC? No. Is it necessary to use XML for RDF? No. Is it necessary to use XML for, well, anything? No. Is YAML a perfect replacement for XML? No, but in some circumstances, YAML may be more suitable than XML.
Do I think that YAML can compete with XML? Yep. Do I see YAML replacing XML? Nope. Both are great standards, and both have their place. I'd just like to help clue people in to a standard that's more suitable in situations where XML, quite frankly, stinks.