Kuro5hin.org: technology and culture, from the trenches

Why YAML? Why not?

By regeya in Op-Ed
Sun Oct 31, 2004 at 01:01:03 PM EST
Tags: Software

I started a simple project a while back; all I needed to do was store some information about stuff around the office, and since it's a small office, I thought that an RDBMS was overkill. So I decided to go with XML. After creating a document, I fought with XML DTDs, then with both XML DOM and XML SAX for Python, and finally settled on an approach that built dictionaries using SAX, as described here.
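For readers who have not seen the SAX-to-dictionary approach, here is a minimal sketch of the idea using Python's standard xml.sax module. It is not the code from the project: the UserHandler class, the `<contacts>` root element, and the sample data are assumptions made for illustration, and repeated child elements would simply overwrite each other in this simplified version.

```python
import xml.sax

class UserHandler(xml.sax.ContentHandler):
    """Collect <user> elements into a dictionary keyed by their id attribute."""

    def __init__(self):
        super().__init__()
        self.users = {}
        self.current = None   # dictionary for the <user> being read
        self.text = []        # character data of the element being read

    def startElement(self, name, attrs):
        if name == "user":
            self.current = {"computer": attrs.get("on")}
            self.users[attrs["id"]] = self.current
        self.text = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so collect them.
        self.text.append(content)

    def endElement(self, name):
        if name == "user":
            self.current = None
        elif self.current is not None:
            self.current[name] = "".join(self.text).strip()

# Hypothetical sample document; the root element name is invented.
SAMPLE = b"""<contacts>
  <user id="babooey" on="cpu1">
    <firstname>Bob</firstname>
    <lastname>Abooey</lastname>
  </user>
</contacts>"""

handler = UserHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.users)
```

Even for a record this small, the handler has to track state across three callbacks, which gives a sense of how much plumbing the approach needs.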

After messing with it for a while, I decided there had to be a better way. Almost by accident, I stumbled upon YAML.


# Introduction

While XML is buzzword-friendly, YAML (originally Yet Another Markup Language, now the GNU-style recursive acronym YAML Ain't Markup Language) is almost a non-buzzword. Mentioning it is the surest way to get a blank stare, even from fairly tech-savvy people.

That's a real shame, in my opinion, because both XML and YAML have their uses. XML, a subset of SGML, is designed to be simpler than SGML, and designed to be a standardized way of storing and sharing various types of data. It's good for a number of things; however, for every article I've read that makes me say, "Hey, that's neat!" I can find at least one rebuttal that says, essentially, "That's not what XML was designed to do!"

One wonders sometimes what XML was designed to do if it wasn't designed to do the things that people are using it for. On top of that, I've run into a number of people who have expressed concerns about XML. The biggest complaints I've seen are "too many characters" and "what makes this better than other file formats, exactly?"

My thoughts exactly.

# A look at YAML

One of the first things people try to do (including me) is compare YAML to XML. That's not really a fair comparison. While XML builds on a legacy language, YAML was designed from the start to be a data serialization language that's both powerful and human readable. The similarities end there.

Take a look at the reference card. The reference card is, essentially, a YAML document. For a more complete explanation, take a look at the YAML specification, and have a look at the YAML Cookbook (warning: the YAML Cookbook has a strong Ruby bias.)

# XML, or YAML?

When looking at these examples, bear in mind that I'm no expert on either XML or YAML. Having said that, here's a modified snippet from the XML file I had created for an in-house contacts list:

    <user id="babooey" on="cpu1">
        <firstname>Bob</firstname>
        <lastname>Abooey</lastname>
        <department>adv</department>
        <cell>555-1212</cell>
        <address password="xxxx">babooey@example1.com</address>
        <address password="xxxx">babooey@example2.com</address>
    </user>

Now, contrast this with what I would enter for the YAML file:

    babooey:
        computer: cpu1
        firstname: Bob
        lastname: Abooey
        cell: 555-1212
        addresses:
            - address: babooey@example1.com
              password: xxxx
            - address: babooey@example2.com
              password: xxxx

I don't know what anyone else's thoughts are on the subject, but I find the YAML version to be much more readable. Python-phobes will be shocked to learn that whitespace is significant in YAML.

Another thing to notice is that YAML is designed with scripting languages (such as Python, Perl, PHP, and Ruby, among others) in mind. What does that mean? Well, it means that YAML's mappings, sequences, and scalars translate easily to the dictionaries (or hashes), lists, and strings or numbers that those languages already provide.

What does that mean for you? Well, if I were to read this example YAML structure into Python, assuming that I'd created a complete YAML document containing only this information, the resulting structure would look something like this:

    {'babooey': {'computer': 'cpu1', 'firstname': 'Bob', 'lastname': 'Abooey', 'cell': '555-1212', 'addresses': [{'address': 'babooey@example1.com', 'password': 'xxxx'}, {'address': 'babooey@example2.com', 'password': 'xxxx'}]}}

In other words, there's a dictionary with one key, and a dictionary as a value. That dictionary has a number of keys and values, with the "addresses" key mapping to a list of dictionaries. Crystal clear? Good.
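To make that description concrete, here is a small sketch that walks the structure. The dictionary is typed in by hand rather than loaded from a file, so nothing here depends on a particular YAML parser; the list_addresses helper and the variable names are invented for illustration.

```python
# The structure described above, written out as a Python literal.
contacts = {
    "babooey": {
        "computer": "cpu1",
        "firstname": "Bob",
        "lastname": "Abooey",
        "cell": "555-1212",
        "addresses": [
            {"address": "babooey@example1.com", "password": "xxxx"},
            {"address": "babooey@example2.com", "password": "xxxx"},
        ],
    }
}

def list_addresses(contacts, user_id):
    """Return every e-mail address stored for one user."""
    return [entry["address"] for entry in contacts[user_id]["addresses"]]

# Outer dictionary -> inner dictionary -> plain values.
user = contacts["babooey"]
full_name = user["firstname"] + " " + user["lastname"]
print(full_name)

# The "addresses" key maps to a list of dictionaries.
print(list_addresses(contacts, "babooey"))
```

Getting from the file to the data takes only ordinary indexing and a list comprehension, with no tree-walking API in between.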

And finally, what's the line count and character count on the resulting files?

XML:
Line count: 245
Character count: 10110

YAML:
Line count: 289
Character count: 9447

Not a huge savings: the YAML file is 663 characters (roughly 7%) smaller and actually 44 lines longer. YAML redeems itself by being readable.

Some people might be asking "Why use either standard?" Good question! My own feeling is that I should stick with a standard language simply because I want my data to be readable, but not force someone to write a new parser if they pick up my data and, say, wish to process it in Perl. If you don't have that goal in mind, or don't care, neither XML nor YAML will appeal to you.

# Reading the document into Python

There are two parsers I know of for Python: PyYAML and Syck. Syck is designed to be a fast parser for multiple languages; PyYAML is designed to be a Python parser. After experimenting with both, I had better luck with PyYAML. Unfortunately, neither parser is a complete implementation of the YAML specification; fortunately, PyYAML implements enough for my needs.

Here's what I originally used to pretty-print the data structure resulting from loading contacts.yml:

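#!/usr/bin/env python
# Written for Python 2 and the PyYAML release of the time.
import yaml, pprint

# loadFile() returns an iterator over the YAML documents in the file.
datafile = yaml.loadFile("contacts.yml")
# next() fetches the first (here, the only) document.
dataset = datafile.next()

# pprint.pprint() does its own printing and returns None, so it must
# not be wrapped in a print statement.
pprint.pprint(dataset)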

And that's it!

What does this do? Well, loadFile() gives you an iterator over the YAML documents in the file; each call to datafile.next() returns the next document as a Python data structure, which you can assign to a variable.
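The loadFile() call belongs to the PyYAML of the time. Current PyYAML releases provide yaml.safe_load_all(), which likewise yields one Python structure per YAML document in a stream. The sketch below assumes a current PyYAML on Python 3 and uses an in-memory sample instead of contacts.yml so that it runs on its own; the sample data and the load_documents name are invented for illustration.

```python
import io
import pprint

import yaml  # third-party package: PyYAML

# Two YAML documents in one stream, separated by "---".
SAMPLE = """\
babooey:
    computer: cpu1
    cell: 555-1212
---
jdoe:
    computer: cpu2
    cell: 555-1313
"""

def load_documents(stream):
    """Return a list with one Python structure per YAML document."""
    # safe_load_all yields documents lazily and builds only plain
    # Python types (dict, list, str, int, ...).
    return list(yaml.safe_load_all(stream))

documents = load_documents(io.StringIO(SAMPLE))
first = documents[0]  # plays the role of datafile.next()
pprint.pprint(first)
```

To read a real file, pass an open file object instead of the StringIO, for example `with open("contacts.yml") as f: documents = load_documents(f)`.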

# In closing...

I've only scratched the surface. I've not even covered putting multiple documents in a single YAML file, shortcuts, forcing datatypes, blocks, trailing newlines in literals, folding, aliases, and a raftload of handy features I've not even had the pleasure to need. For more on the subject, have a look at the YAML Cookbook for good examples.

I for one know what sort of format I'll be using for my next project, and it won't be XML.

# What about all the cool stuff that's been done with XML? Can YAML compete?

So what? Anything that can be done in XML can be done without XML. Do you need XML for XML-RPC? Well, yeah, but is it necessary to use XML for RPC? No. Is it necessary to use XML for RDF? No. Is it necessary to use XML for, well, anything? No. Is YAML a perfect replacement for XML? No, but in some circumstances, YAML may be more suitable than XML.

Do I think that YAML can compete with XML? Yep. Do I see YAML replacing XML? Nope. Both are great standards, and both have their place. I'd just like to help clue people in to a standard that's more suitable in situations where XML, quite frankly, stinks.

Poll
XML, YAML, or neither?
o XML 13%
o YAML 9%
o Neither 13%
o Whatever works 63%

Votes: 44


Why YAML? Why not? | 184 comments (166 topical, 18 editorial, 2 hidden)
maybe it's because I'm a heavy XML user... (3.00 / 3) (#2)
by zenofchai on Fri Oct 29, 2004 at 02:50:35 PM EST

But I find XML much more human-parseable than YAML. For example, in XML, often times the tags describe themselves; in your example, I don't know what "babooey" itself is, and apparently would likely have to rely on it being in some kind of file called: "user-ids.yaml" or something.

In short: a root document tag is a good idea.

Plus, XML is very interoperable with programming languages. Java, C, C++, and Javascript come to mind very quickly. Many of the "next gen" data formats are already in XML (SVG, XHTML).

In conclusion: I'll learn YAML when I have to use it for a project. ;]
--
The K5 Interactive Political Compass SVG Graph

I hit the spam button (1.10 / 10) (#3)
by phred on Fri Oct 29, 2004 at 03:36:40 PM EST

then I'm going to -1 it. Next I'm going to modbomb the author with 0's. Then I'm going to spam his email all over slashdot.

Because whitespace is at best a delimiter, and I can't visually tell the difference between multiple spaces and a tab. So how the heck am I gonna read a python program?

Nice. (none / 0) (#8)
by ZorbaTHut on Fri Oct 29, 2004 at 04:30:42 PM EST

I use XML as data storage for a game I'm working on - my philosophy is that, if a horrible disaster occurred and all of my source were to be deleted, I should be able to open every source data file up in an existing program and look at it there. (In other words, .png is fine because I can use Adobe, and .wav is fine because I can use Winamp. Any other source format should be XML.)

I may change it over to YAML.

. . . Or I would, but it seems there's no YAML C++ implementation. Bah. I may have to whip up a very quick back-of-the-envelope one.

Minor detail (none / 0) (#9)
by jd on Fri Oct 29, 2004 at 04:31:52 PM EST

XML is a set of pre-defined templates within SGML. To call it a subset of SGML is not strictly correct, because XML is defined on top of SGML. (Unlike HTML, which really IS a subset, as it is implemented entirely separately.)

I like the look of YAML, and will likely look at it much more closely.

I've been working on a system of my own, though, for some time which stores data in a "folded" format, rather than the flat format most people are familiar with. I then have a second hierarchy, matching the first in structure, containing metadata on what the block is supposed to look like.

The idea is that I can then "flatten" the file into any markup I like - HTML, XML, LaTeX, etc. All I need is a mapping which tells me how to translate the metadata into the notation used, and then just do a suitable substitution.

The reason I favour this kind of approach is that I don't see why I should be telling users how they should be viewing the document. That should be up to them. I should be concerned only with the content, NOT the presentation.

Sadly, too much emphasis these days IS placed on the presentation, which means web pages (especially) are impossibly cluttered as "designers" try to look impressive, rather than be functional.

The only purpose in storing data on a computer is to deliver it. "Write Once, Read Never" approaches should be left to rot.

a shortcoming you overlooked (2.77 / 9) (#11)
by dimaq on Fri Oct 29, 2004 at 05:05:52 PM EST

there's one property of xml that is so obviously not present in your yaml - you can take a block of xml and stick it anywhere in another xml document, without any modifications.

your yaml would need whitespace (nesting) adjustment.
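For what it's worth, that adjustment is mechanical, though it is still a step XML does not need. A minimal sketch using Python's standard textwrap.indent; the nest_under helper and the sample block are made up for illustration and assume space-only indentation:

```python
import textwrap

def nest_under(key, yaml_block, indent="    "):
    """Nest a block of YAML text under a new parent key by re-indenting it."""
    # textwrap.indent prefixes every non-blank line, so the block's own
    # relative indentation is preserved.
    return key + ":\n" + textwrap.indent(yaml_block, indent)

block = "firstname: Bob\nlastname: Abooey\n"
nested = nest_under("babooey", block)
print(nested)
```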

another bit that I find erratic in python and completely murderous in your yaml is indentation nesting - granted good code shouldn't be over-indented anyway, so python gets off the hook easily - now consider a yaml document that has, say, 200 levels of nesting - would it still look that pretty? would you even be able to edit it sensibly?

xml, by contrast, has a [semi-standard] convention on white-spaces that allows you to show/edit/store it either as indented or as one long line.

besides, just because you chose to group elements in your yaml representation differently from the xml representation, doesn't mean it's a great special property of yaml!

p.s. all this said I'd like to see something more human-readable than xml (and yaml seems to be for short data dumps) as a standardized config file format for example. perhaps even as some request format.

whitespace (3.00 / 3) (#15)
by forgotten on Fri Oct 29, 2004 at 07:12:29 PM EST

i can live with whitespace in programs.

but having whitespace significant in data files is just asking for trouble.

--

All parsing sucks cock. (2.14 / 14) (#17)
by five volt on Fri Oct 29, 2004 at 07:59:17 PM EST

Why are people so obsessed with using text files as the between-programs representation of non-text data?

The only easily readable way to do it is to put every item on its own line, and throw in some more whitespace every time you want to nest. Like your YAML. Then it looks sorta okay in a text editor. You just need to page up and down to see what you're nested under.

But it's still retarded to make a computer store its data in semi-readable human format. Why not store it in a format that computers are good with (binary with pointers), and then do quick conversions when, perchance, a human wants to look at/manually edit it? Hey computer scientists, stop inventing new languages for a minute and make binary tools.

In a sensible computing system, a core program should not need to care about the contents of a string. Filenames suck cock. XML sucks cock. URLs suck cock. Text config files suck cock. Text protocols suck cock.

The only reason I choose to live with the pain is that whoever designed UNIX decided that standard programs should talk to each other in a mishmash of English and line noise. WAY TO GO, GENIUSES. So, for every one line of real code (you know, crunching numbers - what computers were originally made for), I seem to have five lines of cooksocking parse code - reading some stupid config file, figuring out what the fuck the text is saying, making sure user didn't put a string in where I expect a floating point number, and dying gracefully if anything was wrong. And then I have to output my numbers into a text file so that gnuplot can read it.

Blah.

--
Ruthlessness kicks ass.

Bah, doesn't solve size and XML just as readable (2.50 / 2) (#24)
by jongleur on Sat Oct 30, 2004 at 12:36:31 AM EST

.. with the right tool. If you have a computer to edit a file, you can find some viewer that can peel off the tags and indent it to look like YAML anyway. Piddly idea IMO. But I'll abstain.
--
"If you can't imagine a better way let silence bury you" - Midnight Oil

This looks like .plist structure in NeXT/Apple n/t (2.00 / 2) (#30)
by israfil on Sat Oct 30, 2004 at 07:18:53 AM EST


i. - this sig provided by /dev/arandom and an infinite number of monkeys with keyboards.

bottom's up gay (1.09 / 11) (#35)
by Requiem for a Dream on Sat Oct 30, 2004 at 01:56:15 PM EST

no comment

generic lisp-bigot response (3.00 / 9) (#36)
by Delirium on Sat Oct 30, 2004 at 02:32:56 PM EST

(babooey
   (computer cpu1)
   (firstname Bob)
   (lastname Abooey)
   (cell 555-1212)
   (addresses
     (address babooey@example1.com (password xxxx))
     (address babooey@example2.com (password xxxx))))

repeat after me (2.00 / 3) (#37)
by the sixth replicant on Sat Oct 30, 2004 at 02:51:59 PM EST

encoding encoding encoding

how does YAML deal with it? I bet you it's just as stupid as XML

Ciao

So how does this compare with old file structures (none / 1) (#49)
by lukme on Sat Oct 30, 2004 at 07:33:27 PM EST

like the ones used to store files on tape.


-----------------------------------
It's awfully hard to fly with eagles when you're a turkey.

Why I stopped worrying and learned to love XML (2.66 / 3) (#57)
by xL on Sun Oct 31, 2004 at 07:37:41 AM EST

When XML started to become one of those New Things, I went into righteous denial. The concept of using XML formatting for data exchange seemed, well, bloated for most purposes. XML-formatted structured data, at first and second glance, looks harder to parse both for humans and machines. For a long time, I worked with a standard text format that served me well for structured data, something along the lines of:

foobar {
  quux="bar"
  foo="baz"
  wobble {
    wewp=42
  }
}

Then came the point that I ran into situations where I had to communicate with software that used XML data. This is where things started to get complicated. A lot of my time started to go into tedious transformation code getting the external XML data and putting it into an internal representation.

I started to get exposed to more and more XML formats over time and the transformation code became more and more sophisticated, up to the point that the internal representation of data started to follow most of the concepts of XML (keyed dictionaries or arrays of objects with attributes). A sensible text format for storing such data would be quite tricky: it should distinguish the following characteristics for objects:

  • Some form of indication of an object's type or class
  • Optional and/or mandatory attribute values
  • Child nodes with a key attribute or as part of an array

It daunted me that whatever text format I could come up with to cover all bases would be just as hard to parse in software as XML and not necessarily easier to parse by humans either. That's when I decided to just stop horsing around and use XML natively (with a backdoor option to use binary structured storage in situations where it's important that the machine can parse things easily and legibility for humans is not important).

YAML looks sensible. It's indeed a bit easier to read than regular XML. But, already, it's not a free ride for machine parsing. And, even if you embrace it, you will run into situations where you have to talk to software that expects XML. Keeping code around for two ways of storing and parsing the data is not going to make your life easier.

XML Configuration Files. (3.00 / 5) (#60)
by bhearsum on Sun Oct 31, 2004 at 12:00:23 PM EST

XML has its uses, I won't deny that. But this article made me think of XML configuration files.

Ugliest. Files. Ever.

I remember when I wanted to stream some audio to a couple friends I decided to try out icecast2, which is a nice piece of software. Now, normally for me to configure something like this I just open up the config file and browse through it, simple as that. XML makes the ugliest fucking files ever. It's redundant and silly, and one line usually doesn't fit in a standard terminal which is an even greater annoyance.

Let's compare,

<icecast>
    <limits>
        <clients>100</clients>
        <sources>2</sources>
        <threadpool>5</threadpool>
        <queue-size>102400</queue-size>
        <client-timeout>30</client-timeout>
        <header-timeout>15</header-timeout>
        <source-timeout>10</source-timeout>
    </limits>
</icecast>

with,

# Limits
clients = 100
sources = 2
threadpool = 5
queue-size = 102400
client-timeout = 30
header-timeout = 15
source-timeout = 10

In the XML version, I find myself squinting to find the value for each variable. It's ridiculous.

</off-topic-rant>
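The flat format is also trivial to read back. A minimal sketch, assuming one key = value pair per line and # for comments; this is not icecast's actual parser, the parse_flat_config name is invented, and values are left as strings:

```python
def parse_flat_config(text):
    """Parse 'key = value' lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

sample = """\
# Limits
clients = 100
sources = 2
queue-size = 102400
"""
limits = parse_flat_config(sample)
print(limits["queue-size"])
```

What this cannot express is nesting, which is the part XML and YAML are actually paying for.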

Misses the point(s) (3.00 / 2) (#63)
by danharan on Sun Oct 31, 2004 at 01:06:39 PM EST

First of all, you shouldn't be starting off with DTDs for a new project, but XML Schema.

YAML could be a neat format for storing one application's data. Heck, people still use good old-fashioned key=value pairs in text files. You can even just serialize objects... Whatever works.

Against XML though, YAML just can't compete. How do you communicate the acceptable structure of a document? This is essential for programs to be able to write valid files- and other programs to verify that what they are reading is indeed valid. And perhaps the coolest thing about XML is XSL- I have a hard time seeing how YAML could offer a robust alternative in that domain.
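Without a schema language, one fallback is to check the loaded structure in code. A minimal sketch, assuming the contact record layout from the article; the validate_user name and the choice of required fields are guesses made for illustration, and a hand-rolled check like this is not a substitute for DTD or XML Schema validation:

```python
def validate_user(record):
    """Return a list of problems found in one user record (empty if valid)."""
    problems = []
    for field in ("firstname", "lastname", "addresses"):
        if field not in record:
            problems.append("missing field: " + field)
    addresses = record.get("addresses", [])
    if not isinstance(addresses, list):
        problems.append("addresses must be a list")
    else:
        for i, entry in enumerate(addresses):
            if not isinstance(entry, dict) or "address" not in entry:
                problems.append("addresses[%d] needs an 'address' key" % i)
    return problems

good = {"firstname": "Bob", "lastname": "Abooey",
        "addresses": [{"address": "babooey@example1.com", "password": "xxxx"}]}
bad = {"firstname": "Bob", "addresses": [{"password": "xxxx"}]}
print(validate_user(good))
print(validate_user(bad))
```

The drawback is the one the comment points at: the rules live in one program's source code rather than in a document that other programs can share.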

Your first impulse was correct (2.66 / 3) (#65)
by daviddisco on Sun Oct 31, 2004 at 01:29:59 PM EST

The whole thing would have been so easy if you had just stuck the data in a simple database. A database! They are for storing and retrieving data. They have been around a long time and are very mature. There are free databases. Every language has a data access API. XML is great for moving data and messages between disparate systems. Maybe YAML has a use for human edited files (mostly config files). They are great, but know when to use'em, know when to lose'em.
##I run a geography related site at globalcoordinate.com##

Wow, YAML really sucks (2.50 / 2) (#70)
by trhurler on Sun Oct 31, 2004 at 02:05:44 PM EST

First of all, what does it offer that I don't get with a recursive descent parser that'll take about ten minutes to write? (After all, using indentation like that makes it obvious that this isn't intended for anything particularly complicated.)

Second, this automated translation into data structures sounds neat, until you realize that you can write something that does that with any number of data formats in a few minutes' time in any of those scripting languages you're talking about.

Third, your final conclusion vs XML is that it is about the same size, but that you think it is more readable. Er... I hate to bust your balls on this one, because I think I already know the answer, but have you noticed that there are XML tools that will take a DTD or a Schema, let you edit the data in its intended structure without having to mess with tags much at all, make sure you don't do anything that isn't allowed, and so on? I can't imagine that being less readable than this. Also, although it isn't typically used, whitespace is mostly ignored in XML, so you CAN format XML to be very readable if it is important, even with the tags.

--
'God dammit, your posts make me hard.' --LilDebbie

Biggest mistakes (3.00 / 2) (#73)
by lookout on Sun Oct 31, 2004 at 02:43:01 PM EST

XML: attributes and elements

There is a subtle semantic difference between

<person sex="male">
    <name>John</name>
    <surname>Doe</surname>
</person>

and

<person>
    <sex>male</sex>
    <name>John</name>
    <surname>Doe</surname>
</person>

but was it really necessary to mess up the syntax to encode that semantic difference? IMO attributes should have been left out.
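The syntactic split shows up in every consumer of the data. A small sketch with Python's standard xml.etree.ElementTree, using the two documents above (the variable names are invented for illustration):

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person sex="male"><name>John</name><surname>Doe</surname></person>'
)
as_element = ET.fromstring(
    "<person><sex>male</sex><name>John</name><surname>Doe</surname></person>"
)

# An attribute is read with get(); a child element with findtext().
sex_from_attribute = as_attribute.get("sex")
sex_from_element = as_element.findtext("sex")
print(sex_from_attribute, sex_from_element)

# Each lookup returns None when used on the other document's layout.
print(as_attribute.findtext("sex"), as_element.get("sex"))
```

Code that wants to accept both layouts has to try both lookups.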

YAML: whitespace significance

Bad, bad. If you've worked in worldwide teams where everyone insists on his/her own tab settings, you'd know. We've left behind the fixed line layout of Fortran and punched cards a long time ago.


Boooooooo (none / 1) (#77)
by sethadam1 on Sun Oct 31, 2004 at 03:28:28 PM EST

Unfortunately, what most people use XML for could be accomplished in .ini files. In fact, other than RSS feeds, I don't know many who use XML who shouldn't either be using simpler text files or be in a database.

Along with YACC? (1.50 / 2) (#79)
by Sen on Sun Oct 31, 2004 at 03:43:07 PM EST

Sorry, the pretentious "yet another" turns me off to both. I used JavaCC, by the way--very good.

Been there, done that (1.00 / 3) (#96)
by rtmyers on Sun Oct 31, 2004 at 07:15:26 PM EST

Hey, before you went to all the trouble to spend your exhausting 5ms thinking this stuff up, why didn't you look to see if someone else had already done it, since they have? I'll leave it up to you to find the details, since you obviously need practice doing research on the net, but I'll give you a hint: Google for "python-like XML".

STAR format (3.00 / 2) (#104)
by Thought Assassin on Sun Oct 31, 2004 at 10:19:19 PM EST

There's a format called STAR that's been used by various scientists for donkey's years. The files look very similar to YAML, but it already has strong support for a form of schema (known as dictionaries) that is a bit more powerful but sometimes a bit clunkier (numerous legacy tags) than XML schema.

Personally, I think all three schema languages have a long way to go, although there are better (but by no means perfect) alternatives out there for XML, and that the expressiveness of your schema language is far more important than the actual format. So although I favour the terser STAR/YAML style, I use XML personally because the chances are that's where

Disclaimer: I worked on the STAR project for a while. So I guess I'm partly to blame if the dictionaries don't live up to what I wish they were. I hope one day I'll have a chance to work on that stuff again.

human readability (2.50 / 2) (#153)
by maccha on Tue Nov 02, 2004 at 07:04:05 AM EST

Why is it that XML / YAML or whatever all insist on being readable and editable with a plain text editor?

As someone else pointed out, no-one who uses XML seriously would try to get by without tools for efficient viewing, editing and validation. And if you're committed to using tools anyway, why not go for the compactness and efficiency of binary formats?

It's a funny world where entire applications get written in C and C++, but nobody minds the resources involved in storing and parsing those start and end <namespace:tag attr="value"> tags </namespace:tag>.


(Or am I just talking a load of crap?)


YAML must die (1.00 / 3) (#181)
by aminorex on Sun Nov 14, 2004 at 08:50:21 PM EST

Anything where whitespace is significant is worse than nothing at all.
