This post is a repost due to the loss of my original blog location.
A chat with a work colleague opened my eyes to the reality that many programmers still use DataSets, and consider them powerful and useful tools. Many of my peers would react to the idea much like a vampire confronted by sunlight (a proper vampire, not these metrosexual emasculated snivellers popularised in the Twilight series), pulling back in horror and making a hissing noise before launching a full-blooded attack at the throat. So why do many developers hate them so much, and why are they still popular?
DataSets have history: they’ve been around since .NET 1.0 and have changed very little since then. Code using them still works great, so why change it? Also, they do their job okay. They let you work with your data through a model familiar to most developers: tables, columns and relationships. Not only that; they also allow you to sync changes back and forth to multiple back-end stores. Finally, they assist you in maintaining data consistency by handling most data concurrency issues for you. So, given all that, why would we not use DataSets?
They’re quite slow and bloated. They have all this plumbing for handling concurrency, data versioning and serialization, which is great, but normally you don’t need a lot of that stuff.
Regardless of the formatter being used, the DataSet always serializes first to XML. What's worse, the DataSet uses a pretty verbose schema—the DiffGram format plus any related schema information. Now take a DataSet with a few thousand records in it and imagine such a large chunk of text traveling over the network with no sort of optimization or compression (even blank spaces aren't removed).
[He then goes on to explain some ways to improve matters]
- Binary Serialization of DataSets, Dino Esposito, http://msdn.microsoft.com/en-us/magazine/cc163911.aspx
DataSets are complex objects with a hierarchy of child objects, and as a result, serializing a DataSet is a processor-intensive operation. Also, DataSet objects are serialized as XML even if you use the binary formatter. This means that the output stream is not compact.
- How To: Improve Serialization Performance, http://msdn.microsoft.com/en-us/library/ms979193.aspx
DataSets serialize naturally to XML quite well and you have lots of control over that. Typed DataSets have the XSD with some properties that control that (and of course you can do it programmatically too). But one of the common problems with remoting DataSets is that the default binary serialization is actually just the XML Serialization ASCII. Crude. Some people have even used this fact to extrapolate that the DataSet is internally XML – which isn’t true.
- DataSet Serialization: Smaller & Faster, http://blogs.objectsharp.com/cs/blogs/datasetfaq/archive/2004/06/10/614.aspx
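To get a feel for the verbosity described in the quotes above, here is a minimal sketch that writes a small DataSet out as DiffGram XML and compares the size with the same values written by hand through a BinaryWriter. The "Shop" DataSet, the "Customer" table and the class and method names are all made up for illustration, and the hand-rolled format is only a rough yardstick, not something you would ship.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetSizeSketch
{
    // Builds a hypothetical single-table DataSet with rowCount rows.
    public static DataSet BuildSample(int rowCount)
    {
        var ds = new DataSet("Shop");
        DataTable customers = ds.Tables.Add("Customer");
        customers.Columns.Add("Id", typeof(int));
        customers.Columns.Add("Name", typeof(string));
        for (int i = 0; i < rowCount; i++)
        {
            customers.Rows.Add(i, "Customer " + i);
        }
        return ds;
    }

    // Number of bytes produced when the DataSet is written as DiffGram XML.
    public static long DiffGramSize(DataSet ds)
    {
        using (var stream = new MemoryStream())
        {
            ds.WriteXml(stream, XmlWriteMode.DiffGram);
            return stream.Length;
        }
    }

    // Number of bytes needed for just the values: an int plus a
    // length-prefixed string per row, with no schema or change tracking.
    public static long RawSize(DataSet ds)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            foreach (DataRow row in ds.Tables["Customer"].Rows)
            {
                writer.Write((int)row["Id"]);
                writer.Write((string)row["Name"]);
            }
            writer.Flush();
            return stream.Length;
        }
    }

    public static void Main()
    {
        DataSet ds = BuildSample(5000);
        Console.WriteLine("DiffGram XML: {0} bytes", DiffGramSize(ds));
        Console.WriteLine("Raw values:   {0} bytes", RawSize(ds));
    }
}
```

The comparison is deliberately unfair, since the raw stream carries no schema, row state or original values, but that is rather the point: you pay for all of that whether you need it or not. On .NET Framework 2.0 and later, setting `RemotingFormat` to `SerializationFormat.Binary` before binary serialization is the usual way to avoid the XML detour.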
There’s little support for decent inheritance scenarios, and I must admit to being appalled at the way they handle business rule validation (yes, there is actually support for it somewhere in there). They still do not support nullable value types, and you cannot use custom types or even standard types such as Uri or IPAddress or a million other types you might want your column to be. Additionally, the standard serialization offered by DataSets is just as bloated (although Read/WriteXml isn’t too bad in a single-table scenario).
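The nullable complaint is easy to see for yourself. The sketch below, with a made-up "Person" table and "Age" column, tries to declare an `int?` column and then shows the DBNull translation you end up writing by hand instead. The class and method names are mine, purely for illustration.

```csharp
using System;
using System.Data;

public static class NullableColumnSketch
{
    // Returns true if DataColumn refuses a nullable value type.
    public static bool NullableColumnIsRejected()
    {
        try
        {
            new DataColumn("Age", typeof(int?));
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }

    // Reads the hypothetical "Age" column, turning DBNull into null by hand.
    public static int? ReadAge(DataRow row)
    {
        return row.IsNull("Age") ? (int?)null : (int)row["Age"];
    }

    public static void Main()
    {
        Console.WriteLine("int? column rejected: " + NullableColumnIsRejected());

        var table = new DataTable("Person");
        table.Columns.Add("Age", typeof(int)); // AllowDBNull defaults to true
        DataRow unknownAge = table.Rows.Add(DBNull.Value);
        DataRow knownAge = table.Rows.Add(42);

        Console.WriteLine("first row has an age: " + ReadAge(unknownAge).HasValue);
        Console.WriteLine("second row has an age: " + ReadAge(knownAge).HasValue);
    }
}
```

Every consumer of the row has to remember the DBNull check, which is exactly the sort of noise a plain `int?` property on a Customer or Person class would spare you.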
DataSets are just horrible, horrible design. As mentioned above, they have a great deal of code that is rarely used, yet often slows down processing. They do not follow a good separation of concerns, so there are numerous ways of reading and writing data, yet they don’t allow much customisation of the output. Their support for validation and business rules is laughable. They force you to put all your code in one giant .cs file, or separate it from the DataSet entirely. They do not support the POCO (Plain Old CLR Object) model of programming at all, so if you want Customer and Order objects, tough, you’d have to wrap them around DataRows, which stuffs up any chance of you using those objects across WebServices/WCF since DataRow is not serializable. Even if you’re happy serializing them across web service boundaries, non-.NET consumers will hate you forever.
But worst of all is what they encourage you to do in your code. DataSets encourage you to treat data as completely separate from behaviour, breaking encapsulation, removing polymorphism and encouraging brittle and massively inter-dependent code. Instead of simple methods with overrides you have masses of switch statements and if statements. I’m not saying that you have to work this way with DataSets; they just make this way easier and make more “correct” ways of writing code difficult. By shoving everything together, DataSets also make it tricky to perform decent unit tests; you have to test the entire CRUD stack as a unit instead of being able to break pieces (such as validations) out for individual testing.
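As a rough illustration of the difference, here is the same made-up discount rule written both ways: once as a switch over a string column in a DataRow, and once as an override on a small class hierarchy. The "Type" column, the discount values and all the class names are invented for the example.

```csharp
using System;
using System.Data;

// Table-centric style: the data is a row, and the behaviour lives elsewhere.
public static class RowPricing
{
    public static decimal DiscountFor(DataRow customer)
    {
        // Every new customer type means hunting down switches like this one.
        switch ((string)customer["Type"])
        {
            case "Gold": return 0.20m;
            case "Silver": return 0.10m;
            default: return 0m;
        }
    }
}

// Object-centric style: data and behaviour travel together.
public class Customer
{
    public string Name { get; set; }

    public virtual decimal Discount { get { return 0m; } }

    // A validation rule that can be unit tested without any database.
    public bool IsValid() { return !string.IsNullOrEmpty(Name); }
}

public class GoldCustomer : Customer
{
    public override decimal Discount { get { return 0.20m; } }
}

public static class PricingSketch
{
    public static void Main()
    {
        var table = new DataTable("Customer");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Type", typeof(string));
        DataRow row = table.Rows.Add("Alice", "Gold");
        Console.WriteLine(RowPricing.DiscountFor(row));

        Customer alice = new GoldCustomer { Name = "Alice" };
        Console.WriteLine(alice.Discount);
        Console.WriteLine(alice.IsValid());
    }
}
```

Both versions give the same answer today. The difference shows up later, when a new customer type arrives: the object version needs one new class, while the row version needs every switch in the code base to be found and updated.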
When Microsoft first came out with .NET, strongly typed DataSets were presented in every book and by many Microsoft evangelists as THE new way to persist information to the database. … All this “ease of use” is now a maintenance nightmare.
Business rules should go in business objects. The problem with DataSets is that there is no where [sic] to place business rules. The business logic inevitably ends up in the user interface layer (where it should not go). Microsoft tried to solve this problem in .NET 2.0 by adding partial classes. This did nothing to combat having the same tables in multiple DataSets across the application. If the same table, such as a customer table, was throughout the many screens, business rules would be duplicated for every instance in a partial class.
- Why should you use business objects in .NET and not DataSets?, Greg Finzer, http://www.kellermansoftware.com/t-articlebusinessobjects.aspx
They encourage you to think about tables, rows, columns when you should be thinking about Customers, Orders and business rules. They move your eye level in the wrong direction, away from the problem domain and towards the implementation domain. They discourage abstractions and generalizations. Our job as software developers is to manage complexity, and in almost every situation that I’ve seen DataSets used, they’ve increased the complexity to little or no benefit.
An ideal environment for creation of business applications should allow developers to describe the business logic and state of the problem domain which they are modeling with minimum or no "noise" coming from the underlying representation and the infrastructure that supports it. Applications should be able to interact with the stores that maintain the persistent state of the system in the terms of the problem domain; specifically in the terms of a conceptual domain model, completely separated from the logical schema of the underlying store.
While the relational model has been extremely effective in the last few decades, it's a model that targets a level of abstraction that is often not appropriate for modeling most business applications created using modern development environments.
- The ADO.NET Entity Framework Overview, http://msdn.microsoft.com/en-us/library/aa697427(VS.80).aspx
So, should you never, ever use DataSets? No, I wouldn’t go that far. They have their place, albeit a much more limited one than many people use them for. If you just need something quick and simple, use LINQ to SQL; otherwise use an Object Relational Mapper (ORM). If you absolutely have to have disconnected client-side relational tables with change tracking and minimal business rules or validations, then sure, DataSets might be an acceptable technology, although even then I’d argue against it. I personally only use DataSets on throwaway proofs of concept and user interface mockups.
Jeff Atwood feels that you should either pick Objects or Tables, but I disagree vehemently. The important thing is to consider your solution domain. If data storage size, speed and consistency are what you’re after then you should be dealing in tables and columns. If you’re communicating between disparate systems then you need to be thinking about messages. Finally, if you’re dealing with editing, validations, business rules, then you must think about objects, services, interfaces and inheritance hierarchies. It doesn’t make sense to me to avoid catering for all three scenarios with one simple object. Most ORMs can handle this easily, and can also give you all the data concurrency, offline support and versioning that DataSets do, and usually with far better performance.