This session by Rebecca Parsons was very insightful. Data models change as products evolve through iterations. Scott Ambler describes how relational databases can evolve in an agile manner, in small steps that each pair a refactoring with a migration. Like everything else, data changes: as our understanding of the domain and our access patterns change, the database has to be migrated.
Code changes are easily managed in version control repositories. Data is not version controlled, but data models are. Developers must meticulously create and maintain data migrations so that data can be rolled forward or backward in sync with the code. They also need to supply default values for new columns in records that were created by older versions, before those columns existed. Overall, data migration is a hairy problem.
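To make the roll-forward/roll-back pairing and the default-value problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers table, the loyalty_tier column and the schema_version bookkeeping table are hypothetical names chosen for illustration, not anything from the session.

```python
import sqlite3


def create_v1(conn: sqlite3.Connection) -> None:
    # Hypothetical version-1 schema: customers have only an id and a name.
    conn.executescript("""
        CREATE TABLE schema_version (version INTEGER NOT NULL);
        INSERT INTO schema_version VALUES (1);
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO customers (name) VALUES ('Ada'), ('Grace');
    """)


def migrate_up(conn: sqlite3.Connection) -> None:
    # Roll forward to version 2 by adding a column. The DEFAULT supplies a
    # value for rows written by version 1, which never had this column.
    conn.execute(
        "ALTER TABLE customers ADD COLUMN loyalty_tier TEXT NOT NULL DEFAULT 'standard'"
    )
    conn.execute("UPDATE schema_version SET version = 2")
    conn.commit()


def migrate_down(conn: sqlite3.Connection) -> None:
    # Roll back to version 1 by rebuilding the table without the new column.
    # Rebuilding avoids ALTER TABLE ... DROP COLUMN, which older SQLite
    # releases lack. Any loyalty_tier values are discarded.
    conn.executescript("""
        BEGIN;
        CREATE TABLE customers_v1 (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO customers_v1 (id, name) SELECT id, name FROM customers;
        DROP TABLE customers;
        ALTER TABLE customers_v1 RENAME TO customers;
        UPDATE schema_version SET version = 1;
        COMMIT;
    """)


def schema_version(conn: sqlite3.Connection) -> int:
    # Read the version number recorded by the last migration.
    return conn.execute("SELECT version FROM schema_version").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_v1(conn)
    migrate_up(conn)
    print(schema_version(conn), conn.execute("SELECT name, loyalty_tier FROM customers").fetchall())
    migrate_down(conn)
    print(schema_version(conn), [row[1] for row in conn.execute("PRAGMA table_info(customers)")])
```

Every schema change needs both halves written and kept in step with the code, and the down migration here loses data, which is part of why the problem is hairy.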
One might expect data migrations for Big Data to be an even bigger problem. NoSQL databases are characterized as non-relational, schemaless, cluster-friendly, open source, and rooted in the 21st-century web. They come in several families: RavenDB, CouchDB and MongoDB store documents; HBase and Cassandra store column-family data; Riak and Redis store key-value pairs; and Neo4j stores graphs. Each of these has its own way of dealing with migrations.
NoSQL databases like MongoDB offer a cleaner way to address this problem. Because MongoDB does not enforce a fixed schema, documents written by multiple versions of an application can coexist in the same collection; not all documents have to look the same. This makes evolution easier: adding a field, for example, requires no change to existing data. Thus there is no need to run a large-scale migration just to roll a version forward or backward.
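The idea of letting versions coexist can be sketched without a database at all. In this minimal sketch, plain Python dicts stand in for MongoDB documents, and the field names and the schema_version tag are hypothetical conventions, not a MongoDB feature. Older documents are upgraded in memory when read instead of being rewritten in bulk.

```python
import copy
from typing import Any, Dict, List

Document = Dict[str, Any]

CURRENT_VERSION = 2


def upgrade(doc: Document) -> Document:
    """Return doc in the version-2 shape, leaving the stored document untouched."""
    upgraded = copy.deepcopy(doc)
    # Assumption: documents without a tag were written by version 1.
    version = upgraded.get("schema_version", 1)
    if version < 2:
        # Version 2 added an optional "email" field. Version-1 documents lack
        # it, so a default is supplied on read instead of migrating stored data.
        upgraded.setdefault("email", None)
    upgraded["schema_version"] = CURRENT_VERSION
    return upgraded


def display_name_v1(doc: Document) -> str:
    # Version-1 code reads only the fields it knows about, so it can still
    # read documents written by version 2 after a rollback.
    return doc["name"]


if __name__ == "__main__":
    # Documents from two application versions side by side in one "collection".
    users: List[Document] = [
        {"_id": 1, "name": "Ada"},
        {"_id": 2, "name": "Grace", "email": "grace@example.com", "schema_version": 2},
    ]
    for doc in users:
        print(upgrade(doc))
    print([display_name_v1(doc) for doc in users])
```

Rolling forward costs nothing up front because the default is applied lazily, and rolling back works as long as the older code ignores fields it does not recognize.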
This does not mean that migrations are never needed. Adding a non-sparse index in MongoDB, for example, still requires a migration. Migrations in graph databases like Neo4j are somewhat more complex. Still, we can conclude that NoSQL databases can be modeled so that migration is easier.