NoSQL is a Premature Optimization


Posted by Bob Warfield on July 22, 2011

There’s been a lot of back and forth lately from the NoSQL crowd around Michael Stonebraker’s contention that reliance on relational technology and MySQL has trapped Facebook in a ‘fate worse than death.’ This was reported in a GigaOm post by Derrick Harris. Harris reports in a later post that most of the reaction to Stonebraker’s contention was negative:

By and large, the responses weren’t positive. Some singled out Stonebraker as out of touch or as just trying to sell a product. Some pointed to the popularity of MySQL as evidence of its continued relevance. Many questioned how Stonebraker dare question the wisdom of Facebook’s top-of-the-line database engineers.

Harris, Jim Starkey, Paul Mikesell, and Curt Monash all take a stab at rehabilitating Stonebraker’s argument in the second post. Their argument boils down to, “Yeah, Facebook did it, but only because they have great engineers, spent a fortune, and endured a lot of pain. There are easier ways.”

Sorry fellas, time to annoy the digerati again, and so soon after bashing Social Media. I disagree with their contention, which is well expressed in the article by this Jim Starkey quote:

If a company has plans for its web application to scale and start driving a lot of traffic, Starkey said, he can’t imagine why it would build that new application using MySQL.

In fact, I would argue that starting with NoSQL because you think you might someday have enough traffic and scale to warrant it is a premature optimization, and as such, should be avoided by smaller and even medium sized organizations. You will have plenty of time to switch to NoSQL as and if it becomes helpful. Until that time, NoSQL is an expensive distraction you don’t need.

The best example I see for why that’s the way to look at NoSQL comes from Netflix, which is mentioned towards the end of the article. I went through several expositions by Netflix engineers on their experience transitioning from an Oracle Relational data center to one based on NoSQL in the form of Amazon’s SimpleDB and then later Cassandra (the latter is still an ongoing transition as I understand it). You’re welcome to read the same sources, I’ve listed them at the bottom.

Netflix decided to move to the Cloud in late 2008 to early 2009 after an outage prompted them to consider what it would take to engineer their way to significantly higher uptime. They concluded they couldn’t build data centers fast enough, and that as soon as one was built it was swamped for capacity and out of date. They agreed with Amazon’s Werner Vogels that building data centers represented “undifferentiated heavy lifting”, and was therefore to be avoided, so they bet heavily on the Cloud. These are smart technologists who have been very transparent about their experiences, so it’s worth learning from them. Werner Vogels’s reaction to Stonebraker’s remarks about Facebook is an apt way to start:

Scaling data systems in real life has humbled me. I would not dare criticize an architecture that holds social graphs of 750M and works.

The gist of the argument for NoSQL being a premature optimization is straightforward and rests on 3 points:

Point 1: NoSQL technologies require more investment than Relational to get going with.

The remarks from Netflix are pretty clear on this. From the Netflix “Tech” blog:

Adopting the non-relational model in general is not easy, and Netflix has been paying a steep pioneer tax while integrating these rapidly evolving and still maturing NoSQL products. There is a learning curve and an operational overhead.

Or, as Sid Anand says, “How do you translate relational concepts, where there is an entire industry built up on an understanding of those concepts, to NoSQL?”

Companies embarking on NoSQL are dealing with less mature tools, less available talent that is familiar with the tools, and in general fewer available patterns and know-how with which to apply the new technology. This creates a greater tax on being able to adopt the technology. That sounds a lot like what we expect to see in premature optimizations to me.

Point 2: There is no particular advantage to NoSQL until you reach scales that require it. In fact it is the opposite, given Point 1.

It’s harder to use. You wind up doing more in your application layer to make up for the things you may rely on that Relational does and NoSQL doesn’t. Take consistency, for example. As Anand says in his video, “Non-relational systems are not consistent. Some, like Cassandra, will heal the data. Some will not. If yours doesn’t, you will spend a lot of time writing consistency checkers to deal with it.” This is just one of many issues involved with being productive with NoSQL.
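
To give a feel for what a “consistency checker” involves, here is a minimal sketch in Python. It is an illustration under stated assumptions, not anything taken from Netflix or from a particular product: two replicas are modeled as plain dictionaries mapping a key to a (version, value) pair, and the names find_divergent_keys and repair are hypothetical. The repair rule shown is a simple last-writer-wins on the version number; real systems have to make harder choices.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: key -> (version, value). A real store would
# expose this through its own client API; plain dicts stand in here.
Replica = Dict[str, Tuple[int, str]]


def find_divergent_keys(a: Replica, b: Replica) -> List[str]:
    """Return keys whose records differ, or that exist on only one replica."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def repair(a: Replica, b: Replica) -> None:
    """Copy the higher-version record to both replicas (last-writer-wins).

    On a version tie, replica `a` wins. That is an arbitrary choice made
    for this sketch.
    """
    for key in find_divergent_keys(a, b):
        rec_a = a.get(key, (-1, ""))
        rec_b = b.get(key, (-1, ""))
        winner = rec_a if rec_a[0] >= rec_b[0] else rec_b
        a[key] = winner
        b[key] = winner


replica_a: Replica = {"user:1": (2, "alice@new.example"), "user:2": (1, "bob")}
replica_b: Replica = {"user:1": (1, "alice@old.example"), "user:3": (1, "carol")}

divergent = find_divergent_keys(replica_a, replica_b)
repair(replica_a, replica_b)
```

Even this toy version has to decide what “newer” means and what to do on a tie. That is the kind of work a relational database with transactions does for you.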

Point 3: If you are fortunate enough to need the scaling, you will have the time to migrate to NoSQL and it isn’t that expensive or painful to do so when the time comes.

The root of premature optimization is engineers hating the thought of rewriting. Their code has to do everything just exactly right the first time or it’s crap code. But what about the idea that you don’t even understand the problem well enough to write “good” code at first? Maybe you need to see how users interact with it, what sorts of bottlenecks exist, and how the code will evolve. Perhaps your startup will have to pivot a time or two before you’ve even started building the right product. Wouldn’t it be great to be able to use more productive tools while you go through that process? Isn’t that how we think about modern programming?

Yes it is, and the only reason not to think that way is if we have reason to believe that a migration will be, to use Stonebraker’s words, “a fate worse than death.” The trouble is, it isn’t a fate worse than death. And yes, it will help to have great engineers, but by the time you get to the volumes that require NoSQL, you’ll be able to afford them, and even then, it isn’t that bad.

Netflix’s story is a great one in this respect. They went about their NoSQL migration in a clever way. They built a bi-directional replication between Oracle and SimpleDB, and then they started moving over one app at a time. They did this against a mature system rather than a new, buggy system untested by users. As a result, things went pretty quickly and pretty smoothly. That’s how engineers are supposed to work: bravo Netflix!
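
The shape of that migration (keep both stores current, then move applications over one at a time) can be sketched roughly as follows. This is an illustration only, not Netflix’s implementation: in-memory dictionaries stand in for Oracle and SimpleDB, synchronous dual writes stand in for their replication, and DualStore, cut_over, and roll_back are hypothetical names.

```python
from typing import Dict, Optional, Set


class DualStore:
    """Writes go to both stores; reads are routed per application."""

    def __init__(self) -> None:
        self.legacy: Dict[str, str] = {}  # stand-in for the relational store
        self.new: Dict[str, str] = {}     # stand-in for the NoSQL store
        self.migrated_apps: Set[str] = set()

    def write(self, key: str, value: str) -> None:
        # Keeping both copies current is what makes a per-app cutover
        # (and a rollback) cheap.
        self.legacy[key] = value
        self.new[key] = value

    def read(self, app: str, key: str) -> Optional[str]:
        source = self.new if app in self.migrated_apps else self.legacy
        return source.get(key)

    def cut_over(self, app: str) -> None:
        """Start serving this app's reads from the new store."""
        self.migrated_apps.add(app)

    def roll_back(self, app: str) -> None:
        """Send this app's reads back to the legacy store."""
        self.migrated_apps.discard(app)


store = DualStore()
store.write("queue:42", "example title")
store.cut_over("recommendations")

from_new = store.read("recommendations", "queue:42")  # served by the new store
from_legacy = store.read("billing", "queue:42")       # still on the legacy store
```

The point of the design is that no single cutover is a bet-the-company event: each application moves when it is ready, and can move back if something goes wrong.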

I have a note out to Adrian Cockcroft to ask how long it took, but already I have found a reference to Sid Anand doing the initial “forklifting” of a billion records from Oracle to SimpleDB in about 9 months, and they went on from there. When Sid Anand was asked what the most complex query was to convert from Oracle to NoSQL he said, “There weren’t really any.” He went on to say you wouldn’t convert your transactional data anyway, and that was pretty much it.
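
For a rough sense of scale, here is the back-of-envelope arithmetic on that figure. It assumes, purely as a simplification, that the copy ran continuously for the whole nine months with 30-day months. The real work was surely burstier and included engineering time, so treat the result as an average, not a measured throughput.

```python
# Back-of-envelope only. Assumes (as a simplification) that the copy ran
# continuously for the full nine months, with 30-day months.
RECORDS = 1_000_000_000
MONTHS = 9
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

average_rate = RECORDS / (MONTHS * SECONDS_PER_MONTH)
print(f"{average_rate:.0f} records per second on average")
```

Under those assumptions the average works out to roughly 43 records per second.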

Conclusion

The world loves to see things in black and white. It sells more papers. Therefore, because some situations benefit from NoSQL for scaling, we hear a hue and cry that everyone must embrace NoSQL immediately. Poppycock. You can go a long long way with SQL-based approaches, they’re more proven, they’re cheaper, and they’re easier. Start out there and if the horse you’re riding is strong enough to carry you to NoSQL scaling levels you can tackle that when the time comes. Meanwhile, avoid premature optimizations. You don’t have time for them. Let all these guys with NoSQL startups make their money elsewhere. You need to stay agile and focused on your next minimum viable deliverable.

Extra! Extra!

This post is now, at least for a time, the biggest post ever for Smoothspan. Be sure to check out my follow up post: Biggest Post Ever Redux: NoSQL as a More Flexible Solution?

Articles on the Netflix NoSQL Transition

Sid Anand’s Experience “Forklifting” the Data from Oracle into SimpleDB

Adrian Cockcroft’s NoSQL Migration Slides

Sid Anand’s QCon Video on NoSQL at Netflix

This entry was posted on July 22, 2011 at 3:02 pm and is filed under cloud, data center.

47 Responses to “NoSQL is a Premature Optimization”

adrian cockcroft said

July 22, 2011 at 4:53 pm
The slide deck you link to is getting old, a much more relevant and up to date version is here http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra

While it took a long time to get the first cloud data sources up and running starting two or three years ago, that is not the case now. In many ways, NoSQL is much simpler and easier to learn and implement than relational SQL. It only takes a few weeks to migrate a data source from one NoSQL system to another (e.g. SimpleDB to Cassandra) and we have seen this with several teams at Netflix. It takes a little longer to figure out how to break down a complex monolithic relational schema into a set of simpler de-normalized data sources, but we are splitting off data sources one at a time and shrinking our Oracle dependency down to zero in the coming months. We need to be able to run the streaming service with no active connection to our datacenter and Oracle systems.

The right technology depends on your application and ambitions. Ruby on Rails with MySQL lets you build a web site incredibly quickly with a small team as you iterate to figure out whether you have the product concept right. Since Netflix has hundreds of developers building complex algorithms and business logic the Java based Struts/Tomcat/Cassandra stack gives us industrial strength tools to keep everything scaling in terms of developer productivity as well as traffic.

Subraya Mallya said

July 22, 2011 at 6:42 pm
Bob, the fact that it takes a long time to migrate the data up front from Relational to NoSQL is not just Netflix’s experience. The premier expert on MySQL performance optimization – Jeremy Zawodny – highlights the challenges Craigslist faced in moving the data from MySQL to MongoDB. http://www.10gen.com/presentation/mongosf2011/craigslist. It is just the nature of the workload.

Agree with your assertion that for teams that do not have the luxury of an army of engineers that a highly profitable company like Netflix has – NoSQL is taking on trouble earlier than one really needs to.

- Bob Warfield said
  
  July 24, 2011 at 3:54 pm
  Jeremy is also the guy that said he had to give up learning Rails because it was too hard, that relational mapping is too hard, and that this post is crap.
  
  I didn’t know he is the world’s foremost expert on much, but he certainly does have an opinion and a job at a high profile place.
  
  Cheers,
  
  BW
  
  - Subraya Mallya said
    
    July 24, 2011 at 4:50 pm
    That is how the Mongo guys characterized him. Figures why.
Dwight Merriman said

July 22, 2011 at 8:53 pm
hi a couple thoughts. first i’m not sure the “premature optimization” analogy holds in this case. in general i totally agree that in code premature optimization is silly. however, it is often quite easy to optimize later after profiling. with a database, refactoring later is not easy. i know in the past i have worked on projects where simply switching from one relational database to another was a daunting task, much less switching to a fundamentally different technology. the switch could involve massive rewrites.

one other thought: i think this space will have two benefits, scaling of course but also development agility. when done right, these tools can make developers of data backed applications able to write things faster. i know several mongodb developers who love it for this reason even though their scale requirements are modest.

- Bob Warfield said
  
  July 22, 2011 at 9:17 pm
  Dwight, if it can be done in the span of less than a year on a major mature product like Netflix, we can’t really regard that as prohibitively hard and massive rewrites. At worst, it is one major release cycle, albeit not a very agile major release.
  
  We have to contrast that cost against whatever friction we’ve endured on every single other release up to the point where we needed the scaling bandwidth. So if the tools are less mature, contain less functionality, and you can hire fewer people with experience, you’d have to concede it’s more costly to use those tools, nu? Though at least we can thank companies like Netflix for helping make them more mature sooner, yet are they really on a par with tried and true relational? Even close?
  
  Lastly, if conventional tools work just as well up to a relatively large scale, and they clearly do, how can it not be a premature optimization to forgo productivity and invest in all this scaling headroom that your organization doesn’t even know whether it will use?
  
  Isn’t that the very definition of a premature optimization then? Don’t we have to look beyond how shiny (Squirrel!) the new new thing is to whether it actually has utility?
  
  Lastly, are these mongodb developers really more productive than say the Ruby on Rails with MySQL example Adrian gives? Really?
  
  Developers love to persuade themselves of the abstract value of some tool or other, but when it takes a single major release cycle to switch, you had better be ahead in a single release cycle too. I maintain that’s just not going to be the case for NoSQL until you need to scale big time.
  
  Beware too the team that wants to tell you how easy it could be if they had to do it again. That team now knows all sorts of things they didn’t know and you still don’t know your first time around that block.
  
  Cheers,
  
  BW
  
Wes Felter said

July 23, 2011 at 4:48 am
A year ago I would have agreed with you: start on MySQL and migrate to NoSQL if and when you need to. But the game has (supposedly) changed; Stonebraker, Starkey, et al. are not proposing NoSQL at all. They’re proposing to start on NewSQL and stay on NewSQL; this is possible because (supposedly) NewSQL is both easy to use and scalable so there’s no need to trade off.

J Chris Anderson said

July 23, 2011 at 5:45 am
NoSQL is not just about scale (although it can be). In my opinion the main benefit of giving up relational constraints is flexibility. Find me a relational system that can credibly do offline replication to mobile devices. Each of the options I’ve seen are full of gotchas. But if you replace the relational model with an MVCC document model, you can suddenly solve an entirely new class of problems.

Read about how Apache CouchDB is being used in rural Africa to bring collaborative data and web technologies to health clinics that don’t have reliable internet access: http://radar.oreilly.com/2011/03/couchdb-zambia-healthcare.html

These same patterns are applicable to mobile connections in the 1st world.

jeduden said

July 23, 2011 at 6:53 am
The question of NoSQL vs. SQL is an architectural question. The properties and behaviours of SQL and NoSQL databases leak into the whole application.

BTW: Facebook split up their database into 4000 MySQL shards in order to scale their store. I don’t think they really liked this way of dealing with scaling. It is certainly not elegant.

If you want to make a switch you need to prepare your architecture for it; otherwise, depending on the complexity of the project and the man-power behind it, you won’t be able to do it efficiently.

NoSQL databases have an advantage even when you are small. Scaling is not the only advantage that some NoSQL products have. Flexible data structures (while still being able to use an index) are the second one.
A better fit between query language and application language might be the third, depending on the project.

We should all accept: the times where you can “just choose an SQL database” are over.

Everybody should know about all the trade-offs, advantages and disadvantages of the NoSQL and SQL databases.
With that knowledge, go ahead and choose the product that fits best.

Just talking about scaling is not enough – datastorage is a complex topic. Don’t try to oversimplify it.

NoSQL Is A Premature Optimization said

July 23, 2011 at 12:41 pm
[…] Or so Bob Warfield writes. I happen to agree with the title: optimization using NoSQL means using a server cluster to split the load and scale up, and such an optimization is premature unless you are already having the millions of visits it takes to feel growing pains. If I start off on a new project and decide «I’m going to use NoSQL so that it will scale when my project will have millions of users» then I am prematurely assuming that my initial NoSQL strategy will fit the actual million-user scenario that will come up years from now. In fact, the bottleneck will probably be in a feature I didn’t even think of yet, and making it work will probably involve changes in the persistence model. But Bob Warfield goes further than the premature optimization argument: Point 2: There is no particular advantage to NoSQL until you reach scales that require it. In fact it is the opposite, given Point 1. […]

Theo said

July 23, 2011 at 1:48 pm
“It’s harder to use.”

Strikes me as a strange thing to put forward as an argument for relational databases vs. post-relational. I’m not sure about what kind of applications you’ve built, but the overwhelming majority of applications built today use object-relational mapping frameworks. I think that in itself shows how hard it is to use an RDBMS. The amount of code needed just to set up user account management in an application is just ridiculous. If you need a mapping tool just to represent your model in a database, it cannot be said to be easy to use. RDBMSs are great for tabular data, but they really suck at storing objects and, well, relations, actually.

Most people don’t choose post-relational databases for their scaling properties, but for their ease of use. Getting started with MongoDB and CouchDB is so much easier than trying to cram your objects into MySQL using an ORM.

That being said, not all post-relational databases are easy to use either. Using a key-value store it may be just as hard to figure out how to store your data as with a relational database. You use a tool that fits the problem, and sacrifice ease of use for scaling properties.

RDBMSs don’t fit many problems well, but they fit some. Let’s use them for those. It’s just as much premature optimization to use a relational database: not every application (perhaps not even most) needs their unique features (but almost all will suffer from the features they do not have).

Yorick said

July 23, 2011 at 3:27 pm
“Point 2: There is no particular advantage to NoSQL until you reach scales that require it. ”
There are various cases possible where the relational model just imposes too many limitations. Various NoSQL databases work as document stores, which offers plenty of advantages, such as easier storage of trees or other nested data. The relational model isn’t made for this and requires plenty of workarounds, which is worse than premature optimisation in my book.

johnfx said

July 23, 2011 at 4:41 pm
Although quite provocative, this was a really good article that needed to be written. I’ve had a niggling notion since attending a few NOSQL talks (sales pitches) that it smelled of premature optimization. It just felt like the n00b programmer’s lament “My program won’t compile, the compiler must be broken.” Premature de-normalization of relational databases by programmers has been a recurring theme in my career, and NOSQL just seems too supportive of that mindset.

You may take some heat for touching this delicate subject, but I think it is very useful and necessary to question new technologies built on the paradigm that “everything we know about (insert mature technology) is wrong”. That said, I do appreciate innovators like those in the NOSQL movement for pushing us to rethink the basics once in a while. If they can deliver on their promises, then I’ll probably come around.

Darren Schreiber said

July 23, 2011 at 6:17 pm
Most of the responses to this article are accurate, but way too polite.

This post is inflammatory and full of FUD. It doesn’t focus on “right tool for the right job” as a core concept and point 2 and point 3 have poorly backed up factual data, citing a small number of examples that fit this FUD argument. Then the conclusion tries to sum things up as if there are enough facts to come to such a conclusion. Ridiculous.

There is no question that both MySQL and NoSQL have their place. I’m a huge fan of MySQL and it was great for analytics at a past firm (click and open tracking of emails). I used to run one of the top 10 largest MySQL installs in the world thanks to the success of that software – it powered our entire company and grew to many terabytes of data in a time when nobody knew what a terabyte was. When we hit bottlenecks we actually contracted help from the MySQL AB folks, who responded “wow, you have that many rows? I didn’t even think that would work.” Yes, we pushed its limits. I know MySQL inside and out, and have spent many a night recovering busted binary logs and dealing with overgrown tables and schema modifications, slow queries, etc. You’re correct – there was a rich community of tools to debug and assist with troubleshooting, but that also didn’t equate to it being easy to use or manage, or to it being cost-effective. Nor was it easy to find talent to deal with these problems – actually, it was near impossible. There were good days, there were bad days, but overall it was fantastic for what we were doing.

Now I work on hosted phone systems. Customers are constantly requesting new features and we don’t have the luxury of taking down the system or locking tables and copying them to change the schema all the time. We can’t afford master/slave replication strategies with the high number of writes going on. Views needed to be created and destroyed quickly. We CAN afford eventual consistency. NoSQL has been a game changer for us. CouchDB is simple to use and comes with a GUI to manage it, built-in. BigCouch adds stability, sharding and many-master replication. We’ve also looked at MongoDB and it would have worked but they hadn’t completed sharding yet (we actually tried it – didn’t just look at it – it was very fast and very stable). The logs are easy to read once you get used to the formatting of Erlang errors. It doesn’t get any simpler than this. We’ve picked it up, as a company, in under 3 months. Once again, there are good days, there are bad days, but overall it has been a fantastic choice for what we are doing.

Your article makes it sound like you need a million customers in order to make use of NoSQL, at which point you’ll be fine because you have nine months to spare to move your data over. What a bunch of silly, unfounded assumptions! The deciding factor on which technology to use should be based on the facts around the strengths and weaknesses of each type of database and how it will interact with your software. If you need clusters of distributed masters (large or small), NoSQL has tons of options – MySQL and Postgres aren’t so great there but also have some benefits if you need guaranteed commits across the cluster. If you just want master/slave with lots of fast reads and more reads than writes, MySQL is a great choice and is adding features constantly for stability, sharding, etc. There are other points for deciding, too many to list here.

You are trying to make an argument that the industry itself will not support NoSQL easily because it is a new technology and talent is hard to find. But the companies you reference are also solving problems that, in general, are harder than most problems people have solved before. That’s the whole point of an emerging technology – solve new and old problems by rethinking how to do so. You’ve completely skipped over that concept and you’ve limited the scope of where solving new/old problems in the database world is applicable. Therefore, your conclusions are not based on apples to apples comparisons. You need to consider the scope of the problems these people are trying to solve AND their relevance to the abilities of each database technology.

If nothing else, the real issue here is that there used to be a few limited choices and now there are many (both SQL & NoSQL), and they all solve different problems so it’s harder to pick. THAT problem I can hear, but the rest of this article is full of FUD that is the exact reason we continue to have this NoSQL vs SQL argument.

- Bob Warfield said
  
  July 23, 2011 at 7:45 pm
  Darren, step aside from being a wound up emotional NoSQL fanboy for a minute and show me the FUD (I won’t argue about inflammatory as you are certainly inflamed):
  
  Are you arguing that NoSQL is just as mature as SQL?
  
  Are you arguing that the NoSQL ecosystem is full of peeps and tools you can easily hire relative to SQL?
  
  Are you arguing that the NoSQL tools are as mature and bug free as the SQL tools?
  
  Those are simple “yes”, “no”, or “I don’t know” questions. Those questions are not just FUD, they have a real bearing on project risk and are questions you should answer before committing yourself to the shiny new thing.
  
  Re the scale issue, I don’t remember writing a million customers, but for the sake of clarity and dispelling the FUD, how many customers do you believe are necessary before you must abandon SQL as just too difficult? And how many companies get there in less than 9 months really? How many get there in their first year really?
  
  When you argue about all these problems that are so hard people haven’t solved them before, really? Now that sounds like the FUD to me if you’re telling me that what’s hard there is outside the scaling problem and it’s too hard to start out relational and switch later. Been there, done too much scaling and too many architecture switches to buy it.
  
  Darren, nowhere have I said “Never ever use NoSQL because it never makes any sense.” But you’re having a hard time formulating a concrete strategy for when it does make sense.
  
  My strategy is pretty simple, concrete, and practical: you will know when it makes sense because the problem will be knocking at your door from real not imagined customers, and when it does, you will still have time to get it done. Given that you’ve just said your startup picked it up in 3 months, seems like more justification to wait.
  
  I’ve been through the “we didn’t know it would work for that” walk too. Had to handle all of eBay’s auction volume at a time when they were holding weekly news conferences about being down. Did it on SQL Server and Windows after being told that was impossible and I would need Oracle and Unix which our little startup couldn’t afford. All those telling me it was impossible were sage folks with years of expertise scaling to those volumes. Whatever. Been there, done that, woo hoo! at more than one startup.
  
  Also did a company that went public because it was the only one among all the Enterprise companies including Oracle and SAP whose software would scale to handle more than a couple thousand sales people. We calculated the sales comp for the world’s largest companies. Allstate had 250K+ independent agents. Sprint had about 30K sales agents. Zillions of transactions and complex ever-changing business logic. It couldn’t be eventually consistent, it had to be right or those sales guys were showing up with torches and a hanging rope. All done on grid computing with relational–Cloud before Cloud existed.
  
  Here is a reality that Werner Vogels alludes to in his quote above: all these big scaling problems are hard, hard, hard. They humble you no matter what tool you choose to use. It isn’t the tool that’s going to save you.
  
  Or, if you prefer, “Git ‘er done”
  
  Cheers,
  
  BW
  
  - tracker1 said
    
    July 23, 2011 at 9:27 pm
    As to NoSQL being as mature… If you mean mature as in baked, tested and reliable, it would depend on the system in question. Google has used its BigTable implementation rather successfully, and I would say that is a pretty big check in terms of “mature”.
    
    As to the tools… Honestly, I find the tooling for CouchDB’s web interface and the MongoDB console to be as easy as, if not easier than, a lot of the tooling for even Microsoft SQL Server, depending on what you are attempting to do. I only point out MS-SQL because the developer tooling is imho second to none. Though Oracle does have some nice tooling, the ramp-up to expertise is far larger than coming into a NoSQL system completely new.
    
    Now to the availability of talent. The fact is that most developers have a hard time with relatively simple SQL statements, let alone going into more advanced topics. Many administrators are more system administrators, and have less knowledge of dealing with the programmatic results needed as a whole. I had worked on a project where there was data that was needed from more than one system. The developer from the database team designed their query to pull the data from the second system, and it performed so slowly, that the results in the end application (web browser based) would take 30 seconds to over 4 minutes. It was far faster to modify every write into the second system to clear out a cached record from memcached, and have the application request the bulk from the primary system, then each matching records information from memcached, falling back to the second database system than it was to get the coalesced data from the rdbms.
    
    Talent is very relative, very subjective and a rather poor point of argument to make. The fact is that technology changes, even in the database world, and times come where people have to learn how to do new things. If someone is incapable, then they’ll be stuck in the ice ages along with a lot of Cobol and other mainframe programmers. Yeah, there are still jobs out there, but that talent pool is shrinking. It’s more important to find people that can think and learn than it is to have people with experience with (insert HR technology buzzword here).
  - Bob Warfield said
    
    July 24, 2011 at 4:24 pm
    Oh come on. Because it’s hard to hire database guys we’re going to just forget about the value of talent because we couldn’t get any anyway?
luigi said

July 23, 2011 at 10:28 pm
Who exactly is saying that everyone must embrace NoSQL immediately? Just one citation, please.

Many current users aren’t even using NoSQL technologies solely because of their claims to scale. I certainly don’t. I use MongoDB and Redis because they’re the right tools for the job. I’ve found that when storing complex objects, MongoDB is the natural fit. It turns out that JSON is a much better format for storing objects than relational tables. CouchDB and Riak are good solutions for that as well. I use Redis for storing key-value pairs that get updated frequently. It’s wonderful. Exactly what I need. No more, no less. Working with those datastores has been a pleasant surprise, and relational databases are no longer my default choice.

NoSQL technologies have caught on for very good reasons that aren’t easily brushed aside. It has nothing to do with premature optimization. It has a lot to do with developer happiness.

- Bob Warfield said
  
  July 24, 2011 at 4:22 pm
  Luigi, the citation this post is about is at the top of the post, kinda hard to miss. I don’t have a citation for your straw man, maybe you’d care to provide one?
  
Darren Schreiber said

July 23, 2011 at 11:17 pm
Your response to my post is ironic. “All those telling me it was impossible were sage folks with years of expertise scaling to those volumes.”

And yet, here you are… the sage old folk, citing years of experience, telling the NoSQL basecamp in the most general terms possible that scaling and utilizing NoSQL software inexpensively is impossible to do cost effectively with what’s out there today.

Cat, kettle, black.

I think I gave a pretty concrete example already of where our current, active customers are asking for customizations. We’re able to give it to them, too. I have a document-oriented schema-less database to thank for that. That was after trying it for 2 years on MySQL where we weren’t able to make it work.

- Bob Warfield said
  
  July 24, 2011 at 4:02 pm
  Darren, it SHOULD be ironic. You are absolutely right that I have been there and done that, it was my point. I would have loved to have done the job with Oracle and Unix instead of Windoze and SQL Server. We reached a point of having to randomly rewrite queries 4 to 6 times because the query planner/optimizer was so unpredictable. It would have been a walk in the park to do it in Oracle by comparison as the experts suggested. But, we didn’t have the cash for the Oracle licenses.
  
  The difference here is both options are free, so cash is off the table as a consideration. You’re left with these other considerations.
  
  Now I don’t know why your MySQL solution couldn’t be made to work after 2 years. Maybe it is the golden example of when you MUST choose NoSQL. But I do know I’ve seen SQL do a heck of a lot of things without breaking too much sweat. It’d be great if someone would do a deep analysis of where the bodies are buried, but it has to be deep. It can’t boil down to “Oh, but it’s so hard to do Object Relational Mapping.” If that’s the answer, remember that the first wave of Object Oriented Databases tried and failed on that argument. Evidently it wasn’t too hard. Document storage? No, been there done that. That one I know is not too hard. Graph algorithms? Maybe. I do know they are a PITA on SQL to do anything very complex.
  
  Cheers,
  
  BW
  
Dean Higginbotham said

July 23, 2011 at 11:44 pm
Thanks for the frank writeup. With new tech I look for:
1) Does it solve a real issue?
2) Does it have a big enough community?
3) Is it mature?

These are questions I ask myself, and answer as honestly as possible, to save myself pain down the line. I really try to ignore anything in the way of third party opinion (pro or con).

1) At least for me, NoSQL doesn’t solve enough. “Enough” to me means, the cost (time/money) to benefit ratio.
2) Right now, I’d say no. For basic things, yes. For edge things and deep things, no.
3) No. Too many stories of developers from big companies having to deal directly with the NoSQL developers to figure things out and/or fix bugs – NoSQL devs who, most likely, won’t, or can’t, give me (or other independents) the time of day.

I need to spend time working on my product. Not figuring my tools out.

NoSQL is What? | Jeremy Zawodny's blog said

July 24, 2011 at 4:55 am
[…] found myself reading NoSQL is a Premature Optimization a few minutes ago and threw up in my mouth a little. That article is so far off base that I’m […]

- Bob Warfield said
  
  July 24, 2011 at 4:20 pm
  Be sure to read Jeremy’s post 2 back wherein he says he had to give up learning Rails again (not sure how many other times he tried) because it was just too hard. If you are having a hard time learning Rails, object relational mapping may indeed be too hard for you. I don’t know if NoSQL can save your bacon or not, but you’re obviously running out of options at this point.
  
Daniel Walters said

July 24, 2011 at 5:05 am
“The world loves to see things in black and white.”, “NoSQL is a Premature Optimization” – including you.

- Bob Warfield said
  
  July 24, 2011 at 4:18 pm
  Daniel, I’m stunned by the depth and beauty of your insightful analysis. Thank you for sharing it.
  
Muhammad Nasrullah said

July 24, 2011 at 8:26 am
Hello Bob,

I’ve been on both sides of using NoSQL and I can conclude the following:

1. Saying NoSQL is the only solution or SQL is the only solution is simplistic. You need a mix of both technologies and one size does not fit all.

2. Many startup advisers will tell you to do it fast, fix it later and I’ve been an avid believer of that. However, the technical debt suffered due to short-term engineering decisions can be terribly high. Take Twitter for example, it took them _years_ to be more reliable; they just couldn’t scale. They keep lollygagging about shifting to Cassandra but the effort is too large for them to do so. As a result, new projects are in Cassandra but the main store is still MySQL. And as you can tell from the snail pace of innovation coming from Twitter, they already have too much on their plate and a shift from MySQL might not be user-visible enough to spend that much time away from such features.

3. Whether you should do NoSQL or not should largely depend upon your technical know-how. If you are good at a particular tech, go for it. If you’re not, even if it is MySQL, stay away from it.

- Bob Warfield said
  
  July 24, 2011 at 4:17 pm
  Technical know-how is an argument I make in the follow-on post. If you have that cadre of NoSQL experts who know where the bodies are buried and can make an informed decision, go for it. If you don’t, you had better have a really good reason for taking on the additional risk. Otherwise you’re just kidding yourself.
  
Philip John said

July 24, 2011 at 9:14 am
What about those of us using NoSQL for reasons other than performance and scaling? Where I work we use an RDF Triple Store (graph database) because it’s a better fit to our problem domain than the relational model.

It’s given us a lot of benefits – easy to merge new datasets in, highly normalised, ability to run multiple versions of our schema in the same store, powerful graph query language (SPARQL)…the list just goes on and on.

- Bob Warfield said
  
  July 24, 2011 at 4:16 pm
  Philip, it could be that some novel data store requirement is the right driving force when choosing NoSQL. But, there’d have to be a more formal analysis with some deep examples that pinned down exactly what sorts of domains these were.
  
Mike Lopez said

July 24, 2011 at 9:33 am
This by far is the geekiest article I’ve read in months. It ain’t boring and a ton helpful. As the lead developer of the start-up WishList Products, I do agree with the points mentioned in this article. You don’t need a mansion for a family of 4, and in the world of coding, where vanity (i.e. mansion) is not as important as logical needs, I would say that starting easy and simple using already proven technologies is the way to go. You can only optimize so much and, based on my experience, I’d say that you can never really optimize for things that you still don’t know.

Migration is a thing that we software engineers want to avoid but it’s also the inevitable. We will have to migrate to better technologies when the need arises.

James Phillips said

July 24, 2011 at 1:45 pm
I think the most interesting thing about the response to this blog posting (both here in the comments and in response blog postings) is that users – those who actually have a business to run, revenue targets to hit and limited resources to employ – are the ones so violently reacting. We don’t have vendors and pundits arguing about the theoretical or academic merits of one approach over another. That, to me, is the best sign that real problems are being solved, and that those solutions required a new approach at the data layer.

- Bob Warfield said
  
  July 24, 2011 at 4:10 pm
  James, first, not so many of those violent reactions are traceable to businesses saved by NoSQL as you suggest. It’s very hard to separate the zealotry from the practical on those. Second, who among the vendors would respond? The SQL guys aren’t feeling threatened. The NoSQL guys wish it would go away. Stonebraker already didn’t do so well with his Facebook bashing.
  
  I think it is also interesting that I’ve come across almost no responses to the scaling aspect, which is where Stonebraker’s argument was and what motivated the post. The gist of the pushback is from folks who say their data is different enough that it’s too awkward for SQL. Gotta take that as a ringing endorsement even if those delivering the message see it as a rebuttal. Heck, maybe your data IS different. At least go through the exercise of really breaking down why and what it means. I will say that so far I’ve built big systems around SQL that were genetic algorithm based software testing, deep textual data mining at high volumes (Big Data, and unstructured at that), finicky rule-based business logic at large volumes, and social networking. None of those springs to mind as the row and column payroll and accounting app so many want to see SQL as suited for.
  
  Cheers,
  
  BW
  
NoSQL is a Premature Optimization | Large Data Matters said

July 31, 2011 at 1:52 pm
[…] full post here This entry was posted in General. Bookmark the permalink. ← LexisNexis open sources […]

NoSQL is a Premature Optimization | Brent Sordyl's blog said

July 31, 2011 at 8:22 pm
[…] Story: NoSQL is a Premature Optimization) database, nosql, scalability database, nosql, scalability A VC: 30/10/10 Rule Jenx […]

There Will Be No Files, No PC’s, No SQL, No Datacenters, and No Money Soon « SmoothSpan Blog said

August 10, 2011 at 2:04 am
[…] Posts NoSQL is a Premature OptimizationMemcached: When You Absolutely Positively Have to Get It To Scale the Next DayThe 7 Kinds of […]

There Will Be No Files, No PC’s, No SQL, No Datacenters, and No Money Soon said

August 11, 2011 at 1:23 am
[…] as adept at inflating bubbles and debt that manufacture more money seemingly from nowhere. The NoSQL gang would like to eliminate SQL for most applications, and if that’s too scary, the NewSQL gang would still like you to give up your Old School […]

Scalability: Links, News and Resources (1) « Angel “Java” Lopez on Blog said

August 11, 2011 at 10:20 am
[…] NoSQL is a Premature Optimization « SmoothSpan Blog http://smoothspan.wordpress.com/2011/07/22/nosql-is-a-premature-optimization/ […]

Techmemed: NoSQL Is a Premature Optimization | Nosql online said

September 8, 2011 at 6:51 am
[…] Techmemed: NoSQL Is a Premature Optimization: […]

Thought this was cool: Techmemed: NoSQL Is a Premature Optimization | Lisheng Yu said

September 8, 2011 at 6:57 am
[…] Techmemed: NoSQL Is a Premature Optimization: […]

7 osi layers said

October 19, 2011 at 2:57 am
7 osi layers…

[…]NoSQL is a Premature Optimization « SmoothSpan Blog[…]…

Big Data = BI + ADD « SmoothSpan Blog said

April 26, 2012 at 4:50 am
[…] Private and Public Clouds: Oh My!Tipping Points, Personality Styles, and Email: A-List or Spam?NoSQL is a Premature Optimization70% of the Software You Build is Wasted (Part 1 of Series of Tool/Platform Rants)What Are Your […]

Big Data = BI + ADD | WikiCloud said

July 28, 2012 at 3:59 pm
[…] Deficit Disorder? That’s gotta be linkbait of an order I’ve not used since my NoSQL is a Premature Optimization post. What’s up with […]

Is NoSQL A Premature Optimization? | Dennis Forbes…Professional said

October 2, 2012 at 3:12 am
[…] readers have forwarded an article holding NoSQL as a premature optimization. I assume because I’ve written about NoSQL before (1, 2, 3, 4, 5, among others), with many […]

Big Data is a Small Market Compared to Suburban Data « SmoothSpan Blog said

February 2, 2013 at 7:16 pm
[…] NoSQL is a Premature Optimization […]

Big Data is a Small Market Compared to Suburban Data : Enterprise Irregulars said

February 4, 2013 at 5:34 pm
[…] skeptical about Big Data for a variety of reasons. As I’ve noted before, it seems to be a premature optimization for most companies. That post angered the Digerati who are quite taken with their NoSQL shiny […]

rogerdpack said

October 22, 2014 at 3:39 pm
I guess I would just note that it created a “nightmare” for Facebook; normal startups probably survive either way, and can choose whichever one feels like the better fit.



SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

