Let me start by saying that the ideas discussed here are my own, and not necessarily those of my employer (EMC). This is my own personal blog.
A great article by Andrew Oliver has been doing the rounds called “Never ever do this to Hadoop”. Andrew argues that the best architecture for Hadoop is not external shared storage, but rather direct attached storage (DAS). The article can be found here: http://www.infoworld.com/article/2609694/application-development/never–ever-do-this-to-hadoop.html
I want to present a counterargument to this. Having seen what a lot of companies are doing in this space, let me just say that Andrew’s ideas are spot on, but only applicable to traditional SAN and NAS platforms. There is a next-generation storage architecture that is taking the Hadoop world by storm (pardon the pun!). EMC has done something very different, which is to embed the Hadoop filesystem (HDFS) into the Isilon platform. This approach changes every part of the Hadoop design equation.