An important development in big data analytics occurred recently when Amazon Web Services (AWS) offered Redshift, its data warehouse, as a cloud service to a limited number of users. In direct competition with some of the big database vendors, Amazon Redshift can easily manage a 2 TB database, and far more than that when using high-storage extra-large nodes, although the cost of those nodes is significant.
Here are some of the pros and cons of using Amazon Redshift for cloud computing and big data support.
Advantages
Redshift can support huge databases, which spares a company a large and costly procurement of hardware and database management software. Redshift is also scalable, so database growth should be no problem, an important point since database growth is a near certainty for almost every installation. Redshift was built for analytics, and it is therefore very fast at computing aggregate functions such as AVG, SUM, and COUNT.
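To make the aggregate functions mentioned above concrete, here is a minimal sketch of the kind of query involved. It runs against Python's built-in sqlite3 module purely as a local stand-in, so it says nothing about Redshift's speed or exact SQL dialect, and the sales table and its rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a local stand-in; Redshift itself
# is not involved. The table name, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 100.0), ("east", 300.0), ("west", 50.0)],
)

# One pass over the table produces a count, a total, and an average per region.
rows = conn.execute(
    "SELECT region, COUNT(*), SUM(amount), AVG(amount) "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()

for region, n, total, mean in rows:
    print(region, n, total, mean)

conn.close()
```

An analytics workload is largely made up of queries shaped like this one, only over far more rows, which is why fast aggregation matters so much.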
Amazon Redshift also works with a good number of other resources that help fetch and return data quickly, so data storage isn’t the only function it manages well. Being able to access big data quickly makes Redshift very appealing.
The bottom line on Redshift is that, over a number of years, the return on investment for purchasing the service is very likely to be considerable compared with a similar initial investment in your own hardware and software.
Disadvantages
The specter of downtime and outages presents a real threat to the image of cloud computing, since any kind of outage would be public and quickly known throughout computing circles. Also, there can be considerable costs associated with the migration and integration of your data. For instance, you will probably need a tremendous amount of bandwidth to transmit data from your existing database to the Redshift cloud, or you’ll have to use some far less efficient method like a physical transfer of USB drives.
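To get a feel for the bandwidth problem, a rough back-of-the-envelope estimate helps. The sketch below assumes decimal terabytes (1 TB = 10^12 bytes) and a link that sustains its full rated speed unless an efficiency factor is given; the function name and the link speeds are illustrative, not figures from AWS.

```python
def transfer_hours(size_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate the hours needed to send size_tb terabytes over a link_mbps link.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Mbps = 1e6 bits per second).
    efficiency is the fraction of the rated speed actually achieved (0 to 1].
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


# A 2 TB database over two hypothetical link speeds.
for mbps in (100, 1000):
    print(f"{mbps} Mbps: {transfer_hours(2, mbps):.1f} hours")
```

Under these assumptions, a 2 TB database takes a little under two days on a 100 Mbps link running flat out, and real-world throughput is usually lower, which is why shipping drives can look attractive for an initial load.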
Since public cloud-hosted data is fairly new territory, there is no real handbook for how it should be administered, so there are quite likely to be some missteps along the way. Amazon Redshift may also cost more at first, even though most corporations should find savings over time. Higher initial cost is especially likely if you use the more exotic Redshift configurations that call for extra-large nodes, which can cost 5 to 10 times more than the basic configuration.
The Crystal Ball on Redshift
Amazon Redshift is likely to succeed in a big way because at its core it is a strong product that offers a useful and necessary service for corporations that routinely deal with big data. Assuming any early glitches can be quickly rectified, and assuming that costs remain competitive with the big database vendors, there is no reason to believe that the entry of Amazon Redshift into the big data market will be anything other than hugely successful.