MASARYKOVA UNIVERZITA
FAKULTA INFORMATIKY

Platform for deploying web applications

MASTER THESIS

Bc. Marek Jelen

Brno, fall 2011

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Bc. Marek Jelen

Advisor: doc. RNDr. Tomas Pitner, Ph.D.


Acknowledgement

I would like to thank my supervisor and all those who helped me to make this happen.


Abstract

This thesis focuses on the deployment of web applications. Web applications are becoming more important with advances in cloud computing and with the approach of the HTML5 standard. With the rise of web applications, the importance of deployment technologies rises as well. This thesis describes the deployment process of web applications and the cloud computing phenomenon as it relates to that process. As part of the thesis, a project for deploying web applications is developed to provide an open-source platform for web application deployment.


Keywords

cloud computing, PaaS, IaaS, web applications, deployment, service, Ruby, Java, JavaScript, Linux


Contents

1 Introduction
2 Cloud computing
  2.1 History
  2.2 Finding the roots
  2.3 Infrastructure as a Service
    2.3.1 Amazon Elastic Compute Cloud
    2.3.2 Rackspace cloud
    2.3.3 Joyent
  2.4 Platform as a Service
    2.4.1 Heroku
    2.4.2 Google App Engine
    2.4.3 Microsoft Azure
  2.5 Software as a Service
    2.5.1 Office Web Apps
    2.5.2 Google Apps
3 The cloudy business
  3.1 Web application development process and deployment
    3.1.1 Bare metal
    3.1.2 Shared hosting
    3.1.3 Virtual servers & dedicated hosting
    3.1.4 Platform
  3.2 Platform adoption
    3.2.1 Management
    3.2.2 Developers
4 Making the clouds wild
  4.1 Wildcloud
  4.2 Required features
    4.2.1 Security
    4.2.2 Resource allocation
    4.2.3 Building the application and central repository
    4.2.4 Thin provisioning
    4.2.5 Extensibility
    4.2.6 Deployment
    4.2.7 File system
  4.3 Comparison with existing solutions
    4.3.1 CloudFoundry
      4.3.1.1 Tight coupling
      4.3.1.2 Deployment
      4.3.1.3 Message queuing
      4.3.1.4 Isolation
    4.3.2 Nodejitsu
      4.3.2.1 Node.js
      4.3.2.2 Isolation
    4.3.3 OpenShift
  4.4 Applied technologies
    4.4.1 Operating system
    4.4.2 Application security and isolation
    4.4.3 Loose coupling
    4.4.4 Thin provisioning
    4.4.5 Data storage
    4.4.6 Cloud communication
    4.4.7 External services
    4.4.8 HTTP routing
    4.4.9 Session store
  4.5 Architecture
    4.5.1 Basic communication
    4.5.2 Core components
    4.5.3 Components seen by applications
    4.5.4 Routing of HTTP requests
  4.6 Testing
    4.6.1 Functional testing
    4.6.2 Load testing
5 Product evaluation
  5.1 Case-study
    5.1.1 Company
    5.1.2 Problems
    5.1.3 Requirements
    5.1.4 Solution
    5.1.5 Benefits
  5.2 From customer’s point of view
    5.2.1 Welcome screens
    5.2.2 Navigation
    5.2.3 Dashboard
    5.2.4 SSH keys
    5.2.5 Repositories
    5.2.6 Applications
    5.2.7 Router
6 Conclusion

1 Introduction

When Tim Berners-Lee proposed his new project “WorldWideWeb” at the end of 1990¹, I believe he had no idea what the World Wide Web would become. In those 21 years mankind has created something that had never been created in known history: an incredible library of knowledge available to everyone connected to the network. With the advances in text recognition and search algorithms, we can scan through billions of documents, books and articles within seconds.

And man is going even further. We are building social networks and services that allow people to communicate what they want to say to those who want to listen, and even to those who do not. We are tearing down the walls among people, countries and continents, and freedom of speech has advanced so much that we are even afraid of it. Services like Twitter² or Facebook³ scale to hundreds of millions of active users and unimaginable amounts of data. People are losing their fear of living in the virtual environment and are embracing the impossible: being anywhere at any time.

All these advances in social and community networking are possible thanks to a new phenomenon called “cloud computing”. Cloud computing is the new buzzword that we hear from all directions all over the world. At the beginning, cloud computing was the domain of agile start-up companies that needed to scale their products to a vast number of clients but did not have enough money to invest in their own expensive hardware. As time goes on, cloud computing is being adopted even in large enterprises, and the term “cloud” is becoming the new formula everyone wants to use to magically succeed with their product.

Cloud computing brought new ideas into the area of server infrastructures. The most important one is the paradigm shift from “we have to build a computer that will never fail” to the more reasonable and practical view “we have to build an infrastructure that will survive the crash of a computer”. Cloud computing is about building an infrastructure that maintains high availability and performance even on commodity hardware.

The main subject of this thesis is to analyse the requirements for deploying modern web applications and to implement a platform that simplifies this process. To fulfil this goal, different areas of and views on cloud computing will be explored, and the cloud computing services most relevant to the topic of this thesis will be described. In the first chapter, the term “cloud computing” will be discussed. It is always important to lay the groundwork by defining the terms, and even more so in the domain of cloud computing because of its rapid pace of expansion and development. In the second chapter, with the terms defined, the processes will be discussed from the point of view of clients and providers to analyse the needs of customers. In the third chapter, existing solutions and their advantages and disadvantages are discussed, and the reasoning for starting a new project is presented.

1. http://en.wikipedia.org/wiki/World_Wide_Web
2. https://twitter.com/
3. http://www.facebook.com/

The following chapters describe the new platform for deploying web applications in the “cloudy way”. The advantages and disadvantages of this solution, as well as the technical aspects and requirements for setting up the platform, are presented. The last chapter provides performance figures as well as the results of functional and load testing.


2 Cloud computing

2.1 History

Cloud computing emerged after the “dot-com bubble”¹. Amazon, one of the survivors of the bubble, measured that only 10% of its computing capacity was in use. The rest was there to cover spikes in visitor counts, providing the computing power necessary to serve the HTTP requests. With the advances in virtualization, Amazon created a cloud infrastructure that significantly reduced the cost of its infrastructure and of the labour required to operate it. Having learned this lesson, Amazon decided to create a public service that allows clients to rent computational power for their own purposes. This product was first presented in 2006.

More companies followed and new projects were started, both proprietary and open source. Among the pioneers in this area were the projects Eucalyptus² and OpenNebula³. Later, a project called OpenStack⁴ was founded by Rackspace⁵.

2.2 Finding the roots

There is no simple and universal answer to the question of what cloud computing really is. It is important to note that different people and experts define cloud computing differently, and in different domains the term may mean something completely different. For example, asking 21 experts to define the term cloud computing yields 21 different definitions [19]⁶.

From the perspective of this thesis, the definition provided by Wikipedia⁷ [73] seems to be one of the most precise, but let me provide my own definition that is based on those mentioned.

“providing computation as a service over a network”

1. http://en.wikipedia.org/wiki/Dot-com_bubble
2. http://www.eucalyptus.com/
3. http://www.opennebula.org/
4. http://openstack.org/
5. http://www.rackspace.com/
6. http://cloudcomputing.sys-con.com/node/612375
7. http://en.wikipedia.org/wiki/Cloud_computing

Computation, in this definition, is any computer-related service that can be provided remotely. It can basically be divided into three layers that build on each other from the bottom up.

Layer                          Abbreviation
Software as a Service          SaaS
Platform as a Service          PaaS
Infrastructure as a Service    IaaS

To illustrate what each of these layers represents, let’s apply them to a real service. Dropbox⁸ is a solution that allows users to back up data to a remote location and to replicate the data among many computers.

There is client software that the user installs on a computer and that communicates with the cloud software. The software layer is the service itself, providing utility to the end users of the service. In Dropbox it is the web application that provides client-less access to the data, together with an API for communication with the client software. The data itself is stored in some sort of object store or database that is responsible for storing the data and the meta-data associated with it. This is represented by the platform layer, which provides utility to the service builder. The data has to be written to a block device or some kind of memory. The infrastructure layer represents this service, providing virtual devices on top of real hardware. This layer provides utility to the system administrator who provides the platform. Depending on the situation, the utility can be provided to a single entity, or many entities can provide utility to each other. At the bottom are the servers themselves, the real hardware that a service provider maintains.

Having a basic knowledge of what these layers represent, let’s take a look at each of them in turn, from the most basic one to the most complex, and discuss their functionality in more detail.

2.3 Infrastructure as a Service

The main task of Infrastructure as a Service is to create a scalable environment on top of real hardware that can survive the crash of a single physical node. In most cases the infrastructure is built using a virtualization solution. However, in recent months providers⁹ have appeared that also offer bare-metal (unvirtualized) infrastructures.

8. https://www.dropbox.com/

One important aspect of Infrastructure as a Service is that the client pays only for the time the system is running. Consider, for example, an application whose visitor count quadruples for only two hours a day. With IaaS, the provider of the application can rent extra nodes to accommodate the spike and pay for those two hours only, instead of buying the hardware and paying for the housing even when those servers are not used.
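The arithmetic behind this example can be sketched in a few lines of Python. The node count and the hourly price below are purely hypothetical values chosen for illustration, not figures from any provider; prices are kept in whole cents to avoid rounding noise.

```python
def daily_elastic_cost(base_nodes, peak_multiplier, peak_hours, cents_per_node_hour):
    """Daily cost when the extra nodes are rented only for the spike."""
    extra_nodes = base_nodes * (peak_multiplier - 1)
    base_cost = base_nodes * 24 * cents_per_node_hour
    spike_cost = extra_nodes * peak_hours * cents_per_node_hour
    return base_cost + spike_cost


def daily_fixed_cost(base_nodes, peak_multiplier, cents_per_node_hour):
    """Daily cost when capacity for the peak runs around the clock."""
    return base_nodes * peak_multiplier * 24 * cents_per_node_hour


# Hypothetical workload: 2 nodes normally, 4x the traffic for 2 hours a day,
# at an assumed price of 10 cents per node-hour.
elastic = daily_elastic_cost(base_nodes=2, peak_multiplier=4, peak_hours=2,
                             cents_per_node_hour=10)
fixed = daily_fixed_cost(base_nodes=2, peak_multiplier=4, cents_per_node_hour=10)
print(elastic, fixed)
```

Under these assumed numbers the elastic variant costs less than a third of the always-on variant. The comparison ignores the purchase price of owned hardware, which would only widen the gap.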

2.3.1 Amazon Elastic Compute Cloud

Amazon Elastic Compute Cloud (Amazon EC2) was the service that pioneered the cloud ecosystem. Amazon realized that it owned a complex infrastructure built to handle occasional spikes in traffic, while most of the time the computing power was unused. The company decided to use the advances in virtualization to consolidate its servers, and from this endeavour the service was launched in 2006. The platform uses the Xen¹⁰ hypervisor and allows the client to use a wide spectrum of operating systems. The user has administrator access to the system and is free to modify most aspects of it. Every instance is assigned a special domain name. The domain name may resolve to different IP addresses. Having a static IP address is a paid feature.

The platform offers many services. The most notable is Elastic Block Store (EBS). EBS is replicated block storage that can be attached to virtual machines to provide secure and persistent storage with POSIX characteristics. The virtual device can be formatted with the file system of choice and mounted as a regular block device.

Many services use Amazon EC2 as their infrastructure of choice because of its acceptable pricing and high fault tolerance. The Software as a Service mentioned earlier, Dropbox, is built on the Amazon EC2 infrastructure. Platforms as a service also use Amazon EC2 in their architectures, for example Heroku and EngineYard, mentioned in the next sections.

9. http://www.newservers.com/language/en/
10. http://xen.org

2.3.2 Rackspace cloud

Rackspace is one of the largest¹¹ [45] server hosting providers in the world by the number of running servers. In 2006 the company created a new brand, Mosso, to start providing virtual servers from its own data centres. Later the Mosso brand was renamed Rackspace Cloud.

Rackspace Cloud uses Xen as its virtualization platform and provides services similar to Amazon EC2, but under different product names. Another product, Cloud Sites, is built on top of Rackspace Cloud and offers PaaS. Some of the internal systems of Rackspace Cloud were open-sourced in a project called OpenStack.

2.3.3 Joyent

Joyent started as a SaaS provider and evolved into an IaaS provider. Its infrastructure was also built using Xen. However, when Sun Microsystems¹² open-sourced the Solaris system, the company started building on top of that. Now the company uses Solaris¹³ from the Illumos project¹⁴ as the host operating system and KVM¹⁵ as the hypervisor in its own package called SmartOS¹⁶. Joyent is a big advocate of open-source software and sponsors the development of many projects. SmartOS is the operating system it uses itself, and it is provided as an open product. Joyent also employs the creator of the Node.js project.

2.4 Platform as a Service

Having a fail-proof infrastructure is not enough to create a scalable service. When a service outgrows a single machine (bare-metal or virtual), a platform is needed to orchestrate the execution of the service across multiple nodes, so that scaling the service is as easy as adding new nodes to the cluster. There are two points of view on such solutions, depending on who provides the platform.

11. http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-web-servers/
12. http://en.wikipedia.org/wiki/Sun_Microsystems
13. http://www.oracle.com/us/products/servers-storage/solaris/solaris11/overview/index.html
14. https://www.illumos.org/
15. http://www.linux-kvm.org/page/Main_Page
16. http://smartos.org/

For the most demanding services it is vital to build directly at the infrastructure level, with the platform and cluster orchestration integrated directly into the service, but that makes the development costs of the service much higher. For many services it is simpler and more cost-effective to build on top of a platform that handles the orchestration itself, so that the service can focus only on its primary business.

2.4.1 Heroku

The pioneer of such platforms is Heroku¹⁷ (founded in 2007), which started as a platform for deploying Ruby and Ruby on Rails applications in a scalable environment. It was later bought¹⁸ by Salesforce¹⁹ for $212 million. Following the transaction, Heroku announced support for many different stacks: Node.js, Python, Java, Scala and Clojure.

Heroku uses Git²⁰-based deployment. Every service running on the platform has its own Git repository, and every time there is a git push²¹ to the repository, a special hook is executed: it takes the latest revision from the repository and exports it to a virtual environment. In that environment a build is run and the resulting application is tested to verify that it starts properly. If the test is successful, the result of the build is saved for later use. The resulting package is called a “slug” and represents everything the application needs to run.
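The sequence of steps performed by such a hook can be sketched as follows. This is only an illustration of the described flow, not Heroku's actual implementation: the names `build_slug`, `DeployError`, `build` and `starts_properly` are hypothetical, the export from Git is replaced by a plain directory, and a gzip-compressed tar archive stands in for the slug format.

```python
import io
import tarfile
import tempfile
from pathlib import Path


class DeployError(Exception):
    """Raised when a pushed revision cannot be turned into a slug."""


def build_slug(exported_revision: Path, build, starts_properly) -> bytes:
    """Build, check and package one exported revision.

    `build` and `starts_properly` are hypothetical callables supplied by the
    platform; they do not correspond to any real Heroku API.
    """
    build(exported_revision)                     # run the build in the environment
    if not starts_properly(exported_revision):   # test that the application starts
        raise DeployError("application failed the start-up test")
    buffer = io.BytesIO()                        # save the result for later use
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        archive.add(str(exported_revision), arcname="app")
    return buffer.getvalue()


def demo() -> list:
    """Run the pipeline on a throwaway directory and list the slug contents."""
    with tempfile.TemporaryDirectory() as tmp:
        revision = Path(tmp)
        (revision / "app.rb").write_text("puts 'hello'\n")

        def build(path: Path) -> None:
            # Stand-in for a real build step such as installing dependencies.
            (path / "BUILD_OK").write_text("built\n")

        slug = build_slug(revision, build, lambda path: (path / "BUILD_OK").exists())
    with tarfile.open(fileobj=io.BytesIO(slug), mode="r:gz") as archive:
        return sorted(archive.getnames())


print(demo())
```

Keeping the build and the start-up test as injected callables mirrors the idea that the platform, not the application, decides how a revision is built and verified.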

The slug may contain anything, but its size is restricted to 100 MB. Depending on the plan the customer is paying for, the slug is distributed to the actual nodes of the Heroku cluster and started. This way the application runs in parallel on multiple nodes. All the nodes have access to the same data store (Heroku provides a PostgreSQL²² database), and there needs to be a shared session store, because requests from the same client may arrive at different nodes.
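Why the session store has to be shared can be shown with a small simulation. The sketch below assumes round-robin routing and uses a plain dictionary in place of an external session store; the class and variable names are hypothetical.

```python
from itertools import cycle


class Node:
    """One running copy of the application."""

    def __init__(self, name, sessions):
        self.name = name
        self.sessions = sessions  # shared store, not node-local memory

    def handle(self, session_id):
        """Count the request in the shared session and report who served it."""
        session = self.sessions.setdefault(session_id, {"hits": 0})
        session["hits"] += 1
        return self.name, session["hits"]


shared_sessions = {}  # stands in for an external session store
nodes = [Node("node-a", shared_sessions), Node("node-b", shared_sessions)]
router = cycle(nodes)  # round-robin routing is an assumption of this sketch

# Three requests from the same client land on alternating nodes.
responses = [next(router).handle("client-1") for _ in range(3)]
print(responses)
```

Because both nodes write to the same store, the hit counter keeps increasing no matter which node serves the request. If each node kept its own dictionary, the client would see its session reset whenever the router switched nodes.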

To ensure the stability and security of applications in the cluster, a virtual environment separates them from each other. In the first version of the platform, chroot²³ was used. Chroot is very simple to implement and does not need any external software besides a Unix-like operating system. Chroot, however, does not support resource limitation. A badly written application can allocate too much memory or consume too many CPU cycles, and other applications will starve.

17. http://www.heroku.com/
18. http://techcrunch.com/2010/12/08/breaking-salesforce-buys-heroku-for-212-million-in-cash/
19. http://www.salesforce.com
20. http://git-scm.com/
21. http://gitref.org/remotes/#push
22. http://www.postgresql.org/
23. http://linux.die.net/man/1/chroot

To add resource limitation to the platform, Heroku adopted a technology called cgroups²⁴. Cgroups allow processes to be separated into isolated groups (there can be one or more processes in a group) and limit which resources each group is allowed to use. The platform can then limit different aspects of resource allocation: memory, CPU time used by the processes, CPU scheduling and I/O operations.
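As a rough illustration of how such limits are expressed, the sketch below maps a group name to control files of the cgroup v1 interface (`memory.limit_in_bytes`, `cpu.shares`, `blkio.weight`) and the values that would be written to them. The mount point, the group name and the chosen limits are assumptions; the function only computes the mapping and does not touch the system, since actually writing these files requires root privileges on a Linux host with cgroup v1 mounted.

```python
from pathlib import PurePosixPath

CGROUP_ROOT = PurePosixPath("/sys/fs/cgroup")  # common mount point, assumed here


def cgroup_settings(group, memory_bytes, cpu_shares, blkio_weight):
    """Map cgroup v1 control files to the values that would be written."""
    return {
        CGROUP_ROOT / "memory" / group / "memory.limit_in_bytes": str(memory_bytes),
        CGROUP_ROOT / "cpu" / group / "cpu.shares": str(cpu_shares),
        CGROUP_ROOT / "blkio" / group / "blkio.weight": str(blkio_weight),
    }


# Hypothetical limits for one application: 256 MiB of memory, half of the
# default CPU share (1024) and a medium block I/O weight.
limits = cgroup_settings("app-42", 256 * 1024 * 1024, 512, 500)
for path, value in limits.items():
    print(f"{value} -> {path}")
```

In cgroup v1 a process joins a group when its PID is written to the `tasks` file of that group, after which the limits above apply to it and to its children.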

2.4.2 Google App Engine

Google App Engine²⁵ is a PaaS service by Google²⁶. Google App Engine (GAE) is based on Google’s proprietary technologies and provides a scalable environment for deploying web applications written in the Python and Java programming languages. The platform provides many services, including the possibility of running tasks in the background to free web applications from managing long-running tasks. An XMPP protocol infrastructure is also an integral part of the platform.

2.4.3 Microsoft Azure

Microsoft also entered the cloud computing market with a product called Microsoft Azure²⁷. Depending on the point of view, the platform’s reliance on Microsoft’s own Windows operating system is both an advantage and a disadvantage. Most technologies for web application development have their roots in Unix-like operating systems, and when porting such software to Windows, developers are often forced to disable features already present in the software. The platform provides a basic set of technologies: an SQL database, a key-value store, a blob store, load-balancing infrastructure and a runtime environment.

24. http://www.kernel.org/doc/Documentation/cgroups/
25. http://code.google.com/intl/cs/appengine/
26. http://www.google.com/
27. http://www.microsoft.com/windowsazure/

2.5 Software as a Service

Both levels of cloud computing already described relate to the development process of the service and do not directly touch the end user. Software as a Service builds on the previous levels and provides value to the customer. The provider already has an infrastructure to deploy the service to and a platform to make the applications robust, fail-proof and scalable. In the last step, the provider delivers some value to the customer.

The main paradigm of SaaS is that the customer gets a thin client that is basically only responsible for displaying results, data and information, and performs only very basic operations on the data. The main processing system is part of the service: the client makes requests and the results are delivered from the service. There is a vast number of services that might be considered SaaS.

2.5.1 Office Web Apps

Office Web Apps is Microsoft’s response to the Google Apps ecosystem and its increasing popularity. This service is neither a big player nor a pioneer in this area, yet it is mentioned first in this thesis. The reason is that the service builds on Outlook Web Access, a service originally part of the Microsoft Exchange solution. Even though it is not a well-known fact, XMLHttpRequest²⁸, the main technology that enabled the “Web 2.0” era, was created by the developers of Outlook Web Access.

Office Web Apps is a web application that allows creating and editing documents using a supported web browser. It is integrated with the standalone Office suite, which allows uploading documents to and downloading them from the service. Office Web Apps has equivalents of Word, Excel, PowerPoint and OneNote. Outlook is provided in the form of Outlook Web Access as part of Microsoft Exchange, or as the Hotmail service for the general audience.

2.5.2 Google Apps

Google Apps²⁹ is the pioneer of Software as a Service. Starting with Gmail in 2006, Google has been broadening the set of tools provided as part of this service. Nowadays the services include e-mail³⁰, documents³¹, text, audio and video chat³², a social network³³, image sharing³⁴ and many others.

28. http://msdn.microsoft.com/en-us/library/ms759148(VS.85).aspx
29. http://www.google.com/apps/intl/en/business/index.html

Google Apps uses the same infrastructure as other Google services and even Google App Engine customers. Google Apps is a clear example of Software as a Service, because the user does not have to install anything into the operating system and needs only a web browser, which is part of most desktop operating systems, to start using the service.

30. http://www.gmail.com
31. http://docs.google.com/
32. http://www.google.com/talk/
33. http://plus.google.com
34. http://picasaweb.google.com

3 The cloudy business

Knowing what cloud computing means is not enough. It is important to know how to implement cloud products in the business itself. In keeping with the topic of this thesis, this chapter discusses the benefits and pitfalls a business faces when using cloud-related technologies. The text focuses on the deployment of web applications.

Before getting to the technologies themselves, it is necessary to describe the process of developing web applications and the role deployment plays in that process.

3.1 Web application development process and deployment

Developing web applications is a complex area of expertise. First of all, it is necessary to know what application is being developed. This is simpler with an in-house product, but more difficult with the development of public services or custom-built systems for customers. For a public service, market and demand analysis has to be done. For customer development, the actual description and specification of the product has to be communicated. Then the user interface design and user experience design are created. Such a process involves many different professions and experts.

From the management point of view, it is important to create schedules and budgets, to hire experts in all the domains involved, and to choose a project management methodology: whether to use the time-proven waterfall model or go with a more modern agile approach.

Depending on all of these factors, the actual development process will differ. However, there is one process common to the development of all web applications, shown in Figure 3.1. It consists of four phases in a cycle. The cycle is an important aspect of web application development. Web applications are not developed as standalone products that are created, delivered, maintained and forgotten. Web applications are delivered as services, and as services they tend to improve over time in iterations rather than in separate development processes.

[Figure 3.1: Common web development process across methodologies. A cycle of four phases: Development, Testing, Deployment, Maintenance.]

The first step is the Development of the product. This step includes the planning and design phases found in some of the methodologies and the implementation phase that is part of all methodologies. During this step the team creates a deliverable product. In the waterfall model, this deliverable would be the whole product; in agile methodologies, it would be a small part of the system, ready to be presented to the customer for feedback.

The Development phase is done in-house at the development company and does not directly concern the deployment platform. In many cases it is not known beforehand in what environment the application will run. However, knowing the environment in advance allows the developers to simplify the software and tailor the product to the environment.

The Testing phase comes after the Development phase. A testing phase can be found in all methodologies, under different names and with different tools or processes. Its main and only purpose is to ensure that the product contains as few problems and issues as possible.

The Testing phase is tightly connected with the Development and Deployment phases. From some points of view, the Development and Testing phases can be seen as a cycle in themselves. In the classical point of view, the product is developed and then pushed to testing. Testing itself may involve some interventions to fix the issues found, but these interventions should be minimal and targeted only at fixing the issue in an already developed product.

To test the product thoroughly, it is necessary either to know in what environment the application will run or to test across many different environments. Staging environments are used to simulate the environment the product will be running in. These environments should be as close to the real environments as possible. With an engaged customer and an accessible staging environment, it is possible to draw the customer into the development process and discover issues caused by misunderstandings in the specification of the product.

In the Deployment phase, the product is delivered to the customer. Depending on the product, this may take one step or many steps; however, the goal of the phase is the same: the product is available to the customer.

The last phase of the process is Maintenance. This step may be provided by the developer, by the client, by a third-party vendor or by a combination of these. Using the product in production means that it is crucial to ensure its availability and that the issues found are resolved.

On the other hand, as the business grows, the customer may have requirements that were not known at the time the product was created. Such requirements should also be resolved for the customer. That way the circle is closed. Adding new features or making more complex changes to the product requires the full circle from Development through Testing to Deployment and back to Maintenance.

In agile methodologies, the circle is repeated many times as the iterations proceed. In the waterfall methodology, the circle is repeated once for each version of the product. As said previously, the actual implementation of the steps depends on the methodology chosen for managing the project development.
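The difference between the two approaches lies in how often the same four phases are traversed, which can be made concrete with a tiny sketch. The function name and the iteration counts are illustrative assumptions only.

```python
from itertools import cycle, islice

PHASES = ("Development", "Testing", "Deployment", "Maintenance")


def run_iterations(count):
    """Return the phases visited while going around the cycle `count` times."""
    return list(islice(cycle(PHASES), count * len(PHASES)))


waterfall = run_iterations(1)  # one pass per product version
agile = run_iterations(3)      # e.g. three short iterations
print(waterfall)
print(agile)
```

In both cases the order of the phases is identical; only the number of passes and the size of each deliverable differ.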

For this thesis, the Deployment phase is the most important one. To understand what benefits the product of the thesis provides, it is important to discuss what options the deployer has for deploying a product.

3.1.1 Bare metal

The bare-metal method is the oldest and most complex one. The deployer buys real hardware that is connected to the network and managed by a server housing company. The advantage of this method lies in knowing exactly what hardware, operating system and software the environment consists of, and in the ability to modify and tune all aspects of the environment.

It is possible to fully ensure the Quality of Service, because the deployer hasfull control over all aspects of the whole environment the product is running.

3.1.2 Shared hosting

Shared hosting is a very popular method among PHP developers. Shared hosting companies provide FTP or SCP access to the system the application is running on. The deployer only uploads the data to the system and the application is running. This way of deploying is possible with technologies that do not utilize virtual machines to run the product. With technologies that run on virtual machines it is more difficult, because the virtual machine has to be restarted to reload the code of the product.

In a shared environment it is difficult to separate running applications, and moreover the deployer has no way to affect it. When someone else deploys a badly written application, the other applications may starve for resources. This makes it very difficult to ensure any kind of Quality of Service.

3.1.3 Virtual servers & dedicated hosting

When setting up a bare-metal deployment requires an investment the customer cannot afford, but some Quality of Service is still required, virtual servers or dedicated hosting may be the answer.

In the case of dedicated hosting, the whole environment for deployment is prepared by the provider and the deployer only has to upload the product into the system. With virtual servers, there is usually only a basic system provided by the service and the deployer has to set up the environment.

Virtual servers have the advantage that they are billed by the amount of resources used. If the product is not required to run 24 hours a day, 7 days a week, the system may be shut down while it is not in use and started only when needed. Another benefit of virtual servers is that deployment from templates is provided. When set up correctly with load balancing and other technologies, the template of the product can be deployed to more machines to accommodate occasional spikes in traffic.

These two methods are very similar in that the product does not share the system with other products. Ensuring Quality of Service is possible, but the actual level depends heavily on the Quality of Service offered by the provider of the virtual server or dedicated hosting.

3.1.4 Platform

Platforms lie between shared hosting and virtual servers. The platform provider sets an entry method to let the deployer upload the product into the platform. Once uploaded, the platform allows the deployer to deploy instances of the product and scale the application accordingly. Platforms also provide resource management functionality and are mostly billed by consumed resources.

In comparison to virtual servers and bare metal, platforms do not provide much flexibility in environment tuning. The environment is set up by the platform provider and the application has to be bent to it. However, that may be an advantage when the developers know beforehand that the application will run on such a platform, which allows them to leverage platform-specific features.

Forcing the deployers to bend the product to the platform allows the platform providers to provide functionality that would otherwise be impossible. As an example, in platform deployment the deployer usually cannot affect where and how the application instance will be deployed. However, this fact allows the platform provider to relocate applications according to the resource demands of other applications to ensure some Quality of Service. In return for these compromises, the deployer gets load balancing among the product instances for free.

3.2 Platform adoption

Adopting a platform as the main target for newly developed projects brings benefits to the company. On the other hand, such an adoption is a larger process that may touch many departments in the company. The changes that may be part of the process are described in this chapter from the points of view of the departments in the company.


3.2.1 Management

The first and most important decision is whether to adopt the platform or not. That decision depends on many factors that are specific to each company and product. Platform adoption may bring benefits as well as costs. However, such a decision should always be made from a long-term perspective.

When the company decides to adopt a platform, it is important to consider who will be the provider. There are providers that offer platforms as a service to their customers. For the company, using platform as a service means a fixed price per running instance of the product. The price scales linearly with the number of instances of the product required to accommodate all incoming requests from clients.

The company may also decide to run the platform itself. In such a case, the price to run an instance of a product is not fixed. Most of the costs will go into buying the actual hardware to run the platform. From the perspective of an instance of a product, the price will vary with how many instances of how many products are running on the platform at a given time. With more products the price will be lower. However, on top of this variable price there are fixed costs that need to be added to the calculation. There will be fixed costs for people to administer the platform and ensure its operability, although these tasks may be carried out by a department that already exists in the company. Moreover, there is the cost of connecting the servers to the internet, whether the servers are managed in-house or by a server-housing provider.
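The difference between the two pricing models can be illustrated with a small calculation. The following Python sketch is a deliberate simplification: it treats all costs of a self-run platform as one monthly sum shared by the running instances, and every figure in it is a made-up assumption, not data from a real provider.

```python
def paas_monthly_cost(instances, price_per_instance):
    """Hosted platform: fixed price per running instance, linear in the count."""
    return instances * price_per_instance


def self_hosted_cost_per_instance(monthly_costs, instances):
    """Self-run platform: the monthly costs are shared by all running instances."""
    if instances <= 0:
        raise ValueError("at least one instance is needed to share the costs")
    return monthly_costs / instances


# Hypothetical figures, for illustration only.
PRICE_PER_INSTANCE = 20.0  # hosted price per instance and month
MONTHLY_COSTS = 1000.0     # hardware, administration and connectivity per month

for count in (10, 50, 100):
    hosted_total = paas_monthly_cost(count, PRICE_PER_INSTANCE)
    own_per_instance = self_hosted_cost_per_instance(MONTHLY_COSTS, count)
    print(count, hosted_total, own_per_instance)
```

With these assumed numbers the self-run platform becomes cheaper per instance once more than 50 instances are running; with other numbers the break-even point moves accordingly.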

3.2.2 Developers

From the perspective of developers, platform adoption means learning new ways of product development. To develop for a platform, the developers need some basic knowledge of the platform’s inner workings to be able to use the platform’s features to the maximum.

Most platforms use version control systems as an entry point to the system. In many smaller companies, version control systems are not yet used to manage source code. In such companies there will be a cost in teaching the developers to use the version control system as an integral part of the development cycle.

One of the biggest changes for the developer is no longer being responsible for deployment. The developer’s tasks are reduced to delivering the product into the version control system (or the platform generally). The deployment for quality assurance may be carried out by the testers, and the deployment to production by management through a user interface. Because the real deployment is done by the platform, it is only necessary to tell the platform that it should do the deployment.


4 Making the clouds wild

4.1 Wildcloud

The experience gained over years of server management, web application development and hosting is applied in a project called Wildcloud. The following chapters describe what Wildcloud should look like to help developers deliver exceptional web applications.

Features are described first to form an idea of what such a platform should look like. Next, differences from existing platforms are discussed to back up the reasoning for starting a new project. Finally, the technologies used to implement such a platform are discussed together with the overall architecture.

4.2 Required features

The idea of the platform emerges from pieces that were created for hosting personal and commercial projects. Building on existing experience with providing hosting services, the platform is designed to ease the burden of deploying applications into production environments and of ensuring the consistency of these environments.

An important aspect of developing web applications and deploying them into production environments is quality assurance. One of the key features of the platform should be the ability to deploy application versions into staging environments. These staging environments have to be as close as possible to the production deployment environment.

4.2.1 Security

Every application run in the cluster is isolated so that no two applications can interfere with each other. Every application has its own set of data and its own root file system. The application runs as a regular user, but even if it gains higher privileges and capabilities to modify system configuration, the other applications cannot be affected.

Different applications need different sets of system packages. The platform should allow a set of required system packages to be specified for each deployment. This set of system packages should not interfere with the other applications deployed to the platform or with the platform itself.

4.2.2 Resource allocation

Each application in the system can be allocated only a portion of the system resources. The operator can configure an exact memory limit for an application. The configuration allows setting the amount of memory and the amount of swap the application is allowed to use. Each application can have specific CPUs assigned, and only those will be used by the scheduler. The platform allows the operator to specify how many CPU cycles each application receives. This way an application cannot drive the platform, and the other applications, into resource starvation.
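Section 4.4.2 names Linux Containers as the isolation mechanism. Assuming the container configuration is written with LXC's `lxc.cgroup.*` keys for the version 1 control groups, the limits described above could be expressed roughly as in the following Python sketch. The function name and the chosen values are hypothetical; the thesis does not show the configuration Wildcloud actually generates.

```python
def container_limits(memory_mb, swap_mb, cpus, cpu_shares):
    """Return LXC configuration lines limiting one application container."""
    mem_bytes = memory_mb * 1024 * 1024
    # The memsw limit counts memory and swap together, so it is memory + swap.
    memsw_bytes = (memory_mb + swap_mb) * 1024 * 1024
    cpu_list = ",".join(str(cpu) for cpu in cpus)
    return [
        f"lxc.cgroup.memory.limit_in_bytes = {mem_bytes}",
        f"lxc.cgroup.memory.memsw.limit_in_bytes = {memsw_bytes}",
        f"lxc.cgroup.cpuset.cpus = {cpu_list}",  # CPUs the scheduler may use
        f"lxc.cgroup.cpu.shares = {cpu_shares}",  # relative share of CPU time
    ]


# Hypothetical limits: 256 MB memory, 128 MB swap, CPUs 0 and 1, weight 512.
for line in container_limits(256, 128, [0, 1], 512):
    print(line)
```

Note that `cpu.shares` is a relative weight between containers, not an absolute number of cycles, so it only approximates the per-application CPU allotment described in the text.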

4.2.3 Building the application and central repository

When a new application is set to be deployed to the platform, an image of the system should be created. The image holds everything the application needs to be deployed: the whole system image, required system packages, dependencies, compiled scripts, interpreters and the application itself are all part of the image.

These images are stored in a central repository. When the platform is to deploy the application, the image is downloaded to the target node and started. This way it is possible to keep one copy of the prebuilt image in the repository and transfer the image only when needed.

4.2.4 Thin provisioning

Because of the requirement for standalone images of the applications, such a platform would have high requirements on storage capacity: the repository has to accommodate the whole system image of each application, and each node the application is deployed to has to hold the unpacked and started image.

To save storage capacity and to lower the demand on bandwidth, thin provisioning should be put in place. With thin provisioning in place, only the differences between the basic system image and the image built for the application are stored. Downloading the whole image of the operating system for each deployment over the network is not resource efficient. With thin provisioning, the amount of data transmitted over the network is much smaller.


4.2.5 Extensibility

The system has to be created in a way that makes adding new features and technologies very simple. In some deployments the Ruby language might be important, in others Node.js or Java. The platform has to allow broadening the set of technologies it provides to the applications. It is important to ensure that the technologies the platform uses for its inner workings are replaceable. The Java ecosystem has a lower adoption rate for new technologies than Python, Ruby or JavaScript. The platform has to be capable of keeping pace with the world of new and modern technologies.

4.2.6 Deployment

A version control system is used to deploy an application. As part of the platform, version control repositories are provided to receive changes from customers. When a new application is to be deployed, a revision id is specified and the application is built according to the state of the repository at the specified revision id. Multiple applications may be deployed from one repository to allow creating staging environments for testing and quality assurance before moving the application into a production deployment.

4.2.7 File system

An application has access to the file system where it is deployed, but has no guarantee of persistence of files written to the file system. Every time an application is deployed, it starts with the state of the file system that was created during the building phase. The application has to use some other service to store its files. The service might be provided by the platform operator, but a third-party service might also be utilized.

4.3 Comparison with existing solutions

Starting a new project is always time and resource consuming. With a large project such as a platform, it is even more demanding. In this chapter some existing projects are discussed and compared to the previous set of features that are required from the platform.


4.3.1 CloudFoundry

CloudFoundry¹ is a platform created by VMware and open sourced under the terms of the Apache 2 license. Because of the permissive license and the big company backing the development, the platform is gaining considerable traction.

4.3.1.1 Tight coupling

The project is very complex and does not seem to be easily separable into autonomous components. The components are built on the Ruby on Rails² framework, which is very complex itself. In the Wildcloud platform the components will be completely separated and decoupled so that they can be used as standalone software to provide specific functionality.

4.3.1.2 Deployment

CloudFoundry provides a client console application that is run in the root directory of an application. Once the developer wants to deploy the application, it is compressed and pushed to the platform. Almost every developer nowadays uses some version control system. Uploading only the application changes to the version control system allows the platform to save the traffic needed to transfer the application as a snapshot. Once the application revisions are saved inside the platform, it is much simpler to deploy different versions of the application for testing, quality assurance or production deployment.

4.3.1.3 Message queuing

CloudFoundry uses a message queue to decouple some of its components. A Ruby-based product called NATS³ is used as the message queue. In the beginnings of Twitter⁴, the team created and used their own message queue written in Ruby with a very simple memcache-like protocol. This project was later abandoned and reimplemented in Scala⁵ because of performance issues with Ruby⁶.

1. http://cloudfoundry.org/
2. http://rubyonrails.org/
3. https://github.com/derekcollison/nats
4. http://www.twitter.com
5. http://www.scala-lang.org/
6. http://robey.livejournal.com/53832.html


4.3.1.4 Isolation

CloudFoundry provides only basic application and resource isolation features. Applications are deployed from the same file system and the platform does not provide the ability to install a per-application set of packages. To isolate resources CloudFoundry uses ulimit⁷ to force the applications to behave correctly, but this functionality does not allow isolation of running applications.

4.3.2 Nodejitsu

Nodejitsu⁸ is a platform as a service developed as an open-source project⁹ and also offered as a commercial service. The components are loosely coupled and lightweight.

4.3.2.1 Node.js

Nodejitsu focuses on deployment of Node.js based applications only. Wildcloud supports a wider range of technologies, including Node.js.

4.3.2.2 Isolation

Nodejitsu supports only a simple chroot for application security. There is no support for resource allocation and application isolation on the system level.

4.3.3 OpenShift

OpenShift¹⁰ by RedHat¹¹ was announced after the project was started. According to all the information available, the platform seems to be very similar to Wildcloud; however, it is not open-source. RedHat promised to make the platform open in the future¹².

7. http://www.linuxhowtos.org/Tips%20and%20Tricks/ulimit.htm
8. http://nodejitsu.com/
9. https://github.com/nodejitsu
10. https://openshift.redhat.com/app/
11. http://www.redhat.com/
12. http://redmonk.com/sogrady/2011/05/04/deconstructing-red-hats-openshift-the-qa/


More information regarding the platform was requested from RedHat. The information was promised, but none was provided by the deadline of the thesis.

4.4 Applied technologies

Wildcloud is composed of many technologies. The platform is built with extensibility in mind and almost any component in the platform can be replaced with a completely different kind of product.

For the initial implementation, however, it is desirable to choose a basic set of technologies to bootstrap the project into a working state that can be used as a blueprint for implementing extensions.

4.4.1 Operating system

Ubuntu Linux Oneiric Ocelot (11.10) Server edition¹³ was chosen as the operating system. This distribution provides modern packages compared to more conservative distributions. This is important from the point of view of Ruby development. The community around this language is pushing new technologies very fast, and distributions like Debian or RedHat Enterprise Linux cannot keep pace. Ubuntu also natively supports many technologies that were abandoned or not implemented in other distributions.

For example, Aufs¹⁴ is at the moment the only viable open-source solution for thin provisioning, and it is included in Debian-based distributions but not in Fedora-based ones. That was one of the reasons not to build the project on Fedora.

4.4.2 Application security and isolation

Wildcloud builds its security features on Linux Containers¹⁵. Linux Containers provide a very secure chroot environment. Once a container is chrooted into a virtual environment, Linux Containers create an isolation layer around the container to isolate its processes and resources. Linux Containers allow specifying what system resources each container is allowed to use, and can be used to achieve fine-grained resource allocation.

13. http://www.ubuntu.com/business/server/overview
14. http://aufs.sourceforge.net/
15. http://lxc.sourceforge.net/

Linux Containers are a form of operating system level virtualization¹⁶. Compared to full system virtualization¹⁷, there is only one kernel running, so this kind of virtualization imposes less overhead on the system. The performance is almost the same as on a native system¹⁸. In addition, operating system level virtualization can be used inside full system virtualization such as Xen¹⁹ or KVM²⁰.

Linux Containers are implemented as part of recent versions of the Linux kernel and do not require external patches like OpenVZ²¹. OpenVZ is a more mature product; however, it is focused on RedHat Enterprise Linux based distributions and requires an old kernel version. It is also difficult to combine the patches of OpenVZ and Aufs into a single kernel.

4.4.3 Loose coupling

AMQP²² is used to decouple the platform into smaller components. RabbitMQ²³ was chosen as the implementation of the AMQP broker because of its performance, features and wide deployment. RabbitMQ supports clustering and makes it possible to build systems with no single point of failure.

Each component connects to RabbitMQ and reports its status. After that the component waits for instructions on what should be done. The instructions can be sent by any component in the platform; however, in most cases these instructions are sent by a platform orchestration component that coordinates the whole platform or a part of it. This orchestration component is not part of the platform itself, because it is deployment specific.

16. http://en.wikipedia.org/wiki/Operating_system-level_virtualization
17. http://en.wikipedia.org/wiki/Full_virtualization
18. http://en.opensuse.org/SDB:LXC
19. http://xen.org/
20. http://www.linux-kvm.org/page/Main_Page
21. http://wiki.openvz.org/Main_Page
22. http://www.amqp.org/
23. http://www.rabbitmq.com/


4.4.4 Thin provisioning

Advanced multi layered unification file system version 3.x (Aufs3)²⁴ is used to provide thin provisioning of virtual machines. Aufs stacks separate directories on top of each other in read-write or read-only mode. Wildcloud uses Aufs to create virtual environments for applications.

When an application stack is built, all data resulting from the process are written into a separate directory that forms a read-write layer on top of a read-only directory containing the base system. The resulting directory is packed and ready to be deployed into the cloud.

When the deployment process is started, the application stack is transferred to the targeted node and unpacked. To create a running environment, the base system (the same as in the build phase) is layered read-only, the application stack is layered read-only on top of it, and a temporary read-write directory is layered on top of both.
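The three layers of a running environment map directly onto the branch list of an Aufs mount, where the top-most branch is listed first. The following Python sketch only assembles the `mount` command for such a stack; the directory names are hypothetical and the command Wildcloud really issues is not shown in this thesis.

```python
def aufs_mount_command(base_dir, stack_dir, scratch_dir, mount_point):
    """Build the mount command for one running application environment."""
    # Branches are listed from the top-most layer down to the base system.
    branches = f"br={scratch_dir}=rw:{stack_dir}=ro:{base_dir}=ro"
    return ["mount", "-t", "aufs", "-o", branches, "none", mount_point]


# Hypothetical directory layout on a node.
command = aufs_mount_command(
    base_dir="/srv/base",         # shared read-only base system
    stack_dir="/srv/stacks/app",  # unpacked application stack, read-only
    scratch_dir="/srv/tmp/app",   # temporary read-write layer
    mount_point="/srv/run/app",
)
print(" ".join(command))
```

Running the resulting command requires root privileges and a kernel with Aufs support, so the sketch prints the command instead of executing it.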

4.4.5 Data storage

The file system the application runs on is transient. Whenever an application is deployed, a new temporary directory is created and all files written by the application to the file system are contained in that directory. When the application is undeployed, the data are destroyed. This allows the application to use temporary data and also makes it possible to move applications around the cluster.

Databases are at the core of modern web applications. A database server may be started as an application in the cluster and would be available to the applications the developer deploys to the platform. But the file system the applications are using is not persistent, so the data will not survive. To solve the problem, the applications have to use some other service.

The platform provider can provide and scale database servers for the clients, but that is not in the scope of the platform itself. The same applies to uploaded files. The applications have to use an external service to save data that need to be persistent. These features are deployment specific and therefore are not part of the platform.

24. http://aufs.sourceforge.net/


4.4.6 Cloud communication

Whenever an application instance is started, a new IP address is assigned to it. The IP address is used to route HTTP requests to the application instance. Having an address for each application instance allows the applications to communicate with each other. Within each application instance more than one process may run, therefore routing based on ports is not the right solution.

4.4.7 External services

To the application itself, the environment looks like a standard Linux system. When an application wants to initiate a connection, it is allowed to do so. The connection is handled according to the external system configuration. In the actual implementation, the communication is handled by NAT. Therefore applications are able to connect to the external network, but they are not directly reachable from the external network. A simple port-forwarding system would allow opening external ports and routing the communication to the application; however, because the platform’s main task is to streamline web application deployment, such functionality is not required.

4.4.8 HTTP routing

Wildcloud implements an HTTP proxy that routes incoming HTTP connections within the cluster. Because the platform can move deployments of applications from node to node according to current performance characteristics and application deployments, it is not possible to give the application a static address. The router solves this problem. All connections coming to applications inside the cluster are proxied through an HTTP proxy that is aware of the current location of the requested applications and forwards the request to the actual virtual environment.

The HTTP proxy also serves as a load balancer: when more instances of an application are deployed, the requests are distributed evenly among all those instances. This provides an important aspect of scalability. By adding application deployments to the cluster, the application can scale horizontally, and it is only up to the platform provider to ensure enough computational resources.
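A routing table with even distribution can be sketched in a few lines of Python. Round-robin selection is an assumption made for this illustration, since the text does not prescribe a particular balancing strategy; the class name, host name and addresses are hypothetical as well.

```python
import itertools


class RoutingTable:
    """Maps an application to the addresses of its running instances."""

    def __init__(self):
        self._cycles = {}

    def update(self, app, addresses):
        """Replace the known instances of an application, e.g. after relocation."""
        addresses = list(addresses)
        self._cycles[app] = itertools.cycle(addresses) if addresses else None

    def pick(self, app):
        """Return the instance address that should receive the next request."""
        cycle = self._cycles.get(app)
        if cycle is None:
            raise LookupError(f"no running instance for {app}")
        return next(cycle)


table = RoutingTable()
table.update("shop.example", ["10.0.0.2", "10.0.0.3"])
picks = [table.pick("shop.example") for _ in range(4)]
print(picks)
```

Scaling the application horizontally then amounts to calling `update` with a longer address list; the proxy itself does not need to change.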


4.4.9 Session store

The HTTP proxy is responsible for delivering the request, and it decides where the request is routed based on the performance aspects of the platform. However, almost all modern web applications use sessions to store data between requests to the application. The two most widely used models are memory-based and file-based storage. Neither model can be used in a platform environment, because the data would not survive application relocation and because requests arrive at the instances randomly, so the user would be logged in on one instance and not on the others.

The problem can be solved by storing the data in external storage such as a database. This approach, however, adds overhead on the database server and increases its resource needs. Wildcloud solves this problem by implementing a simple HTTP-based service that is responsible for managing sessions for applications. The requests to the service may be made asynchronously and will not block evented architectures. It is then up to the service to save the sessions and scale the storage itself.

4.5 Architecture

The architecture of the platform is discussed in this chapter. In the previous chapters the technologies used to build the platform were discussed as isolated pieces. To make the platform work, the components have to communicate.

4.5.1 Basic communication

Figure 4.1 shows how all communication in the platform works.

Figure 4.1: Basic communication in the platform

MQ stands for Message Queue. The Message Queue is responsible for delivering messages among components in the cluster. In the diagrams the Message Queue is represented as a single instance; however, in production deployments it should be run as a cluster of nodes on many machines. The components themselves do not know what other components are in the cluster or where they are located. Components just connect to the Message Queue and create the required routings for published messages.

This mechanism can be easily illustrated by the communication between the Brain and Git components. When Brain starts, it connects to the Message Queue and lets it know that all messages tagged as “master” should be routed to it and that this component will handle those messages. Afterwards, when the Git component starts and connects to the Message Queue, it lets the Message Queue know that all messages related to Git and tagged with the name of the node should be routed to that component.

When the Git component is fully started, it publishes a message stating that a node with a “specific name” wants a full synchronization of the data related to Git, and tags the message as “master”. The Brain component receives the message based on the routing, generates a response and publishes the response tagged with the “specific name” of the node contained in the original request. The response is delivered to the node based on routing by the Message Queue.

This way, neither the Brain nor the Git component needs to know how the other is implemented or where it is located. This architecture allows Wildcloud to be operated on a single machine as well as in separate data centers across the world.
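The exchange between Brain and Git can be replayed with a tiny in-memory stand-in for the Message Queue. The Python sketch below is not RabbitMQ and not Wildcloud code: the class, the message fields and the tag format `git.<node>` are assumptions chosen only to show that both components talk to the queue and never to each other.

```python
from collections import defaultdict


class MessageQueue:
    """In-memory stand-in for a broker that routes messages by tag."""

    def __init__(self):
        self._bindings = defaultdict(list)

    def bind(self, tag, handler):
        """Ask the queue to route all messages with this tag to the handler."""
        self._bindings[tag].append(handler)

    def publish(self, tag, message):
        """Deliver the message to every handler bound to the tag."""
        for handler in list(self._bindings[tag]):
            handler(message)


def start_brain(mq):
    def on_master(message):
        if message["type"] == "sync_request":
            # Reply with the tag built from the node name in the request.
            reply = {"type": "sync", "repositories": []}
            mq.publish("git." + message["node"], reply)

    mq.bind("master", on_master)


def start_git(mq, node_name, inbox):
    mq.bind("git." + node_name, inbox.append)
    mq.publish("master", {"type": "sync_request", "node": node_name})


mq = MessageQueue()
start_brain(mq)
received = []
start_git(mq, "node-1", received)
print(received)
```

With a real AMQP broker the tags would correspond to routing keys and the `bind` calls to queue bindings, but the components would remain equally unaware of each other.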

In this and the following chapters, Brain represents the orchestration component. This component is responsible for organizing the cloud and instructing other components what they are supposed to do.

Only the Brain is aware of the state of the world. Regular components know only the information that is directly connected to their purpose. Brain is responsible for storing all information regarding the state of the components, for delivering all required information to the components when it is requested, and for mediating all information published by the regular components.

4.5.2 Core components

Figure 4.2 displays all core components of the platform. As mentioned in the previous chapter, no components are directly connected. All communication among the components is directed by the Message Queue.


Figure 4.2: Core components of Wildcloud

The platform internally uses Git to transfer application data from clients to the platform. The Git service is responsible for managing all aspects of the platform regarding the Git version control system²⁵. The platform uses Git over SSH transport to provide high performance and compatibility. SSH keys²⁶ are used to authenticate users. The component is responsible for SSH key management on the machine the SSH server is running on. The component also creates and destroys repositories on the file system as requested by the Brain component. When the client is authenticated and it is known what repository the client wants to access, the component checks the authorization and allows the operation or denies it. When all data is received, Git notifies Brain that new revisions are available. After that the project manager may be notified of new changes in the application, as seen in figure 4.3.

25. http://git-scm.com/
26. https://wiki.archlinux.org/index.php/SSH_Keys

Figure 4.3: Pushing new revision to Git

When a Git component receives new data, the client can request a build of a new image of the application. The Builder component is responsible for this task, as can be seen in figure 4.4. When the Builder receives a request to build a new image, it sets up a new empty environment based on the “base image”. The sources of the application are cloned from the Git repository into the environment. Depending on the platform the application uses, the Builder sets up the environment. This process always consists of two steps. In the first phase, as a superuser, the Builder configures the operating system aspects of the environment; new operating system packages are installed according to the manifest provided by the application. In the second phase, the Builder runs application-specific tasks as an unprivileged user. When both phases are finished, the Builder compresses the environment and pushes the resulting image into a central repository.

Keeper is responsible for starting and stopping application instances across the cluster. When the client decides to deploy a new version of the application or wants to scale the application to more instances, the Brain is notified and requests the appropriate task from the Keeper on a specific node. The Brain can also move an instance of an application to a different node in the cluster. To do so, it simply requests the stop of the instance on one node and the start of an instance on another node. The reason for moving instances around the cluster is to allow better allocation of resources and a higher density of deployed applications. As seen in figure 4.5, when the Keeper is requested to deploy a new application, it downloads the image from the central repository and starts the application.
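Moving an instance is therefore nothing more than two ordinary requests. The Python sketch below lists the messages an orchestrator such as Brain could publish for one relocation; the tag format `keeper.<node>` and all field names are hypothetical, and the order follows the description above (stop first, then start).

```python
def relocation_requests(app, image, source_node, target_node):
    """Return (tag, message) pairs that move one instance between nodes."""
    return [
        # Ask the Keeper on the old node to stop the running instance.
        ("keeper." + source_node, {"action": "stop", "app": app}),
        # Ask the Keeper on the new node to fetch the image and start it.
        ("keeper." + target_node, {"action": "start", "app": app, "image": image}),
    ]


requests = relocation_requests("shop", "shop-r42", "node-1", "node-2")
for tag, message in requests:
    print(tag, message)
```

Because the running file system is transient, nothing has to be copied between the nodes; the target Keeper only needs the image from the central repository.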

Figure 4.4: Building new version of application

Figure 4.5: Deploying new application

When a request comes to the platform, the Router is responsible for delivering it to the actual instance of the application. The decision where to route the request rests solely with the Router, and the routing logic can be replaced. When a new instance of an application is started or stopped, or when an instance has been relocated, the Router is notified by the Brain about the current state of the routing table.

4.5.3 Components seen by applications

Figure 4.8 shows what the architecture looks like from the perspective of the deployed applications.

All the components mentioned in the previous chapter are transparent to the application. The application is somehow started; however, it is not aware of the mechanisms that lead to its starting. The application is directly concerned with only two components.

Storage is responsible for storing application data. This component can be provided by the platform operator or by a third-party service. A basic component is provided as part of Wildcloud. The storage is a simple HTTP REST²⁷ service that accepts only a small range of requests. Every request has to contain a special “X-Appid” header to authenticate the application; data files are accessed based on the application. All possible operations are described in figure 4.6.

The storage implements multiple back-ends to store the actual data. To allow as simple a deployment as possible, it implements file system storage that operates only within the file system of the machine the storage is running on. This back-end does not provide any means to scale or ensure high availability. To provide such functionality, the storage implements MongoDB²⁸ as a back-end. MongoDB is a document-oriented database²⁹ that natively supports clustering and sharding to provide highly available and scalable data storage. Documents are stored in the binary JSON format³⁰. Based on the document-oriented architecture, MongoDB provides a virtual file system called GridFS³¹.

27. http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
28. http://www.mongodb.org/

Figure 4.6: Operations provided by storage service

Method   Path            Description
GET      /some/path      Download data from the location /some/path
PUT      /some/path      Save data to the location /some/path
DELETE   /some/path      Delete data from the location /some/path
GET      / list files    List files available to the application
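The operations listed above can be modelled with a small in-memory stand-in for the service. This is only a sketch of the request semantics, not the Wildcloud implementation: the class name, the status codes and the use of "/" as the listing path are assumptions made for illustration (the exact listing path is not recoverable from the table).

```python
class StorageSketch:
    """In-memory stand-in for the storage service (not the real back-end)."""

    def __init__(self):
        self.files = {}  # application id -> {path: bytes}

    def handle(self, method, path, headers, body=b""):
        app_id = headers.get("X-Appid")
        if not app_id:
            return 401, b""  # assumed status for a missing header
        # Each application only ever sees its own files.
        files = self.files.setdefault(app_id, {})
        if method == "GET" and path == "/":
            # Assumed listing path, see the note above.
            return 200, "\n".join(sorted(files)).encode()
        if method == "GET":
            return (200, files[path]) if path in files else (404, b"")
        if method == "PUT":
            files[path] = body
            return 200, b""
        if method == "DELETE":
            removed = files.pop(path, None)
            return (200, b"") if removed is not None else (404, b"")
        return 405, b""


storage = StorageSketch()
storage.handle("PUT", "/some/path", {"X-Appid": "app-1"}, b"hello")
print(storage.handle("GET", "/some/path", {"X-Appid": "app-1"}))  # (200, b'hello')
print(storage.handle("GET", "/some/path", {"X-Appid": "app-2"}))  # (404, b'')
```

The second request shows the role of the header: the same path resolves to different data, or to none, depending on which application is asking.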

The session store provides a central repository for storing sessions. Sessions cannot use any storage that is not available to all instances of the application in the cluster: because requests are routed to random nodes in the cluster, the session data has to be available on all of those nodes. Wildcloud provides a simple HTTP REST service to store and load a session. The operations provided by the service are described in figure 4.7. The session store uses the Redis database32 as a back-end. Redis is a very fast in-memory key-value database that can also persist data to the file system. Redis can be run in master-slave replication mode to allow higher performance.

Figure 4.7: Operations provided by session service

Method   Path          Description
GET      /session id   Load data for session id
PUT      /session id   Save data for session id
DELETE   /session id   Delete data for session id
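Why the session data has to live in a store shared by all instances can be illustrated with a short sketch. A plain dictionary stands in for Redis here, and the `Instance` class with its visit counter is a hypothetical application used only for the demonstration.

```python
import random


class SessionStore:
    """Stand-in for the central session service; a dict replaces Redis."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, data):   # corresponds to PUT
        self._data[session_id] = dict(data)

    def load(self, session_id):         # corresponds to GET
        return dict(self._data.get(session_id, {}))

    def delete(self, session_id):       # corresponds to DELETE
        self._data.pop(session_id, None)


class Instance:
    """Hypothetical application instance that counts visits per session."""

    def __init__(self, store):
        self.store = store

    def handle(self, session_id):
        session = self.store.load(session_id)
        session["visits"] = session.get("visits", 0) + 1
        self.store.save(session_id, session)
        return session["visits"]


store = SessionStore()
instances = [Instance(store), Instance(store)]
# Requests land on random instances, yet the counter stays consistent.
counts = [random.choice(instances).handle("abc") for _ in range(5)]
print(counts)  # [1, 2, 3, 4, 5]
```

If each instance kept its sessions in local memory instead, the counter would depend on which instance happened to receive each request.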

The application may also save the session data into a cookie, so that the data is distributed among the working instances as part of the HTTP requests. There are, however, some limitations. Cookies have a limit on the size of the data they may contain, and therefore this method is not suitable for applications that use the session store heavily. Also, to ensure some level of security, the cookies are encrypted. However, when the key used to encode the data is known, an attacker may read the data from the cookies or put different data into them.

29. http://en.wikipedia.org/wiki/Document-oriented_database
30. http://www.mongodb.org/display/DOCS/BSON
31. http://www.mongodb.org/display/DOCS/GridFS
32. http://redis.io/
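Both limitations can be made concrete with a short sketch. Note that it signs the cookie with an HMAC instead of encrypting it, because the Python standard library offers no cipher; it therefore illustrates the size limit and the tampering risk, but not confidentiality. The function names and the 4096-byte limit (a commonly cited browser limit) are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json

MAX_COOKIE_BYTES = 4096  # assumed limit, typical for browsers


def encode_session(data, key):
    """Serialize the session and append a signature derived from the key."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode())
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    cookie = payload + b"." + signature
    if len(cookie) > MAX_COOKIE_BYTES:
        raise ValueError("session too large for a cookie")
    return cookie


def decode_session(cookie, key):
    """Verify the signature and return the session data."""
    payload, _, signature = cookie.rpartition(b".")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("cookie was tampered with")
    return json.loads(base64.urlsafe_b64decode(payload))


key = b"server-side secret"
cookie = encode_session({"user": 42}, key)
print(decode_session(cookie, key))  # {'user': 42}
```

Anyone who knows `key` can call `encode_session` with arbitrary data and produce a cookie that passes verification, which is the weakness described above.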

The application is indirectly concerned with one more component, the Router. All incoming requests come through the HTTP routing proxies, and all responses from the applications are sent back through those proxies as well. The proxies can measure the bandwidth used from and to applications and, based on that, apply regulations to ensure some level of Quality of Service. The proxies may also modify requests and responses to dynamically add more information.
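The bandwidth accounting mentioned above could look roughly as follows. This is a minimal sketch under stated assumptions: the class name, the single byte limit per application and the way a limit is checked are hypothetical, since the text does not specify how the proxies regulate traffic.

```python
from collections import defaultdict


class BandwidthMeter:
    """Hypothetical per-application byte counter kept by a proxy."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used = defaultdict(int)  # application name -> bytes relayed

    def record(self, app, request_bytes, response_bytes):
        # Both directions pass through the proxy, so both are counted.
        self.used[app] += request_bytes + response_bytes

    def over_limit(self, app):
        return self.used[app] > self.limit_bytes


meter = BandwidthMeter(limit_bytes=1000)
meter.record("blog", 300, 500)
print(meter.over_limit("blog"))  # False
meter.record("blog", 100, 200)
print(meter.over_limit("blog"))  # True
```

A proxy could consult `over_limit` before relaying a request and, for example, delay or reject traffic for applications that exceed their allocation.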

4.5.4 Routing of HTTP requests

Figure 4.9 shows the path of a request through the platform. As said before, the platform has to ensure that requests are delivered to the application. This is required because the platform dynamically relocates the application instances within the cluster, and their addresses change according to their actual location.

When a client initiates a connection to the platform, it is received by a front-facing proxy server. These servers are high-performing and have a static set of destinations. Only one front-facing proxy is shown in the figure, but there may be many of them to increase performance. The traffic can be distributed to multiple proxies by a hardware-based load balancer, or simply by utilizing round-robin DNS records33.

Routers receive information regarding the state of applications from the Brain component. Each Router may have a different set of rules for where to direct requests. This simple fact allows creating sub-systems or sub-networks that may be handled differently. However, the proxy server may use different algorithms to choose the Router for each request, so it is important to ensure that the chosen Router knows how to handle the request. It is also possible to chain Routers to create a fine-grained network system.
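The interplay between the Brain and a Router can be sketched as a routing table that is replaced whenever the Brain publishes a new state. The class and method names are hypothetical. The sketch selects instances in round-robin order so that its output is deterministic; this is a substitution made for the example and not a statement about the selection policy the platform actually uses.

```python
import itertools


class Router:
    """Hypothetical router holding a host name -> instances table."""

    def __init__(self):
        self._cursors = {}

    def update(self, table):
        # Called when the Brain publishes the current routing table.
        self._cursors = {
            host: itertools.cycle(list(addresses))
            for host, addresses in table.items()
            if addresses
        }

    def route(self, host):
        # Returns the next instance address, or None for an unknown host.
        cursor = self._cursors.get(host)
        return next(cursor) if cursor is not None else None


router = Router()
router.update({"blog.example.com": ["10.0.0.1:8080", "10.0.0.2:8080"]})
print([router.route("blog.example.com") for _ in range(3)])
# ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.1:8080']

# After a relocation the Brain publishes a new table.
router.update({"blog.example.com": ["10.0.0.3:8080"]})
print(router.route("blog.example.com"))  # 10.0.0.3:8080
```

Because every Router holds its own table, two Routers can be given different tables, which is what makes the sub-networks described above possible.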

33. http://tools.ietf.org/html/rfc1794


Figure 4.8: Components from the perspective of applications

4.6 Testing

The platform is already being used in production for a small-sized deployment. This section describes the behaviour of the platform in this particular deployment.

Figure 4.9: Routing of HTTP requests through the platform

4.6.1 Functional testing

During the development of the platform, neither unit nor behaviour tests were used.

Testing would add too much overhead to the development process. The platform is built from functional pieces of software that have already been used in production and have therefore been tested. Moreover, there was a vision of what the platform should be capable of and a well-defined set of functionality to implement. However, there were no requirements on the implementation itself. To adopt test-driven development, a specification of the implementation would be expected, and that would make the development process less lean.

The development model was inspired by the Lean Startup movement34. The main point of this philosophy is to reduce all unnecessary burden connected with product development to a minimum while still providing a functional product.

The best testing environment is a production environment. Therefore, to test the platform during development, a methodology called continuous deployment35 was used. Once all the components were capable of their basic functionality, they were deployed into the production environment. Agreements were made with the operators of some web applications that occasional disruptions of the service would be tolerated, and these applications were deployed onto the platform and started.

The functioning of the platform was monitored, and whenever a problem appeared, the bug was patched and a new version of the particular component was deployed back into the production system.

Unit or behaviour testing discovers the most obvious problems; continuous deployment in a production environment, however, discovers the problems that real usage actually triggers. This testing methodology covers functional, behaviour and integration testing.

4.6.2 Load testing

The platform uses well-tested and optimized components where possible and provides orchestration and management on top of them. The most crucial component from the point of view of performance is the Router. All incoming requests and outgoing responses have to go through it.

34. http://theleanstartup.com/
35. http://www.startuplessonslearned.com/2009/06/why-continuous-deployment.html

According to internal benchmarking inside the component, the overhead per connection is 10 ms. This time includes the internal processing of the request, connecting to the back-end, and relaying the request and the response.


5 Product evaluation

This last chapter discusses topics related to end users and the evaluation of the platform. It provides real data that may be used as a basis for decisions.

5.1 Case-study

One of the possible deployments of the platform is as a staging environment for the development of web applications. The case is described for a medium-sized internet agency, referred to in the text as the Company.

5.1.1 Company

The Company is a medium-sized internet agency developing web applications and presentations. For the development of its products it uses two programming languages, Ruby and PHP. PHP is used in combination with an internal content management system. Ruby is used to develop more complex and demanding applications.

The Company has 10 employees. One coder works on user interface development. Two PHP developers work on long-term projects and occasional one-time contracts. There are also five Ruby developers and two project managers. The project managers are responsible for quality assurance.

The Company provides hosting for web applications using its own servers. It has 5 servers located in a server housing facility.

5.1.2 Problems

Because of the economic recession, gaining new clients has become a problem. Companies are uncertain about future economic stability and are afraid to invest. Moreover, customers tend to spend less money even in response to stronger marketing campaigns. With the decrease in the number of customers and their resistance to marketing, companies are investing fewer resources into promotion and marketing.

The Company has to lower its operational costs and streamline the development process. The biggest problem is the overhead of deploying and testing its products. To test a product, a developer is required to deploy the specific version for quality assurance.

5.1.3 Requirements

The Company has to automate the deployment of staging applications and reduce the dependence of quality assurance on the developers.

When a new version of an application is pushed into the version control repository, the project manager is automatically notified. Using a simple UI, the project manager starts the particular version of the application and tests it. When the tests are successful, the customer is invited to do their own testing. When the testing is done, either the bugs found are entered into the bug tracking system, or the application is stopped in the staging environment and deployed into the production environment.

The whole process has to be automated and has to be operable by users without a technical education. It is also crucial to support both the PHP and Ruby languages within the same system, so that project managers can operate applications in both languages using the same set of tools.

In the future, there should be a one-click solution for deployment into the production environment from the same user interface that is used for managing the staging environment.

5.1.4 Solution

The platform as implemented provides most of the features required by the Company and will streamline its development process. It also provides room to grow and to adopt new technologies in the future.

Managing version control system repositories is one of the core tasks of the platform. Repositories are created and destroyed as requested from the UI. The platform allows access permissions to particular repositories to be managed for specific users. The platform also handles notifications of new versions pushed into version control repositories.

The platform is capable of starting applications from the version control system. The deployer may specify a particular revision to be deployed, and that revision of the application will be started. By default, the newest revision is used. Revisions may be specific commits, branches, or user-defined tags and names.

Inside the platform, the routing of requests to applications is based on domain names; thus, by setting the right DNS records, the application may be presented to the customer. Special domain names containing randomly generated strings may be used to provide basic security.
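A hard-to-guess domain name of this kind can be generated in a few lines. The function name and the base domain `staging.example.com` are hypothetical; the sketch only shows one way to obtain a random string that is suitable for a host name.

```python
import secrets


def staging_hostname(app_name, base_domain="staging.example.com"):
    # 16 random bytes give 32 hexadecimal characters, which are valid
    # in a DNS label and infeasible to guess. A DNS label may hold at
    # most 63 characters, so app_name should stay short.
    token = secrets.token_hex(16)
    return f"{app_name}-{token}.{base_domain}"


hostname = staging_hostname("shop")
print(hostname)
```

Such a name protects the staging application only as long as the address itself stays private, which is why the text speaks of basic security.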

The platform provides a simple web-based user interface to access these features and is therefore suitable for use by non-technical staff.

The Wildcloud platform supports any technology that can run on the Linux operating system and inside Linux Containers. At the moment the Company is using PHP and Ruby; however, in a few months it might happen that another technology, such as Node.js or Java, will be required. The platform can accommodate such requirements simply by upgrading the base image, or by installing per-application system packages inside the image of the application.

5.1.5 Benefits

The Company expects to need a one-click solution for deploying web applications into the production environment. By using the Wildcloud platform, the Company may merge its staging and production servers into one platform and thus minimize the differences between the production and staging environments.

The Wildcloud platform offers application isolation. This way, a bug in one application cannot affect the other applications running on the server. In combination with per-application resource allocation, there is no need to have separate production and staging servers.

Wildcloud allows applications to be deployed on a new operating system. By upgrading the base system image and rebuilding the application, the application will run inside the new environment without the need to modify the outer operating system.

5.2 From customer’s point of view

The platform was thoroughly discussed from the point of view of architecture and the technologies used in previous chapters. From the point of view of an end user, that information is not relevant. The user will be working with the front-end application, and this application is discussed in this section.


5.2.1 Welcome screens

The application is created with simplicity in mind. The screens are as simple as possible but provide all the information necessary to operate the system. The design is based on modern user interface frameworks to provide the necessary user experience.

Figure 5.1: Welcome screen when user enters the application

Registered users may log in directly from the first screen (picture 5.1). The application uses the e-mail address of the user as a username to simplify user identification. Users who do not have an account may register (picture 5.2). By default, users are inactive after registration and are not allowed to log in. A manual action by the operators is required to activate the account.

Figure 5.2: Registration form


5.2.2 Navigation

The navigation is composed of two components. The main navigation panel is located at the top of the page and allows the user to move among the sections of the application (picture 5.3).

Figure 5.3: Navigation bar

To allow fine-grained navigation inside the sections, a special area on the left side of the page is used. It may contain navigation elements. Standard navigation in the application is done using hyperlinks (picture 5.4).

Figure 5.4: Navigation elements in sidebar

When the user is about to perform an important action that might affect the functioning of the platform, the action is displayed as a coloured button (picture 5.5).

Figure 5.5: Action buttons in sidebar


Lastly, the sidebar area may contain text that helps the user understand what the screen is for, or gives advice on how to operate the platform (picture 5.6).

Figure 5.6: Special content in sidebar

5.2.3 Dashboard

When the user logs into the application, the dashboard screen is displayed. The Dashboard is a special section that provides the user with quick access to important information regarding the functioning of the platform and the deployed applications (picture 5.7).

As implemented in the first version of the platform, as part of this thesis, the Dashboard provides an overview of the applications in the platform, their deployed instances, and the domain names the applications are accessible from. From the sidebar, the user can quickly create a new application.

5.2.4 SSH keys

The platform uses SSH keys to authenticate users during Git-related communication. The keys are managed from this central place and distributed to the Git servers.

When a key is saved, the repositories become accessible to the user using that key. Users may have multiple keys to allow access from multiple computers without the need to copy one key to all of them.


Figure 5.7: Dashboard screen

Figure 5.8: SSH keys management

5.2.5 Repositories

The platform uses Git repositories to transfer changes from developers to servers. Developers may create multiple repositories for development. The repositories are not directly tied to deployed applications. Repositories may be used independently and are then specified when used for the deployment of an application.

Once a repository is created, a unique identifier is generated for it. That identifier is based on user-entered data but cannot be changed afterwards. The detail page of the repository also provides information on how to access the repository (picture 5.9).


Figure 5.9: Git repositories management

5.2.6 Applications

The main purpose of the platform is to deploy web applications. The Applications section manages this aspect. As said before, the deployment process has two phases. First, the application is built from a Git repository and a deployable image is created. This image contains all the data necessary to start the application. This is done only once for each version of a particular application.

When the image is built, the process moves into the second phase. The user may deploy the application and make it accessible under some domain name over the HTTP protocol. Whenever an instance is deployed, the platform automatically modifies the routing infrastructure to make the instance accessible. When the instance is undeployed, the routes are destroyed.

When the user deploys the application into multiple instances, the platform automatically load-balances the requests among those instances. The actual routing infrastructure may be inspected in the Router section.

From the main screen (picture 5.10) the user may build a new application or destroy an existing one. To update the application from the version control repository, the user rebuilds the application from this screen. When the application is built, it may be inspected.

During the build of the application, information is collected in a log. The log is then accessible in the Build log subsection (picture 5.11). The log contains information regarding the Git repository cloning, system package installation, and the actual application build.


Figure 5.10: Applications overview

Figure 5.11: Application build log

In the Deployment subsection, the user deploys and undeploys instances of the application (picture 5.12).

Figure 5.12: Applications deployments overview

5.2.7 Router

The Router section manages routes inside the platform (picture 5.13). Whenever an application is deployed or undeployed, the actual routes are created or destroyed, respectively. The user does not have to create these routes explicitly. The Router section allows these routes to be inspected.

Moreover, the router may be instructed manually to route requests. In the application, it is required to specify the incoming host name and port, which identify the requests, and the target host name and port, to which the proxy connection is made.

Figure 5.13: Routing overview


6 Conclusion

The main subject of the thesis was to analyse the requirements for deploying web applications and to implement a platform that eases the process. The analytical part was fulfilled in the first two chapters of the thesis, which describe the changes in the understanding of web application deployment brought about by the cloud computing phenomenon.

The implementation part was fulfilled in the third chapter of the thesis. First, the requirements for such a platform for deploying web applications were laid out. The requirements are based on real problems that deployers of web applications are confronted with.

Knowing the requirements, existing products were discussed. None of the analyzed projects fulfilled the requirements of the platform as stated before. Therefore a new platform was implemented and its implementation details were discussed. In the last chapter, the resulting product was described from the point of view of a business owner. The simplicity of the user interface that is used to operate the platform was considered, as well as functional and load testing of the application. The platform was developed inside a production environment and was therefore continuously tested through use by real users. One case study was presented to show that the platform may solve different kinds of problems and may help businesses to grow.

A project like this one is never finished. The development of the platform will continue. As implemented as part of the thesis, the platform provides functionality for deploying web applications. In the future, the platform might be extended to provide project management features and offer a single solution for web application developers, managers, and deployers.

Also, the platform need not be limited to deploying web applications. By employing a more sophisticated networking architecture, the platform might be extended to provide a base for deploying any networked application and to serve as a basis for a high-performance computing cluster.

The platform should also be extended to support a wider range of technologies. It utilizes Linux containers, which provide in-kernel virtualization. By providing hardware virtualization, the platform could be extended to offer a more complex set of features, including support for different operating systems.


The source code is published on GitHub1 and may be accessed freely.

https://github.com/wildcloud

The project is licensed under the terms of the Affero General Public License and is free for anyone to use. Any modifications to the components have to be released as open source under the same license. The project is loosely coupled, and therefore the use of unmodified components is also allowed in commercial deployments without the need to open-source all components of the project. The license is, however, open to discussion and may change in the future.

By working on the project I have gained a lot of knowledge and created a functional platform that may be used in production environments. The implementation parts of the thesis were interesting because many low-level aspects of networking and the Linux operating system were explored.

1. http://www.github.com


Bibliography

[1] Kernel based virtual machine. http://www.linux-kvm.org/page/Main_Page, 2011. [Online; accessed 12-September-2011].

[2] Inc. 10gen. Bson. http://www.mongodb.org/display/DOCS/BSON, 2011. [Online; accessed 12-November-2011].

[3] Inc. 10gen. Gridfs. http://www.mongodb.org/display/DOCS/GridFS, 2011. [Online; accessed 5-November-2011].

[4] Inc. 10gen. Mongodb. http://www.mongodb.org/, 2011. [Online; accessed 12-November-2011].

[5] Muhammad Ali Babar and Muhammad Aufeef Chauhan. A tale of migration to cloud computing for sharing experiences and observations. In Proceedings of the 2nd International Workshop on Software Engineering for Cloud Computing, SECLOUD ’11, pages 50–56, New York, NY, USA, 2011. ACM.

[6] Christian Baun and Marcel Kunze. The koala cloud management service: a modern approach for cloud infrastructure management. In Proceedings of the First International Workshop on Cloud Computing Platforms, CloudCP ’11, pages 1:1–1:6, New York, NY, USA, 2011. ACM.

[7] Network Working Group (T. Brisco). Dns support for load balancing. http://tools.ietf.org/html/rfc1794, 1995. [Online; accessed 5-September-2011].

[8] Sergey Bykov, Alan Geller, Gabriel Kliot, James R. Larus, Ravi Pandya, and Jorgen Thelin. Orleans: cloud computing for everyone. In Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC ’11, pages 16:1–16:14, New York, NY, USA, 2011. ACM.

[9] Scott Chacon. Pro Git. Apress, Berkeley, CA, USA, 1st edition, 2009.

[10] Scott Chacon. Git - fast version control system. http://git-scm.com/, 2011. [Online; accessed 27-November-2011].

[11] Inc. Citrix Systems. Welcome to xen.org, home of the xen hypervisor, the powerful open source industry standard for virtualization. http://www.xen.org, 2011. [Online; accessed 12-October-2011].


[12] Derek Collison. derekcollison/nats. https://github.com/derekcollison/nats, 2011. [Online; accessed 28-October-2011].

[13] ArchWiki coordinators. Ssh keys. https://wiki.archlinux.org/index.php/SSH_Keys, 2011. [Online; accessed 5-November-2011].

[14] Oracle Corporation. Oracle solaris 11. http://www.oracle.com/us/products/servers-storage/solaris/solaris11/overview/index.html, 2011. [Online; accessed 5-August-2011].

[15] Dropbox. Dropbox - files - simplify your life. https://www.dropbox.com/, 2011. [Online; accessed 6-August-2011].

[16] Inc. Eucalyptus Systems. Cloud computing software from eucalyptus — leader in cloud software. http://www.eucalyptus.com/, 2011. [Online; accessed 27-September-2011].

[17] Facebook. Facebook. http://www.facebook.com/, 2011. [Online; accessed 12-December-2011].

[18] Roy Thomas Fielding. Architectural styles and the design of network-based software architectures. http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm, 2000. [Online; accessed 27-November-2011].

[19] Jeremy Geelan. Twenty-one experts define cloud computing. http://cloudcomputing.sys-con.com/node/612375, 2009. [Online; accessed 5-October-2011].

[20] Google. Gmail: Email from google. http://www.gmail.com, 2011. [Online; accessed 20-September-2011].

[21] Google. Google. http://www.google.com/, 2011. [Online; accessed 27-December-2011].

[22] Google. Google app engine. http://code.google.com/intl/cs/appengine/, 2011. [Online; accessed 5-November-2011].

[23] Google. Google apps for business — official website. http://www.google.com/apps/intl/en/business/index.html, 2011. [Online; accessed 5-November-2011].

[24] Google. Google chat - chat with family and friends. http://www.google.com/talk/, 2011. [Online; accessed 12-September-2011].


[25] Google. Google docs - online documents, spreadsheets, presentations, surveys, file storage and more. http://docs.google.com/, 2011. [Online; accessed 27-September-2011].

[26] Google. Google+: real life sharing, rethought for the web. http://plus.google.com, 2011. [Online; accessed 13-September-2011].

[27] Google. Picasa web albums: free photo sharing from google. http://picasaweb.google.com, 2011. [Online; accessed 12-September-2011].

[28] AMQP Working Group. Amqp. http://www.amqp.org/, 2011. [Online; accessed 15-December-2011].

[29] PostgreSQL Global Development Group. Postgresql. http://www.postgresql.org/, 2011. [Online; accessed 27-September-2011].

[30] David Heinemeier Hansson. Ruby on rails. http://rubyonrails.org/, 2011. [Online; accessed 15-October-2011].

[31] Inc. Heroku. Heroku — dev center. http://devcenter.heroku.com/, 2011. [Online; accessed 27-November-2011].

[32] GitHub Inc. Github - social coding. http://www.github.com, 2011. [Online; accessed 5-November-2011].

[33] Roger Jennings. Cloud Computing with the Windows Azure Platform. Wrox Press Ltd., Birmingham, UK, 2009.

[34] Inc. Joyent. Smartos: The complete modern operating system. http://smartos.org/, 2011. [Online; accessed 27-October-2011].

[35] Steffen Kachele, Jorg Domaschka, and Franz J. Hauck. Cosca: an easy-to-use component-based paas cloud system for common applications. In Proceedings of the First International Workshop on Cloud Computing Platforms, CloudCP ’11, pages 4:1–4:6, New York, NY, USA, 2011. ACM.

[36] Inc. Linux Kernel Organization. Documentation/cgroups. http://www.kernel.org/doc/Documentation/cgroups/, 2011. [Online; accessed 12-December-2011].

[37] Amazon Web Services LLC. About aws. http://aws.amazon.com/what-is-aws/, 2011. [Online; accessed 12-November-2011].


[38] Amazon Web Services LLC. Amazon elastic block store (ebs). http://aws.amazon.com/ebs/, 2011. [Online; accessed 12-December-2011].

[39] Amazon Web Services LLC. Amazon elastic compute cloud (amazon ec2). http://aws.amazon.com/ec2/, 2011. [Online; accessed 5-November-2011].

[40] Canonical Ltd. Server — ubuntu. http://www.ubuntu.com/business/server/overview, 2011. [Online; accessed 12-December-2011].

[41] Parallels Holdings Ltd. Main page - openvz linux containers wiki. http://wiki.openvz.org/Main_Page, 2011. [Online; accessed 5-October-2011].

[42] lxc Linux Containers. lxc - linux containers. http://lxc.sourceforge.net/, 2011. [Online; accessed 5-December-2011].

[43] Microsoft. Ixmlhttprequest. http://msdn.microsoft.com/en-us/library/ms759148(VS.85).aspx, 2011. [Online; accessed 12-October-2011].

[44] Microsoft. Windows azure. http://www.microsoft.com/windowsazure/, 2011. [Online; accessed 12-September-2011].

[45] Rich Miller. Who has the most web servers? http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-web-servers/, 2009. [Online; accessed 27-November-2011].

[46] Incorporated. NewServers. Newservers: Bare metal cloud. http://www.newservers.com/language/en/, 2011. [Online; accessed 5-September-2011].

[47] Nodejitsu. nodejitsu. https://github.com/nodejitsu, 2011. [Online; accessed 5-November-2011].

[48] Nodejitsu.com. Nodejitsu. http://nodejitsu.com/, 2011. [Online; accessed 15-November-2011].

[49] Inc. Novell and others. Sdb:lxc. http://en.opensuse.org/SDB:LXC, 2011. [Online; accessed 12-October-2011].


[50] Stephen O’Grady. Deconstructing red hat’s openshift: The q&a. http://redmonk.com/sogrady/2011/05/04/deconstructing-red-hats-openshift-the-qa/, 2011. [Online; accessed 5-December-2011].

[51] Junjiro R. Okajima. aufs.sourceforge.net. http://aufs.sourceforge.net/, 2011. [Online; accessed 27-December-2011].

[52] OpenNebula Project Leads (OpenNebula.org). .:: Opennebula: The open source toolkit for data center virtualization ::. http://www.opennebula.org/, 2011. [Online; accessed 27-September-2011].

[53] Siani Pearson. Taking account of privacy when designing cloud computing services. In Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, CLOUD ’09, pages 44–52, Washington, DC, USA, 2009. IEEE Computer Society.

[54] Heroku — Cloud Application Platform. Heroku, inc. http://www.heroku.com/, 2011. [Online; accessed 12-November-2011].

[55] Robey Pointer. scarling -> kestrel. http://robey.livejournal.com/53832.html, 2011. [Online; accessed 5-November-2011].

[56] The OpenStack project. Openstack open source cloud computing software. http://openstack.org/, 2011. [Online; accessed 5-October-2011].

[57] US Inc. Rackspace. Cloud computing, managed hosting, dedicated server hosting by rackspace. http://www.rackspace.com/, 2011. [Online; accessed 12-October-2011].

[58] Inc. Red Hat. Openshift by red hat. https://openshift.redhat.com/app/, 2011. [Online; accessed 5-December-2011].

[59] Inc. Red Hat. redhat.com — the world’s open source leader. http://www.redhat.com/, 2011. [Online; accessed 5-December-2011].

[60] Eric Ries. Why continuous deployment? http://www.startuplessonslearned.com/2009/06/why-continuous-deployment.html, 2009. [Online; accessed 27-September-2011].

[61] Eric Ries. The lean startup. http://theleanstartup.com/, 2011. [Online; accessed 12-September-2011].


[62] inc. salesforce.com. Crm - the enterprise cloud computing company - salesforce.com europe. http://www.salesforce.com, 2011. [Online; accessed 12-October-2011].

[63] Salvatore Sanfilippo and Pieter Noordhuis. Redis. http://redis.io/, 2011. [Online; accessed 12-September-2011].

[64] Maxim Schnjakin, Rehab Alnemr, and Christoph Meinel. Contract-based cloud architecture. In Proceedings of the second international workshop on Cloud data management, CloudDB ’10, pages 33–40, New York, NY, USA, 2010. ACM.

[65] S&P Softwaredesign. ulimit and sysctl. http://www.linuxhowtos.org/Tips%20and%20Tricks/ulimit.htm, 2011. [Online; accessed 12-November-2011].

[66] GitHub team. Git reference. http://gitref.org/remotes/#push, 2011. [Online; accessed 12-December-2011].

[67] Twitter. Twitter. https://twitter.com/, 2011. [Online; accessed 5-December-2011].

[68] Inc. VMware. Cloud foundry - make it yours! http://cloudfoundry.org/, 2011. [Online; accessed 13-October-2011].

[69] Inc. VMware. Rabbitmq - messaging that just works. http://www.rabbitmq.com/, 2011. [Online; accessed 12-October-2011].

[70] Jens-Sonke Vockler, Gideon Juve, Ewa Deelman, Mats Rynge, and Bruce Berriman. Experiences using cloud computing for a scientific workflow application. In Proceedings of the 2nd international workshop on Scientific cloud computing, ScienceCloud ’11, pages 15–24, New York, NY, USA, 2011. ACM.

[71] Robin Wauters. Salesforce.com buys heroku for $212 million in cash. http://techcrunch.com/2010/12/08/breaking-salesforce-buys-heroku-for-212-million-in-cash/, 2010. [Online; accessed 5-December-2011].

[72] [email protected]. chroot(1) - linux man page. http://linux.die.net/man/1/chroot, 2011. [Online; accessed 5-November-2011].


[73] Wikipedia. Cloud computing — wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Cloud_computing&oldid=467862539, 2011. [Online; accessed 28-November-2011].

[74] Wikipedia. Document-oriented database — wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/Document-oriented_database, 2011. [Online; accessed 5-November-2011].

[75] Wikipedia. Dot-com bubble — wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=Dot-com_bubble&oldid=468041713, 2011. [Online; accessed 28-November-2011].

[76] Wikipedia. Full virtualization — wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/Full_virtualization, 2011. [Online; accessed 27-October-2011].

[77] Wikipedia. Operating system-level virtualization — wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/Operating_system-level_virtualization, 2011. [Online; accessed 12-October-2011].

[78] Wikipedia. Sun microsystems — wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/Sun_Microsystems, 2011. [Online; accessed 12-December-2011].

[79] Wikipedia. World wide web — wikipedia, the free encyclopedia. http://en.wikipedia.org/w/index.php?title=World_Wide_Web&oldid=467071500, 2011. [Online; accessed 28-December-2011].

[80] Ecole Polytechnique Federale de Lausanne (EPFL). The scala programming language. http://www.scala-lang.org/, 2011. [Online; accessed 13-November-2011].
