MongoDB Cookbook

Table of Contents

MongoDB Cookbook

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There’s more…

See also

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Installing and Starting the MongoDB Server

Introduction

Single node installation of MongoDB

Getting ready

How to do it…

There’s more…

See also

Starting a single node instance using command-line options

Getting ready

How to do it…

How it works…

There’s more…

See also

Single node installation of MongoDB with options from the config file

Getting ready

How to do it…

How it works…

Connecting to a single node from the Mongo shell with a preloaded JavaScript

Getting ready

How to do it…

How it works…

There’s more…

Connecting to a single node from a Java client

Getting ready

How to do it…

How it works…

Starting multiple instances as part of a replica set

Getting ready

How to do it…

How it works…

There’s more…

See also

Connecting to the replica set from the shell to query and insert data

Getting ready

How to do it…

How it works…

See also

Connecting to the replica set to query and insert data from a Java client

Getting ready

How to do it…

How it works…

Starting a simple sharded environment of two shards

Getting ready

How to do it…

How it works

There’s more…

Connecting to a shard from the Mongo shell and performing operations

Getting ready

How to do it…

How it works…

There’s more…

2. Command-line Operations and Indexes

Creating test data

Getting ready

How to do it…

How it works…

See also

Performing simple querying, projections, and pagination from the Mongo shell

Getting ready

How to do it…

How it works…

Updating and deleting data from the shell

Getting ready

How to do it…

How it works…

Creating an index and viewing plans of queries

Getting ready

How to do it…

How it works…

Analyzing the plan

Improving the query execution time

Improvement using indexes

Improvement using covered indexes

Some gotchas of index creation

Background and foreground index creation from the shell

Getting ready

How to do it…

How it works…

Creating unique indexes on collection and deleting the existing duplicate data automatically

Getting ready

How to do it…

How it works…

Creating and understanding sparse indexes

Getting ready

How to do it…

How it works…

Expiring documents after a fixed interval using the TTL index

Getting ready

How to do it…

How it works…

There’s more…

Expiring documents at a given time using the TTL index

Getting ready

How to do it…

How it works…

There’s more…

3. Programming Language Drivers

Introduction

Installing PyMongo

Getting ready

How to do it…

There’s more…

Executing query and insert operations using PyMongo

Getting ready

How to do it…

How it works…

See also

Executing update and delete operations using PyMongo

Getting ready

How to do it…

How it works…

Aggregation in Mongo using PyMongo

Getting ready

How to do it…

How it works…

MapReduce in Mongo using PyMongo

Getting ready

How to do it…

How it works…

See also

Executing query and insert operations using a Java client

Getting ready

How to do it…

How it works…

Executing update and delete operations using a Java client

Getting ready

How to do it…

How it works…

See also

Aggregation in Mongo using a Java client

Getting ready

How to do it…

How it works…

MapReduce in Mongo using a Java client

Getting ready

How to do it…

How it works…

See also

4. Administration

Renaming a collection

Getting ready

How to do it…

How it works…

Viewing collection stats

Getting ready

How to do it…

How it works…

See also

Viewing database stats

Getting ready

How to do it…

How it works…

See also

Disabling the preallocation of data files

How to do it…

Manually padding a document

Getting ready

How to do it…

How it works…

Understanding the mongostat and mongotop utilities

Getting ready

How to do it…

How it works…

See also

Estimating the working set

Getting ready

How to do it…

How it works…

Viewing and killing the currently executing operations

Getting ready

How to do it…

How it works…

Using profiler to profile operations

Getting ready

How to do it…

How it works…

Setting up users in MongoDB

Getting ready

How to do it…

How it works…

There’s more…

See also

Understanding interprocess security in MongoDB

Getting ready

How to do it…

There’s more…

Modifying collection behavior using the collMod command

Getting ready

How to do it…

How it works…

Setting up MongoDB as a Windows Service

Getting ready

How to do it…

Configuring a replica set

Getting ready

Elections in a replica set

Basic configuration for a replica set

How to do it…

How it works…

A replica set member as an arbiter

Priority of replica set members

Hidden, votes, slave delayed, and build index configurations

There’s more…

Stepping down as a primary instance from the replica set

Getting ready

How to do it…

How it works…

Exploring the local database of a replica set

Getting ready

How to do it…

How it works…

See also

Understanding and analyzing oplogs

Getting ready

How to do it…

How it works…

Building tagged replica sets

Getting ready

How to do it…

How it works…

WriteConcern in tagged replica sets

ReadPreference in tagged replica sets

Configuring the default shard for nonsharded collections

Getting ready

How to do it…

How it works…

Manually splitting and migrating chunks

Getting ready

How to do it…

How it works…

Performing domain-driven sharding using tags

Getting ready

How to do it…

How it works…

Exploring the config database in a sharded setup

Getting ready

How to do it…

How it works…

5. Advanced Operations

Introduction

Atomic find and modify operations

Getting ready

How to do it…

How it works…

See also

Implementing atomic counters in MongoDB

Getting ready

How to do it…

How it works…

See also

Implementing server-side scripts

Getting ready

How to do it…

How it works…

Creating and tailing capped collection cursors in MongoDB

Getting ready

How to do it…

How it works…

See also

Converting a normal collection to a capped collection

Getting ready

How to do it…

How it works…

There’s more…

Storing binary data in MongoDB

Getting ready

How to do it…

How it works…

Storing large data in MongoDB using GridFS

Getting ready

How to do it…

How it works…

There’s more…

See also

Storing data to GridFS from a Java client

Getting ready

How to do it…

How it works…

See also

Storing data to GridFS from a Python client

Getting ready

How to do it…

How it works…

See also

Implementing triggers in MongoDB using oplog

Getting ready

How to do it…

How it works…

Executing flat plane (2D) geospatial queries in Mongo using geospatial indexes

Getting ready

How to do it…

How it works…

Spherical indexes and GeoJSON-compliant data in MongoDB

Getting ready

How to do it…

How it works…

Implementing a full-text search in MongoDB

Getting ready

How to do it…

How it works…

There’s more…

See also

Integrating MongoDB with Elasticsearch for a full-text search

Getting ready

How to do it…

How it works…

There’s more…

See also

6. Monitoring and Backups

Introduction

Signing up for MMS and setting up the MMS monitoring agent

Getting ready

How to do it…

How it works…

There’s more…

Managing users and groups on the MMS console

Getting ready

How to do it…

How it works…

Monitoring MongoDB instances on MMS

Getting ready

How to do it…

How it works…

There’s more…

See also

Setting up monitoring alerts on MMS

Getting ready

How to do it…

How it works…

See also

Backing up and restoring data in Mongo using out-of-the-box tools

Getting ready

How to do it…

How it works…

Configuring the MMS backup service

Getting ready

How to do it…

How it works…

Managing backups in the MMS backup service

Getting ready

How to do it…

How it works…

See also

7. Cloud Deployment on MongoDB

Introduction

Setting up and managing the MongoLab account

How to do it…

How it works…

Setting up a sandbox MongoDB instance on MongoLab

Getting ready

How to do it…

How it works…

Performing operations on MongoDB from MongoLab GUI

Getting ready

How to do it…

How it works…

Setting up MongoDB on Amazon EC2 using the MongoDB AMI

Getting ready

How to do it…

How it works…

Setting up MongoDB on Amazon EC2 without using the MongoDB AMI

Getting ready

How to do it…

How it works…

See also

8. Integration with Hadoop

Introduction

Executing our first sample MapReduce job using the mongo-hadoop connector

Getting ready

How to do it…

How it works…

There’s more…

See also

Writing our first Hadoop MapReduce job

Getting ready

How to do it…

How it works…

There’s more…

Running MapReduce jobs on Hadoop using streaming

Getting ready

How it works…

How to do it…

Running a MapReduce job on Amazon EMR

Getting ready

How to do it…

How it works…

See also

9. Open Source and Proprietary Tools

Introduction

Developing using spring-data-mongodb

Getting ready

How to do it…

How it works…

See also

Accessing MongoDB using Java Persistence API

Getting ready

How to do it…

How it works…

See also

Accessing MongoDB over REST

Getting ready

How to do it…

How it works…

See also

Installing the GUI-based client, MongoVUE, for MongoDB

Getting ready

How to do it…

How it works…

See also

A. Concepts for Reference

Write concern and its significance

Setting up a replica set

Read preference for querying

Knowing the internals

Index

MongoDB Cookbook

Copyright © 2014 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: November 2014

Production reference: 1221114

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78216-194-3

www.packtpub.com

Cover image by Pratyush Mohanta (<[email protected]>)

Credits

Author

Amol Nayak

Reviewers

Jan Borgelin

Doug Duncan

Laurence Putra

Liran Tal

Khaled Tannir

Acquisition Editor

Neha Nagwekar

Content Development Editor

Priyanka Shah

Technical Editors

Veronica Fernandes

Ankita Thakur

Copy Editors

Karuna Narayanan

Shambhavi Pai

Project Coordinators

Mary Alex

Neha Thakur

Proofreaders

Stephen Copestake

Paul Hindle

Kelly Hutchinson

Clyde Jenkins

Indexers

Mariammal Chettiyar

Monica Ajmera Mehta

Rekha Nair

Graphics

Sheetal Aute

Abhinash Sahu

Production Coordinators

Kyle Albuquerque

Conidon Miranda

Nitesh Thakur

Cover Work

Kyle Albuquerque

About the Author

Amol Nayak is a certified MongoDB developer and has been working as a developer for over 8 years. He is currently employed with a leading financial data provider, working on cutting-edge technologies. He has used MongoDB as a database for various systems at his current and previous workplaces to support enormous data volumes. He is an open source enthusiast and supports it by contributing to open source frameworks and promoting them. He has made contributions to the Spring Integration project, and his contributions are adapters for JPA, XQuery, and MongoDB and push notifications for mobile devices and Amazon Web Services (AWS). He also has some contributions to the Spring Data MongoDB project. Apart from technology, he is passionate about motor sports and is a race official at Buddh International Circuit, India, for various motor-sports events. Earlier, he was the author of Instant MongoDB, Packt Publishing.

I would like to thank everyone at Packt Publishing who has been involved with this book. It started when Luke Presland from Packt Publishing approached me to author a book on MongoDB. I was skeptical about taking up the opportunity due to other commitments and tight deadlines, but if it wasn’t for my mom, friends, and office colleagues, who convinced me to take up the opportunity, I would not have written this book. There were a lot of chapters and a lot of content to cover, and I was having a tough time keeping up with the timelines. A special thanks to Priyanka Shah, Rebecca Pedley, Mary Alex, and Joel Goveya, with whom I interacted the most; they were very flexible about my changes in delivery timelines. A big thanks to Doug Duncan and the other reviewers of the book for reviewing it closely and helping improve the quality of the content drastically. Finally, I would like to thank the other staff at Packt Publishing who were involved in the book’s publishing process but haven’t interacted with me.

About the Reviewers

Jan Borgelin is a technical geek with over 15 years of professional software-development experience. He is currently the CTO of BA Group Ltd., a consultancy based in Finland. BA Group was one of the early adopters of MongoDB and the first official MongoDB partner in Scandinavia.

Doug Duncan has been working with RDBMSes for the past 15 years and has been shifting gears towards the newer data stores over the past 3 years. He has focused mainly on MongoDB since he came across the 0.8 release. In addition to his day job as a MongoDB database administrator, he works as an online teaching assistant for the MongoDB education team for several of their online courses (https://university.mongodb.com/), where he helps students understand how MongoDB works. When not working, he likes to read about new technologies and try to figure out how they can integrate and work in conjunction with the more established systems already in place.

Laurence Putra is a software engineer working in Singapore and runs the Singapore MongoDB User Group. In his free time, he hacks away on random stuff and picks up new technologies. His key interests lie in security and distributed systems. For more information, view his profile at geeksphere.net.

Liran Tal is a certified MongoDB developer and top contributor to the open source MEAN.IO and MEAN.JS full-stack JavaScript frameworks. Being an avid supporter of and contributor to the open source movement, in 2007, he redefined network RADIUS management by establishing daloRADIUS, a world-recognized and industry-leading open source project.

Liran is currently working at HP Software as an R&D team leader on a combined technology stack featuring a Drupal-based collaboration platform, Java, Node.js, and MongoDB.

At HP Live Network, Liran plays a key role in system-architecture design, shaping the technology strategy from planning and development to deployment and maintenance in HP’s IaaS cloud. Acting as the technological focal point, he loves mentoring teammates, driving for better code methodology, and seeking out innovative solutions to support business strategies.

He has a cum laude (honors) in his Bachelor’s degree in Business and Information Systems Analysis studies and enjoys spending his time with his beloved wife, Tal, and his newborn son, Ori. Among other things, his hobbies include playing the guitar, hacking all things on Linux, and continuously experimenting with and contributing to open source projects.

Khaled Tannir is a visionary solution architect with more than 20 years of technical experience, focusing on Big Data technologies and data mining since 2010.

He is widely recognized as an expert in these fields and has a Master of Research degree in Big Data and Cloud Computing and a Master’s degree in System Information Architectures, following an initial Bachelor of Technology degree in Electronics.

Khaled is a Microsoft Certified Solutions Developer (MCSD) and an avid technologist. He has worked for many companies in France (and recently in Canada), leading the development and implementation of software solutions and giving technical presentations.

He is the author of RavenDB 2.x Beginner’s Guide and Optimizing Hadoop for MapReduce and is the technical reviewer for Pentaho Analytics for MongoDB and MongoDB High Availability, all available at Packt Publishing.

He enjoys taking landscape and night photos; traveling; playing video games; creating funny electronic gadgets with Arduino, Raspberry Pi, and .NET Gadgeteer; and of course, spending time with his wife and family.

You can reach him at <[email protected]>.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital book library. Here, you can search, access, and read Packt’s entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

MongoDB is a document-oriented, leading NoSQL database, which offers linear scalability, thus making it a good contender for high-volume, high-performance systems across all business domains. It has an edge over the majority of NoSQL solutions for its ease of use, high performance, and rich features.

This book provides detailed recipes that describe how to use the different features of MongoDB. The recipes cover topics ranging from setting up MongoDB, knowing its programming-language API, monitoring and administration, to some advanced topics such as cloud deployment, integration with Hadoop, and some open source and proprietary tools for MongoDB. The recipe format presents the information in a concise, actionable form; this lets you refer to the recipe to address and know the details of just the use case in hand, without going through the entire book.

What this book covers

Chapter 1, Installing and Starting the MongoDB Server, is all about starting MongoDB. It will demonstrate how to start the server in the standalone mode, as a replica set, and as a shard, with the provided start-up options from the command line or config file.

Chapter 2, Command-line Operations and Indexes, has simple recipes to perform CRUD operations from the Mongo shell and create various types of indexes from the shell.

Chapter 3, Programming Language Drivers, is about programming language APIs. Though Mongo supports a vast array of languages, we will look at how to use the drivers to connect to the MongoDB server from Java and Python programs only. This chapter also explores the MongoDB wire protocol used for communication between the server and the programming-language clients.

Chapter 4, Administration, contains many recipes around the administration of your MongoDB deployment. This chapter covers a lot of frequently used administrative tasks, such as viewing the stats of the collections and database, viewing and killing long-running operations, and other replica set and sharding-related administration.

Chapter 5, Advanced Operations, is an extension of Chapter 2, Command-line Operations and Indexes. We will look at some of the slightly advanced features such as implementing server-side scripts, geospatial search, GridFS, full-text search, and how to integrate MongoDB with an external full-text search engine.

Chapter 6, Monitoring and Backups, is all about administration and some basic monitoring. However, MongoDB provides a state-of-the-art monitoring and real-time backup service, MongoDB Monitoring Service (MMS). In this chapter, we will look at some recipes around monitoring and backup using MMS.

Chapter 7, Cloud Deployment on MongoDB, covers recipes that use MongoDB service providers for cloud deployment, and we will set up our own MongoDB server on the AWS cloud.

Chapter 8, Integration with Hadoop, covers recipes to integrate MongoDB with Hadoop to use the Hadoop MapReduce API to run MapReduce jobs on data residing in MongoDB/MongoDB data files and write the results back to them. We will also see how to use AWS EMR to run our MapReduce jobs on the cloud using Amazon’s managed Hadoop cluster, EMR, with the mongo-hadoop connector.

Chapter 9, Open Source and Proprietary Tools, is about using frameworks and products built around MongoDB to improve a developer’s productivity or to make some of the day-to-day jobs in using Mongo easier. Unless explicitly mentioned, the products/frameworks we will be looking at in this chapter are open source.

Appendix, Concepts for Reference, gives you a bit of additional information on write concern and read preference for reference.

What you need for this book

The version of MongoDB used to try out the recipes is 2.4.6. The recipes hold good for version 2.6.x as well. Where a feature is specific to version 2.6.x, this is explicitly mentioned in the recipe.

The samples where Java programming was involved were tested and run on Java Version 1.7.40. Python Version 2.7 is used wherever Python is used. For MongoDB drivers, you may choose to use the latest available version.

These are pretty common types of software, and their minimum versions are used across different recipes. Each recipe in this book mentions the software required to complete it, along with the respective versions. Some recipes need to be tested on a Windows system, while others need Linux.

Who this book is for

This book is designed for administrators and developers who are interested in knowing MongoDB and using it as a high-performance and scalable data storage. It is also for those who know the basics of MongoDB and would like to expand their knowledge further. The audience of this book is expected to at least have some basic knowledge of MongoDB.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it, How it works, There’s more, and See also).

To give clear instructions on how to complete a recipe, we use these sections as follows:

Getting ready

This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There’s more…

This section consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: “Create the /data/mongo/db directory (or any of your choice).”

A block of code is set as follows:

import com.mongodb.DB;

import com.mongodb.DBCollection;

import com.mongodb.DBObject;

import com.mongodb.MongoClient;

Any command-line input or output is written as follows:

$ sudo apt-get install default-jdk

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: “Without editing any default settings, click on Launch.”

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book’s title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

Chapter 1. Installing and Starting the MongoDB Server

In this chapter, we will cover the following recipes:

Single node installation of MongoDB
Starting a single node instance using command-line options
Single node installation of MongoDB with options from the config file
Connecting to a single node from the Mongo shell with a preloaded JavaScript
Connecting to a single node from a Java client
Starting multiple instances as part of a replica set
Connecting to the replica set from the shell to query and insert data
Connecting to the replica set to query and insert data from a Java client
Starting a simple sharded environment of two shards
Connecting to a shard from the Mongo shell and performing operations

Introduction

In this chapter, we will look at starting up the MongoDB server. Though it is a cakewalk to start the server for development purposes and with the default settings, there are numerous options that let us tune the startup behavior. We will start the server as a single node; then, we’ll introduce various configurations before we conclude by starting up a simple replica set and a sharded setup. So, let’s get started by installing and setting up the MongoDB server in the easiest way possible, for simple development purposes.

Single node installation of MongoDB

In this recipe, we will look at the process of installing MongoDB in the standalone mode. This is the simplest and quickest way to start a MongoDB server but is seldom used for production use cases. However, this is the most common way to start the server for the purpose of development. In this recipe, we will start the server without looking at a lot of other startup options.

Getting ready

We assume that you have downloaded the MongoDB binaries from the download site, extracted them, and added the bin directory of MongoDB to the operating system’s path variable (this is not mandatory, but it is really convenient). The binaries can be downloaded from http://www.mongodb.org/downloads after selecting your host operating system.

How to do it…

Perform the following steps to start with the single node installation of MongoDB:

1. Create the /data/mongo/db directory (or any of your choice). This will be our database directory, and it needs to have permission to let the mongod process (the mongo server process) write to it.

2. We will start the server from the console with the /data/mongo/db data directory as follows:

$ mongod --dbpath /data/mongo/db
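Step 1 asks for a data directory that the mongod process can write to. The following is a minimal sketch of that preparation in Python 3 (the recipes themselves use Python 2.7). The helper name ensure_data_dir is hypothetical and not part of MongoDB or its drivers, and the check only covers the user running the script, which is assumed to be the user that will later start mongod.

```python
import os
import tempfile


def ensure_data_dir(path):
    """Create the directory if needed and report whether it is writable.

    Hypothetical helper for illustration; it checks permissions for the
    current user only.
    """
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # A throwaway directory stands in for /data/mongo/db in this demo.
    with tempfile.TemporaryDirectory() as base:
        db_path = os.path.join(base, "data", "mongo", "db")
        print(db_path, "writable:", ensure_data_dir(db_path))
```

On a real machine, you would pass /data/mongo/db (or the directory of your choice) instead of the temporary path.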

There’s more…

If you see the following message on the console, you have successfully started the server:

[initandlisten] waiting for connections on port 27017

Starting a server can’t get easier than this. Despite the simplicity in starting the server, there are a lot of configuration options that can be used to tune the behavior of the server on startup. Most of the default options are sensible and need not be changed. With the default values, the server should be listening to port 27017 for new connections, and the logs will be printed out to the standard output.
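Besides reading the console message, you can confirm that something is accepting connections on the port. The sketch below, written for Python 3, only opens and closes a TCP connection using the standard library; it does not speak the MongoDB protocol, so a True result means that some process is listening on that port, not necessarily mongod. The function name is_listening is an assumption made for this example.

```python
import socket


def is_listening(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


if __name__ == "__main__":
    print("port 27017 accepting connections:", is_listening())
```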

See also

The Starting a single node instance using command-line options recipe for more startup options

Starting a single node instance using command-line options

In this recipe, we will see how to start a standalone single node server with some command-line options. We will see an example where we will perform the following tasks:

Starting the server that listens to port 27000
Writing logs to /logs/mongo.log
Setting the database directory to /data/mongo/db

Since the server is started for development purposes, we don’t want to preallocate full size database files (we will soon see what this means).

Getting ready

If you have already seen and executed the steps mentioned in the Single node installation of MongoDB recipe, you need not do anything different. If all the prerequisites are met, we are good for this recipe too.

How to do it…

You can start a single node instance using command-line options with the following steps:

1. The /data/mongo/db directory for the database and /logs/ for the logs should be created and present on your filesystem with appropriate write permissions.

2. Execute the following command:

> mongod --port 27000 --dbpath /data/mongo/db --logpath /logs/mongo.log --smallfiles
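When the server is started from a script rather than typed at a prompt, the same command can be assembled from a set of options. The following Python 3 sketch shows one way to do this; build_mongod_args is a hypothetical helper, and the convention that a value of True marks an option without an argument (such as smallfiles) is an assumption of this example, not something defined by MongoDB.

```python
def build_mongod_args(options):
    """Turn a dict of option names into a mongod argument list.

    A value of True marks a flag that takes no argument, such as
    smallfiles; any other value is passed as the option's argument.
    """
    args = ["mongod"]
    for name, value in options.items():
        args.append("--" + name)
        if value is not True:
            args.append(str(value))
    return args


if __name__ == "__main__":
    args = build_mongod_args({
        "port": 27000,
        "dbpath": "/data/mongo/db",
        "logpath": "/logs/mongo.log",
        "smallfiles": True,
    })
    # Joining with spaces gives the command from step 2 (without the prompt).
    print(" ".join(args))
```

The resulting list can be handed to subprocess.Popen from the Python standard library to launch the server, provided mongod is on the path.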

How it works…

OK, this wasn’t too difficult and is similar to the previous recipe, but we have some additional command-line options this time around. MongoDB actually supports quite a few options at startup, and we will see a list of the ones that are most common and important in my opinion:

Option Description

--help or -h

This is used to print the information of various startup options available.

--config or -f

This specifies the location of the configuration file that contains all the configuration options. We will learn more about this option in the Single node installation of MongoDB with options from the config file recipe. It is just a convenient way of specifying the configurations in a file rather than in a command prompt, especially when the number of options specified is large. Using a separate configuration file shared across different mongod instances will also ensure that all the instances are running with identical configurations.

--verbose or -v

This makes the logs more verbose. We can put more v’s to make the output even more verbose, for example, -vvvvv.

--quiet

This gives quieter output. It is the opposite of verbose or the -v option. It will keep the logs less chatty and clean.

--port

This option is used if you are looking to start the server that listens to a port other than the default 27017. We will frequently use this option whenever we are looking to start multiple Mongo servers on the same machine; for example, --port 27018 will start the server that listens to port 27018 for new connections.

--logpath

This provides a path to a log file where the logs will be written. The value defaults to STDOUT. For example, --logpath /logs/server.out will use /logs/server.out as the log file for the server. Remember that the value provided should be a file and not a directory where the logs will be written.

--logappend

This option will append to the existing log file, if any. The default behavior is to rename the existing log file and then create a new file for the logs of the currently started Mongo instance. Let’s assume that we used the name of the log file as server.out and on startup the file exists. Then, by default, this file will be renamed as server.out.<timestamp>, where <timestamp> is the current time. The time is in GMT rather than the local time. Suppose the current date is October 28, 2013 and the time is 12:02:15; then the file generated will have the 2013-10-28T12-02-15 value as the timestamp.

--dbpath

This provides the directory where a new database will be created or an existing database is present. The value defaults to /data/db. We will start the server using /data/mongo/db as the database directory. Note that the value should be a directory rather than the name of the file.

--smallfiles

This is used frequently for development purposes when we plan to start more than one Mongo instance on our local machine. On startup, Mongo creates a database file of size 64 MB (on 64-bit machines). This preallocation happens for performance reasons, and the file is created with zeros written to it to fill out the space on the disk. Adding this option on startup creates a preallocated file of 16 MB only (again on a 64-bit machine). This option also reduces the maximum size of the database and journal files. Avoid using this option for production deployments. Also, the database file size doubles with each new file up to a maximum of 2 GB by default. If the --smallfiles option is chosen, it goes up to a maximum of 512 MB.

--replSet

This option is used to start the server as a member of the replica set. The value of this argument is the name of the replica set, for example, --replSet repl1. More information on this option is covered in the Starting multiple instances as part of a replica set recipe, where we will start a simple Mongo replica set.

--configsvr

This option is used to start the server as a config server. The role of the config server will be made clearer when we set up a simple sharded environment in the Starting a simple sharded environment of two shards recipe in this chapter. A config server listens to port 27019 and uses the /data/configdb data directory by default. These can, of course, be overridden using the --port and --dbpath options.

--shardsvr

This informs the started mongod process that this server is being started as a shard server. By giving this option, the server also listens to port 27018 instead of the default 27017. We will learn more about this option when we start a simple sharded server.

--oplogSize

The oplog is the backbone of replication. It is a capped collection where the data being written to the primary is stored to be replicated to the secondary instances. This collection resides in a database named local. On initialization of a replica set, the disk space for the oplog is preallocated, and the database file (for the local database) is filled with zeros as placeholders. The default value is 5 percent of the disk space, which should be good enough in most cases. The size of the oplog is crucial because capped collections are of a fixed size and discard their oldest documents upon exceeding that size to make space for new documents. If the oplog size is too small, data can be discarded before it is replicated to the secondary nodes. A large oplog size can result in unnecessary disk-space utilization and a longer time for the replica set initialization. For development purposes, when we start multiple server processes on the same host, we might want to keep the oplog size to a minimum value so that the replica set initiates quickly and uses the minimum disk space possible.
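The effect of --smallfiles on data file sizes, as described in the table above, can be worked through with a few lines of Python 3. This is only arithmetic on the figures quoted for 64-bit machines (64 MB doubling up to 2 GB by default, 16 MB doubling up to 512 MB with --smallfiles); the function name data_file_sizes_mb is hypothetical, and the sketch does not inspect a real server.

```python
def data_file_sizes_mb(count, smallfiles=False):
    """Return the sizes in MB of the first `count` data files of a database.

    Uses the figures from the option table: each new file is twice the
    size of the previous one until the maximum file size is reached.
    """
    size, limit = (16, 512) if smallfiles else (64, 2048)
    sizes = []
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, limit)
    return sizes


if __name__ == "__main__":
    print("default:     ", data_file_sizes_mb(7))
    print("--smallfiles:", data_file_sizes_mb(7, smallfiles=True))
```

With seven files, the default sequence totals 6080 MB, while the --smallfiles sequence totals 1520 MB, which is why the option is convenient when several instances share one development machine.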

There’s more…

For an exhaustive list of the options available, use the --help or -h option. The preceding list of options is not exhaustive, and we will see some more in the upcoming recipes as and when we need them. In the next recipe, we will see how to use a config file instead of the command-line arguments.

See also

- The Single node installation of MongoDB with options from the config file recipe to use config files to provide startup options
- To start a replica set, refer to the Starting multiple instances as part of a replica set recipe
- To set up a sharded environment, refer to the Starting a simple sharded environment of two shards recipe

Single node installation of MongoDB with options from the config file

As we can see, providing options from the command line does the work, but it starts getting awkward as soon as the number of options we provide increases. A nice and clean alternative is to provide the startup options from a configuration file rather than as command-line arguments.

Getting ready

If you have already seen and executed the steps mentioned in the Single node installation of MongoDB recipe, you need not do anything different, and all the prerequisites of this recipe are the same.

How to do it…

The /data/mongo/db directory for the database and /logs/ for the logs should be created and present on your filesystem, with the appropriate write permissions. Let’s take a look at the steps in detail:

1. Create a config file that can have any arbitrary name. In our case, let’s say we create the file at /conf/mongo.conf. We will then edit the file and add the following lines to it:

port=27000

dbpath=/data/mongo/db

logpath=/logs/mongo.log

smallfiles=true

2. Start the Mongo server using the following command:

> mongod --config /conf/mongo.conf

How it works…

All the command-line options we discussed in the previous recipe, Starting a single node instance using command-line options, hold true. We are just providing these options in a configuration file instead. If you have not visited the previous recipe, I recommend that you do so, as this is where we have discussed some of the common command-line options. The properties are specified as <property name>=<value>. For options that take no value on the command line, for example, the smallfiles option, the value given is the Boolean value true. If you need verbose output, add v=true (or multiple v’s, for example vvv=true, to make it more verbose) to the config file. If you already know what the command-line option is, it is pretty easy to guess the name of the property in the file. It is the same as the command-line option, with just the leading hyphens removed.
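The mapping between the two forms can be sketched in a few lines of plain JavaScript. The configToArgs function below is a hypothetical helper written for this illustration, not part of MongoDB; it assumes one name=value pair per line, treats lines starting with # as comments, and handles only long option names (so it does not cover the v=true shorthand).

```javascript
// Convert "name=value" config lines into the equivalent mongod command-line arguments.
// configToArgs is a hypothetical helper; it is not shipped with MongoDB.
function configToArgs(configText) {
  var args = [];
  configText.split('\n').forEach(function (line) {
    var trimmed = line.trim();
    if (trimmed === '' || trimmed.charAt(0) === '#') {
      return; // skip blank lines and comments
    }
    var idx = trimmed.indexOf('=');
    if (idx === -1) {
      return; // not a name=value pair, ignore it in this sketch
    }
    var name = trimmed.slice(0, idx).trim();
    var value = trimmed.slice(idx + 1).trim();
    args.push('--' + name);
    if (value !== 'true') {
      args.push(value); // Boolean switches take no value on the command line
    }
  });
  return args;
}

// The config file from this recipe, as a string.
var sampleConfig = [
  'port=27000',
  'dbpath=/data/mongo/db',
  'logpath=/logs/mongo.log',
  'smallfiles=true'
].join('\n');

var commandLine = 'mongod ' + configToArgs(sampleConfig).join(' ');
```

For the file used in this recipe, commandLine ends up as the command we would otherwise have typed by hand, with smallfiles=true turned into the bare --smallfiles switch.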

Connecting to a single node from the Mongo shell with a preloaded JavaScript

This recipe is about starting the Mongo shell and connecting to a MongoDB server. Here, we’ll also demonstrate how to load JavaScript code into the shell. Though this is not always required, it is handy when we have a large block of JavaScript code, including variables and functions with some business logic in them, that needs to be executed from the shell frequently, and we want these functions to always be available in the shell.

Getting ready

It is not strictly necessary for a MongoDB server to be running in order to start a shell (the --nodb option starts one without connecting), but we will rarely start a shell without connecting it to a running MongoDB server. To start a server on the localhost without much of a hassle, take a look at the first recipe, Single node installation of MongoDB, and start the server.

How to do it…

Let’s take a look at the steps in detail:

1. First, we will start by creating a simple JavaScript file; let’s call it hello.js. Type in the following lines in the hello.js file:

function sayHello(name) {
  print('Hello ' + name + ', how are you?')
}

2. Save this file at /mongo/scripts (it can be saved at any other location too).

3. In the command prompt, execute the following command:

> mongo --shell /mongo/scripts/hello.js

4. On executing this, we should see the following message on our console:

MongoDB shell version: 2.4.6
connecting to: test
>

5. Test the database that the shell is connected to by typing the following command:

> db

This should print out test on the console.

6. Now, type in the following command on the shell:

> sayHello('Fred')
Hello Fred, how are you?

How it works…

The JavaScript function we executed here is of no practical use; it is just used to demonstrate how a function can be preloaded upon the startup of the shell. There can be multiple functions in the .js file that contain valid JavaScript code, possibly some complex business logic.

When we executed the mongo command without a db address, we connected to the MongoDB server that runs on the localhost and listens for new connections on the default port 27017. The format of the command is as follows:

mongo <options> <db address> <.js files>

If no db address is passed to the mongo executable, it is equivalent to passing the db address as localhost:27017/test.

There’s more…

Let’s look at some example values of the db address command-line option and its interpretation:

- mydb: This will connect to the server that runs on the localhost and listens for connections on port 27017. The database connected will be mydb.
- mongo.server.host/mydb: This will connect to the server that runs on mongo.server.host and the default port 27017. The database connected will be mydb.
- mongo.server.host:27000/mydb: This will connect to the server that runs on mongo.server.host and the port 27000. The database connected will be mydb.
- mongo.server.host:27000: This will connect to the server that runs on mongo.server.host and the port 27000. The database connected will be the default database, test.
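The four interpretations above can be written down as a small parsing function. This is a simplified sketch in plain JavaScript, not the shell's own code: parseDbAddress is a hypothetical name, and the rule used to tell a bare database name from a host (a dot or a trailing :port means host) is an assumption that happens to fit these examples. It does not handle IPv6 addresses or database names that contain a dot.

```javascript
// Interpret a db address the way the list above describes it.
// parseDbAddress is a hypothetical helper written for this illustration.
function parseDbAddress(address) {
  var result = { host: 'localhost', port: 27017, db: 'test' };
  if (!address) {
    return result; // no address: localhost:27017/test
  }
  var hostPart;
  var slash = address.indexOf('/');
  if (slash !== -1) {
    hostPart = address.slice(0, slash);
    result.db = address.slice(slash + 1) || 'test';
  } else if (address.indexOf('.') !== -1 || /:\d+$/.test(address)) {
    hostPart = address; // looks like a host, so the database stays at the default
  } else {
    result.db = address; // a bare name is a database on the local server
    return result;
  }
  var colon = hostPart.lastIndexOf(':');
  if (colon !== -1) {
    result.host = hostPart.slice(0, colon);
    result.port = parseInt(hostPart.slice(colon + 1), 10);
  } else {
    result.host = hostPart;
  }
  return result;
}

var examples = [
  'mydb',
  'mongo.server.host/mydb',
  'mongo.server.host:27000/mydb',
  'mongo.server.host:27000'
].map(parseDbAddress);
```

Each element of examples holds the host, port, and database that the corresponding address resolves to under these rules.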

Now, there are quite a few options available on the Mongo client too. We will see a few of them in the following table:

Option Description

--help or -h

This offers help regarding the usage of various command-line options.

--shell

When .js files are given as arguments, these scripts get executed, and the Mongo client will exit. Providing this option ensures that the shell remains running after the JavaScript files execute. All the functions and variables defined in these .js files are available in the shell upon startup. As in the preceding case, the sayHello function defined in the JavaScript file is available in the shell for invocation.

--port

This specifies the port of the Mongo server where the client needs to connect.

--host

This specifies the hostname of the Mongo server where the client needs to connect. If the db address is provided with the hostname, port, and database, the --host and --port options need not be specified.

--username or -u

This is relevant when security is enabled for Mongo. It is used to provide the username of the user to be logged in.

--password or -p

This is relevant when security is enabled for Mongo. It is used to provide the password of the user to be logged in.

Connecting to a single node from a Java client

This recipe is about setting up the Java client for MongoDB. You will be repeatedly referring to this recipe while working on others, so read it very carefully.

Getting ready

The following are the prerequisites for this recipe:

- Version 1.6 or above of the Java SDK is recommended.
- Use the latest available version of Maven. Version 3.1.1 was the latest at the time of writing this book.
- Use the MongoDB Java driver. Version 2.11.3 was the latest at the time of writing this book.
- Connectivity to the Internet to access the online Maven repository or a local repository is needed. Alternatively, you might choose an appropriate local repository accessible to you from your computer.
- The Mongo server is up and running on the localhost and on port 27017. Take a look at the first recipe, Single node installation of MongoDB, and start the server.

How to do it…

Let’s take a look at the steps in detail:

1. Install the latest version of the JDK if you don’t already have it on your machine. We will not be going through the steps to install the JDK in this recipe but, before moving on to the next step, the JDK should be present. Type javac -version on the shell to check the version installed.

2. Once the JDK is set up, the next step is to set up Maven. Skip the next three steps if Maven is already installed on your machine.

3. Maven needs to be downloaded from http://maven.apache.org/download.cgi. Choose the binaries in the .tar.gz or .zip format and download them. This recipe is executed on a machine that runs on the Windows platform; thus, these steps are for installation on Windows. The following screenshot shows the download page of Maven:

4. Once the archive is downloaded, we need to extract it and put the absolute path of the bin folder in the extracted archive in the operating system’s path variable. Maven also needs the path of the JDK to be set as the JAVA_HOME environment variable. Remember to set the root of your JDK as the value of this variable.

5. All we need to do now is type mvn -version in the command prompt. If you see the version of Maven on the command prompt, we have successfully set up Maven:

> mvn -version

6. At this stage, we have Maven installed, and we are now ready to create our simple project to write our first Mongo client in Java. We will start by creating a project folder. Let’s assume that we create a folder called MongoJava. Then, we will create a folder structure src/main/java in this project folder. The root of the project folder then contains a file called pom.xml. Once this folder creation is done, the folder structure should look as follows:

MongoJava
+--src
|  +main
|     +java
|--pom.xml

7. We just have the project skeleton with us now. We will now add some content to the pom.xml file. Not much is needed for this. Add the following code snippet in the pom.xml file and save it:

<project>

<modelVersion>4.0.0</modelVersion>

<name>MongoJava</name>

<groupId>com.packtpub</groupId>

<artifactId>mongo-cookbook-java</artifactId>

<version>1.0</version>

<packaging>jar</packaging>

<dependencies>

<dependency>

<groupId>org.mongodb</groupId>

<artifactId>mongo-java-driver</artifactId>

<version>2.11.3</version>

</dependency>

</dependencies>

</project>

8. Finally, we will write our Java client that will be used to connect to the Mongo server and execute some very basic operations. The following is the Java class located at src/main/java in the com.packtpub.mongo.cookbook package, and the name of the class is FirstMongoClient:

package com.packtpub.mongo.cookbook;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

import java.net.UnknownHostException;
import java.util.List;

/**
 * Simple Mongo Java client
 *
 */
public class FirstMongoClient {

    /**
     * Main method for the FirstMongoClient. Here we shall be connecting to a mongo
     * instance running on localhost and port 27017.
     *
     * @param args
     */
    public static final void main(String[] args) throws UnknownHostException {
        MongoClient client = new MongoClient("localhost", 27017);
        DB testDB = client.getDB("test");
        System.out.println("Dropping person collection in test database");
        DBCollection collection = testDB.getCollection("person");
        collection.drop();
        System.out.println("Adding a person document in the person collection of test database");
        DBObject person = new BasicDBObject("name", "Fred").append("age", 30);
        collection.insert(person);
        System.out.println("Now finding a person using findOne");
        person = collection.findOne();
        if (person != null) {
            System.out.printf("Person found, name is %s and age is %d\n",
                    person.get("name"), person.get("age"));
        }
        List<String> databases = client.getDatabaseNames();
        System.out.println("Database names are");
        int i = 1;
        for (String database : databases) {
            System.out.println(i++ + ": " + database);
        }
        System.out.println("Closing client");
        client.close();
    }
}

9. It’s now time to execute the preceding Java code. We will execute it using Maven from the shell. You should be in the same directory as the pom.xml file of the project:

mvn compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.FirstMongoClient

How it works…

Those were quite a lot of steps to follow! Let’s look at some of them in more detail. Everything up to step 6 is straightforward and doesn’t need any explanation. Let’s look at the other steps.

The pom.xml file we have here is pretty simple. We defined a dependency on Mongo’s Java driver. It relies on the online repository (http://search.maven.org) for resolving the artifacts. For a local repository, all we need to do is define the repositories and pluginRepositories tags in pom.xml. For more information on Maven, refer to the Maven documentation at http://maven.apache.org/guides/index.html.

Now, for the Java class, the com.mongodb.MongoClient class is the backbone. We will first instantiate it using one of its overloaded constructors that takes the server’s host and port. In this case, the hostname and port were not really needed as the values provided are the default values anyway, and the no-argument constructor would have worked well too. The following line of code instantiates this client:

MongoClient client = new MongoClient("localhost", 27017);

The next step is to get the database, in this case test, using the getDB method. This is returned as an object of type com.mongodb.DB. Note that this database might not exist, yet getDB will not throw any exception. Instead, the database will get created whenever we add a new document to a collection in this database. Similarly, getCollection on the DB object will return an object of type com.mongodb.DBCollection, representing the collection in the database. This too might not exist in the database and will get created automatically upon the insertion of the first document.

The following lines of code from our class show how to get an instance of DB and DBCollection:

DB testDB = client.getDB("test");
DBCollection collection = testDB.getCollection("person");

Before we insert a document, we will drop the collection so that even upon multiple executions of the program, we will have just one document in the person collection. The collection is dropped using the drop() method on the DBCollection object’s instance. Next, we will create an instance of com.mongodb.DBObject. This is an object that represents the document to be inserted in the collection. The concrete class used here is BasicDBObject, which is a type of java.util.LinkedHashMap, where the key is a string and the value is an object. The value can be another DBObject, too, in which case it is a document nested within another document. In our case, we have two keys: name and age. These are the field names in the document to be inserted, and the values are of type string and integer, respectively. The append method of BasicDBObject adds a new key-value pair to the BasicDBObject instance and returns the same instance, which allows us to chain the append method calls to add multiple key-value pairs. The DBObject is then inserted into the collection using the insert method. This is how we instantiated a DBObject for the person and inserted it in the collection:

DBObject person = new BasicDBObject("name", "Fred").append("age", 30);
collection.insert(person);
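The reason the append calls can be chained is simply that each call returns the object it was invoked on. A minimal JavaScript analogue makes this visible; Doc below is a hypothetical stand-in written for this illustration and is not the driver's BasicDBObject, and the address field is an invented example of a nested document.

```javascript
// Doc is a hypothetical stand-in for BasicDBObject: an ordered set of key-value pairs.
function Doc(key, value) {
  this.fields = {};
  this.append(key, value);
}

// Add one key-value pair and return the same instance so that calls can be chained.
// A value that is itself a Doc becomes a nested document.
Doc.prototype.append = function (key, value) {
  this.fields[key] = value instanceof Doc ? value.fields : value;
  return this;
};

var person = new Doc('name', 'Fred')
  .append('age', 30)
  .append('address', new Doc('city', 'Springfield')); // illustrative nested document

var asJson = JSON.stringify(person.fields);
```

Because append returns this, the whole expression evaluates to the same Doc instance that the constructor created, now holding three fields, one of which is a nested document.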

The findOne method on DBCollection is straightforward and returns one document from the collection. This version of findOne doesn’t accept a DBObject (which, otherwise, acts as a query executed before a document is selected and returned) as a parameter. This is synonymous to executing db.person.findOne() from the Mongo shell.

Finally, we will simply invoke getDatabaseNames to get a list of the database names in the server. At this point in time, we should have at least the test and local databases in the returned result. Once all the operations are completed, we will close the client. The MongoClient class is thread-safe; generally, one instance is used per application. To execute the program, we will use Maven’s exec plugin. On executing step 9, we will see the following output on the console:

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ mongo-cookbook-java ---
Dropping person collection in test database
Adding a person document in the person collection of test database
Now finding a person using findOne
Person found, name is Fred and age is 30
Database names are
1: local
2: test
Closing client
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.183s
[INFO] Finished at: Wed Oct 30 00:42:29 IST 2013
[INFO] Final Memory: 7M/19M
[INFO] ------------------------------------------------------------------------

Starting multiple instances as part of a replica set

In this recipe, we will look at starting multiple servers on the same host but as a cluster. Starting a single Mongo server is enough for development purposes or applications that are not mission-critical. For crucial production deployments, we need the availability to be high: if one server instance fails, another instance takes over and the data remains available for querying, inserting, or updating. Clustering is an advanced concept, and we won’t be doing it justice by covering this whole concept in one recipe. In this recipe, we will scratch the surface and get into more details in other recipes in Chapter 4, Administration, later in the book. In this recipe, we will start multiple Mongo server processes on the same machine for testing purposes. In the production environment, they will be running on different machines (or virtual machines) in the same or different data centers.

Let’s see in brief exactly what a replica set is. As the name suggests, it is a set of servers that are replicas of each other in terms of data. Looking at how they are kept in sync with each other and other internals is something we will defer to some later recipes in Chapter 4, Administration, but one thing to remember is that write operations will happen only on one node, the primary one. All the querying also happens from the primary node by default, though we might permit read operations on secondary instances explicitly. An important fact to remember is that replica sets are not meant to achieve scalability by distributing the read operations across various nodes in a replica set. Their sole objective is to ensure high availability.

Getting ready

Though not a prerequisite, taking a look at the Starting a single node instance using command-line options recipe will definitely make things easier, just in case you are not aware of the various command-line options and their significance while starting a Mongo server. Also, the necessary binaries and setup as mentioned in the Single node installation of MongoDB recipe must be in place before we continue with this recipe. Let’s sum up what we need to do.

We will start three mongod processes (Mongo server instances) on our localhost. Then, we will create three data directories, /data/n1, /data/n2, and /data/n3, for node 1, node 2, and node 3, respectively. Similarly, we will redirect the logs to /logs/n1.log, /logs/n2.log, and /logs/n3.log. The following diagram will give you an idea as to what the cluster will look like:

How to do it…

Let’s take a look at the steps in detail:

1. Create the /data/n1, /data/n2, /data/n3, and /logs directories for the data and logs of the three nodes. On the Windows platform, you can choose the c:\data\n1, c:\data\n2, c:\data\n3, and c:\logs\ directories (or any other directories of your choice) for data and logs, respectively. Ensure that these directories have appropriate write permissions for the Mongo server to write the data and logs.

2. Start the three servers as follows (note that users on the Windows platform need to skip the --fork option, as it is not supported):

$ mongod --replSet repSetTest --dbpath /data/n1 --logpath /logs/n1.log --port 27000 --smallfiles --oplogSize 128 --fork

$ mongod --replSet repSetTest --dbpath /data/n2 --logpath /logs/n2.log --port 27001 --smallfiles --oplogSize 128 --fork

$ mongod --replSet repSetTest --dbpath /data/n3 --logpath /logs/n3.log --port 27002 --smallfiles --oplogSize 128 --fork

3. Start the Mongo shell and connect to any of the Mongo servers that are running. In this case, we will connect to the first one (the one listening to port 27000). Execute the following command:

$ mongo localhost:27000

4. Try to execute an insert operation from the Mongo shell after connecting to it as follows:

> db.person.insert({name: 'Fred', age: 35})

This operation should fail as the replica set is not initialized yet. More information can be found in the How it works… section of this recipe.

5. The next step is to start configuring the replica set. We will start by preparing a JSON configuration in the shell:

cfg={

'_id':'repSetTest',

'members':[

{'_id':0,'host':'localhost:27000'},

{'_id':1,'host':'localhost:27001'},

{'_id':2,'host':'localhost:27002'}

]

}

6. The last step is to initiate the replica set with the preceding configuration as follows:

> rs.initiate(cfg)

Execute rs.status() after a few seconds on the shell to see the status. In a few seconds, one of them should become primary, and the remaining two should become secondary.
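The configuration document from step 5 has a regular shape, so it can also be generated instead of typed. The following is a minimal sketch in plain JavaScript; buildReplSetConfig is a hypothetical helper name, and it assumes that all members run on the same host and differ only in their port, as they do in this recipe.

```javascript
// Build a replica set configuration document for members that share one host.
// buildReplSetConfig is a hypothetical helper, not a shell built-in.
function buildReplSetConfig(setName, host, ports) {
  return {
    _id: setName, // must match the value given to --replSet
    members: ports.map(function (port, index) {
      return { _id: index, host: host + ':' + port };
    })
  };
}

var generatedCfg = buildReplSetConfig('repSetTest', 'localhost', [27000, 27001, 27002]);
```

The resulting generatedCfg object has the same _id and members as the cfg object typed in step 5, so in the shell it could be passed to rs.initiate in its place.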

How it works…

We described the common options and all these command-line options in the Starting a single node instance using command-line options recipe in detail.

As we are starting three independent mongod services, we have three dedicated database paths on the filesystem. Similarly, we have three separate log file locations for each of the processes. We then started three mongod processes with the database and log file path specified. As this setup is for test purposes and started on the same machine, we used the --smallfiles and --oplogSize options. Avoid using these options in the production environment. As these are running on the same host, we also choose the ports explicitly to avoid port conflicts. The ports we chose here are 27000, 27001, and 27002. When we start the servers on different hosts, we might or might not choose a separate port. We can very well choose to use the default one whenever possible.

The --fork option demands some explanation. By choosing this option, we started the server as a background process from our operating system’s shell and got the control back in the shell, where we can then start more such mongod processes or perform other operations. In the absence of the --fork option, we cannot start more than one process per shell and will need to start three mongod processes in three separate shells. This option, however, doesn’t work on the Windows platform, and we need to start one process per shell. We can, however, execute the following command to spawn a new shell and then start the new Mongo service in this newly spawned shell:

start mongod --replSet repSetTest --dbpath c:\data\n1 --logpath c:\logs\n1.log --port 27000 --smallfiles --oplogSize 128

The preceding command allows us to have a batch file (a .bat file) that contains all the logic to create the relevant directories and then spawn three mongod processes in three shells.

Let’s get back to the replica set creation; we are not yet done with setting up a replica set. If we take a look at the logs generated in the log directory, we will see the following lines in it:

[rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
[rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done

Though we started three mongod processes with the --replSet option, we still haven’t configured them to work with each other as a replica set. This command-line option is just used to tell the server on startup that this process will be running as part of a replica set. The name of the replica set is the same as the value of this option passed on the command prompt. This also explains why the insert operation executed on one of the nodes failed before the replica set was initialized. In Mongo replica sets, only one node is the primary node where all the inserts and querying happen. In the preceding diagram, node n1 is shown as the primary node and listens to port 27000 for client connections. All the other nodes are slave/secondary instances that sync themselves up with the primary node; hence, querying too is disabled on them by default. It is only when the primary node goes down that one of the secondaries takes over and becomes a primary node. It is, however, possible to query the secondary instances for data, as we showed in the preceding diagram. We will see how to query from a secondary instance in the next recipe.

Well, all that is left now is to configure the replica set by grouping the three processes we started. This is done by first defining a JSON object as follows:

cfg={

'_id':'repSetTest',

'members':[

{'_id':0,'host':'localhost:27000'},

{'_id':1,'host':'localhost:27001'},

{'_id':2,'host':'localhost:27002'}

]

}

There are two fields, _id and members, for the unique ID of the replica set and an array of the hostnames and port numbers of the mongod server processes that are part of this replica set, respectively. Using localhost to refer to the host is not a very good idea and is usually discouraged. However, in this case, we started all the processes on the same machine; thus, we are OK with it. It is, however, preferred to refer to the hosts by their hostnames even if they are running on the localhost. Note that you cannot mix the two styles in the same config: use either hostnames for all the members or localhost for all of them. To configure the replica set, we then connect to any one of the three running mongod processes; in this case, we will connect to the first one and then execute the following command from the shell:

> rs.initiate(cfg)

The _id in the cfg object passed has the same value as the value we gave to the --replSet option in the command prompt when we started the server processes. Not giving the same value will throw the following error:

{
  "ok" : 0,
  "errmsg" : "couldn't initiate : set name does not match the set name host Amol-PC:27000 expects"
}

If all goes well and the initiate call is successful, you will see something like the following JSON response on the shell:

{
  "info" : "Config now saved locally. Should come online in about a minute.",
  "ok" : 1
}

In a few seconds, you should see a different prompt for the shell from which we executed this command. The node it is connected to should now have become a primary or secondary node. The following is an example of the prompt of a shell connected to a primary member of the replica set:

repSetTest:PRIMARY>

Executing rs.status() should give us some stats on the replica set status. The stateStr field here is important, and it contains the text PRIMARY, SECONDARY, and so on.
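Reading stateStr by eye works for three members, but the same lookup can be sketched as a small function. The sample object below is hand-written for this illustration and trimmed to the few fields used (set, members, name, and stateStr); it is not captured output, and findPrimary is a hypothetical helper name.

```javascript
// Return the host:port of the member whose stateStr is PRIMARY, or null if there is none.
// findPrimary is a hypothetical helper; status is expected to be shaped like an rs.status() result.
function findPrimary(status) {
  var members = (status && status.members) || [];
  for (var i = 0; i < members.length; i++) {
    if (members[i].stateStr === 'PRIMARY') {
      return members[i].name;
    }
  }
  return null; // no primary has been elected (yet)
}

// Hand-written sample with illustrative values only.
var sampleStatus = {
  set: 'repSetTest',
  members: [
    { _id: 0, name: 'localhost:27000', stateStr: 'SECONDARY' },
    { _id: 1, name: 'localhost:27001', stateStr: 'PRIMARY' },
    { _id: 2, name: 'localhost:27002', stateStr: 'SECONDARY' }
  ]
};

var primaryHost = findPrimary(sampleStatus);
```

In a shell connected to any member, the same function could be given the result of rs.status() to find out which member to connect to for writes.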

There’s more…

If you are looking to convert a standalone instance to a replica set, the instance with data needs to become a primary instance first, and then empty secondary instances will be added, to which the data will be synchronized. For more information on how to perform this operation, visit http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/.

See also

- The Connecting to the replica set from the shell to query and insert data recipe to perform more operations from the shell after connecting to a replica set
- Chapter 4, Administration, for more advanced recipes on replication

Connecting to the replica set from the shell to query and insert data

In the previous recipe, we started a replica set of three mongod processes. In this recipe, we will be working on top of it and will connect to it from the client application, perform querying, insert data, and take a look at some of the interesting aspects of the replica set from a client’s perspective.

Getting ready

The prerequisite for this recipe is that the replica set should be set up, and it should be up and running. For details on how to start the replica set, refer to the Starting multiple instances as part of a replica set recipe.

How to do it…

Let’s take a look at the steps in detail:

1. Create the /data/n1, /data/n2, /data/n3, and /logs directories for data and logs of the three nodes, respectively.

2. We will start two shells here: one for primary and one for secondary. Execute the following command in the command prompt:

mongo localhost:27000

3. The prompt of the shell tells whether the server to which we connected is primary or secondary. It should show the replica set’s name followed by : and then followed by the server’s state. In this case, if the replica set is initialized and is up and running, we will see either repSetTest:PRIMARY> or repSetTest:SECONDARY>.

4. Suppose the first server we connected to is a secondary server, then we need to find the primary server as follows:

1. Execute the rs.status() command in the shell and look out for the stateStr field. This should give us the primary server. Use the Mongo shell to connect to this server. At this point, we should have two shells running: one connected to a primary node and the other connected to a secondary node.

5. In the shell connected to the primary node, execute the following insert command:

repSetTest:PRIMARY> db.replTest.insert({_id: 1, value: 'abc'})

There is nothing special about it. We have just inserted a small document in a collection that we use for the replication test.

6. By executing the following query on the primary node, we should get one result:

repSetTest:PRIMARY> db.replTest.findOne()
{ "_id" : 1, "value" : "abc" }

7. So far so good. Now, we will go to the shell that is connected to the secondary node and execute the following command:

repSetTest:SECONDARY> db.replTest.findOne()

8. On doing this, we will see the following error on the console:

{ "$err" : "not master and slaveOk=false", "code" : 13435 }

9. Now, execute the following command on the console:

repSetTest:SECONDARY> rs.slaveOk(true)

10. Execute the query we executed in step 7 again on the shell. This will now get the following results:

repSetTest:SECONDARY> db.replTest.findOne()
{ "_id" : 1, "value" : "abc" }

11. Execute the following insert command on the secondary node; it should fail with the following message:

repSetTest:SECONDARY> db.replTest.insert({_id: 1, value: 'abc'})
not master

How it works…

We have done a lot of things in this recipe, and we will try to throw some light on some of the important concepts to remember.

We basically connected to a primary and a secondary node from the shell and performed (I would say, tried to perform) the select and insert operations. The architecture of a Mongo replica set is made up of one primary (just one; no more, no less) and multiple secondary nodes. All writes happen on the primary node only. Note that replication is not a mechanism to distribute a read-request load that enables us to scale the system. Its primary intent is to ensure high availability of data. By default, we are not permitted to read data from the secondary nodes. In steps 5 and 6, we simply inserted data from the primary node and then executed the query to get the document that we inserted. This is straightforward, and there is nothing related to clustering here. Just note that we inserted the document from the primary node and then queried it back.

In the next step, we executed the same query but, this time, from the secondary node’s shell. By default, querying is not enabled on the secondary node. There might be a small lag in replicating the data, possibly due to heavy data volumes to be replicated, network latency, and hardware capacity, to name a few of the causes; thus, querying on the secondary node might not reflect the latest inserts or updates made on the primary node. If, however, we are OK with it and can live with the slight lag in the data being replicated, all we need to do is enable querying on the secondary node explicitly by executing just one command, rs.slaveOk() or rs.slaveOk(true). Once this is done, we are free to execute queries on the secondary nodes too.

Finally, we tried to insert data in a collection on the secondary node. Under no circumstances is this permitted, regardless of whether we have executed rs.slaveOk(). When rs.slaveOk() is invoked, it just permits the data to be queried from the secondary node. All the write operations still have to go to the primary node and then flow down to the secondary nodes. The internals of replication will be covered in the Understanding and analyzing oplogs recipe in Chapter 4, Administration.
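The rules observed in this recipe can be condensed into a toy model in plain JavaScript. SecondaryModel is a hypothetical name and nothing here talks to a server; the sketch only encodes the behavior seen from the shell, reusing the two error messages shown in the steps above.

```javascript
// Toy model of how a secondary answers reads and writes from a shell connection.
// SecondaryModel is a hypothetical name used only for this illustration.
function SecondaryModel(replicatedDocs) {
  this.docs = replicatedDocs; // documents already replicated from the primary
  this.slaveOk = false;       // reads are refused until explicitly allowed
}

// Models the effect of calling rs.slaveOk() on this connection.
SecondaryModel.prototype.setSlaveOk = function () {
  this.slaveOk = true;
};

// Reads succeed only after slaveOk has been set.
SecondaryModel.prototype.findOne = function () {
  if (!this.slaveOk) {
    throw new Error('not master and slaveOk=false');
  }
  return this.docs.length ? this.docs[0] : null;
};

// Writes are refused regardless of the slaveOk setting.
SecondaryModel.prototype.insert = function (doc) {
  throw new Error('not master');
};

var secondary = new SecondaryModel([{ _id: 1, value: 'abc' }]);
```

Calling secondary.findOne() before secondary.setSlaveOk() throws, calling it afterwards returns the replicated document, and secondary.insert() throws in both cases.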

See also

- The Connecting to the replica set to query and insert data from a Java client recipe for details on how to connect to a replica set from a Java client

Connecting to the replica set to query and insert data from a Java client

In this recipe, we will demonstrate how to connect to a replica set using a Java client and execute queries and insert data using the Java client for MongoDB. We will also see how the client automatically fails over to another member in the replica set should a primary member go down.

Getting ready

We first need to take a look at the Connecting to a single node from a Java client recipe, as it contains all the prerequisites and steps to set up Maven and other dependencies. As we are dealing with a Java client for replica sets, a replica set must be up and running. Refer to the Starting multiple instances as part of a replica set recipe for details on how to start the replica set.

How to do it…

Let’s take a look at the steps in detail:

1. First, we need to write/copy the following piece of code (this Java class is also available for download from the book’s site):

package com.packtpub.mongo.cookbook;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;

import java.util.Arrays;

/**
 * Test client that connects to a replica set using a seed list.
 */
public class ReplicaSetMongoClient {

    /**
     * Main method for the test client connecting to the replica set.
     * @param args
     */
    public static final void main(String[] args) throws Exception {
        MongoClient client = new MongoClient(
            Arrays.asList(
                new ServerAddress("localhost", 27000),
                new ServerAddress("localhost", 27001),
                new ServerAddress("localhost", 27002)
            )
        );
        DB testDB = client.getDB("test");
        System.out.println("Dropping replTest collection");
        DBCollection collection = testDB.getCollection("replTest");
        collection.drop();
        DBObject object = new BasicDBObject("_id", 1).append("value", "abc");
        System.out.println("Adding a test document to replica set");
        collection.insert(object);
        System.out.println("Retrieving document from the collection, this one comes from primary node");
        DBObject doc = collection.findOne();
        showDocumentDetails(doc);
        System.out.println("Now Retrieving documents in a loop from the collection.");
        System.out.println("Stop the primary instance manually after few iterations");
        for (int i = 0; i < 20; i++) {
            try {
                doc = collection.findOne();
                showDocumentDetails(doc);
            } catch (Exception e) {
                // Ignore the failure or log a message; the next iteration retries
            }
            Thread.sleep(5000);
        }
    }

    /**
     * Prints the _id and value fields of the given document.
     * @param obj
     */
    private static void showDocumentDetails(DBObject obj) {
        System.out.printf("_id: %d, value is %s\n", obj.get("_id"), obj.get("value"));
    }
}

2. Connect to any of the nodes in the replica set, say to localhost:27000, and, from the shell, execute rs.status(). Take a note of the primary instance in the replica set and connect to it from the shell if localhost:27000 is not a primary node. Now, switch to the admin database as follows:

repSetTest:PRIMARY> use admin

3. Now, execute the preceding program from the operating system shell as follows:

$ mvn compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.ReplicaSetMongoClient

4. Shut down the primary instance by executing the following command on the Mongo shell connected to the primary node:

repSetTest:PRIMARY> db.shutdownServer()

5. Watch the output on the console where the com.packtpub.mongo.cookbook.ReplicaSetMongoClient class is executed using Maven.

Howitworks…AninterestingthingtoobserveishowweinstantiateaMongoClientinstance.Itisdoneasfollows:

MongoClientclient=newMongoClient(Arrays.asList(new

ServerAddress("localhost",27000),newServerAddress("localhost",27001),new

ServerAddress("localhost",27002)));

Theconstructortakesalistofcom.mongodb.ServerAddress.Thisclasshasalotofoverloadedconstructors,butwechosetousetheonethattakesthehostnameandportnumber.Here.weprovidedalltheserverdetailsinareplicasetasalist.Wehaven’tmentionedwhattheprimarynodeisandwhatthesecondarynodesare.TheMongoClientclassisintelligentenoughtofigurethisoutandconnecttotheappropriateinstance.Thelistofserversprovidediscalledtheseedlist.Itneednotcontainanentiresetofserversinareplicaset,thoughtheobjectiveistoprovideasmuchaswecan.TheMongoClientclasswillfigureoutalltheserverdetailsfromtheprovidedsubset.Forexample,ifthereplicasetisoffivenodesbutweprovideonlythreeservers,itstillworksfine.Onconnectingwiththeprovidedreplicasetservers,theclientwillquerythemtogetthereplicasetmetadataandfigureouttherestoftheprovidedserversinthereplicaset.Intheprecedingcase,weinstantiatedtheclientwiththreeinstancesinthereplicaset.Ifthereplicasethasfivemembers,instantiatingtheclientwithjustthreeofthemaswedidearlierisstillgoodenough,andtheremainingtwoinstanceswillbeautomaticallydiscovered.

Next, we will start the client from the command prompt using Maven. Once the client is running in the loop to find one document, we will bring down the primary instance. We will see something like the following output on the console:

_id: 1, value is abc
Now, retrieving documents in a loop from the collection.
Stop the primary instance manually after a few iterations:
_id: 1, value is abc
_id: 1, value is abc
Nov 03, 2013 5:21:57 PM com.mongodb.ConnectionStatus$UpdatableNode update
WARNING: Server seen down: Amol-PC/192.168.1.171:27002
java.net.SocketException: Software caused connection abort: recv failed
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
WARNING: Primary switching from Amol-PC/192.168.1.171:27002 to Amol-PC/192.168.1.171:27001
_id: 1, value is abc

As we can see, the query in the loop was interrupted when the primary node went down. The client, however, switched to the new primary node seamlessly. Well, nearly seamlessly, as the client might have to catch an exception and retry the operation after a predetermined interval has elapsed.
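A catch-and-retry loop of this kind could look roughly like the following plain JavaScript sketch. The makeFlakyFind and withRetry helpers are hypothetical names introduced for illustration, and the failing operation is simulated; a real client would call the driver and sleep for the chosen interval inside wait.

```javascript
// Hypothetical operation that fails a few times, as a query might
// while a new primary is being elected.
function makeFlakyFind(failuresBeforeSuccess) {
  let calls = 0;
  return function findOne() {
    calls += 1;
    if (calls <= failuresBeforeSuccess) {
      throw new Error("connection to primary lost");
    }
    return { _id: 1, value: "abc" };
  };
}

// Runs the operation up to maxAttempts times, calling wait() between attempts.
// Rethrows the last error if every attempt fails.
function withRetry(operation, maxAttempts, wait) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { result: operation(), attempts: attempt };
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        wait(attempt);
      }
    }
  }
  throw lastError;
}

// The wait callback only records the pauses here instead of sleeping.
const waits = [];
const outcome = withRetry(makeFlakyFind(2), 5, (attempt) => waits.push(attempt));
console.log(outcome.attempts, outcome.result.value);
```

Passing the wait step in as a function keeps the sketch synchronous; in an application it would be replaced with a real delay.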

Starting a simple sharded environment of two shards

In this recipe, we will set up a simple sharded environment made up of two data shards. To keep it simple, there will be no replication, as this is the most basic shard setup to demonstrate the concept. We won’t be getting deep into the internals of sharding, which we will explore further in Chapter 4, Administration.

Here is a bit of theory before we proceed. Scalability and availability are two important cornerstones for building any mission-critical application. Availability is something that was taken care of by replica sets, which we discussed in the previous recipes of this chapter. Let’s look at scalability now. Simply put, scalability is the ease with which the system can cope with an increasing data and request load. Consider an e-commerce platform. On regular days, the number of hits to the site and the load are fairly modest, and the system response times and error rates are minimal (this is subjective).

Now, consider the days where the system load becomes twice or three times an average day’s load (or even more), for example, on Thanksgiving Day, Christmas, and so on. If the platform is able to deliver similar levels of service on these high-load days compared with any other day, the system is said to have scaled up well to the sudden increase in the number of requests.

Now, consider an archiving application that needs to store the details of all the requests that hit a particular website over the past decade. For each request that hits the website, we will create a new record in the underlying data store. Suppose each record is 250 bytes and the average load is 3 million requests per day; that is about 750 MB of new data per day, so we will cross the 1 TB data mark in roughly 4 years. This data will be used for various analytic purposes and might be frequently queried. The query performance should not be drastically affected when the data size increases. If the system is able to cope with this increasing data volume and still gives a decent performance comparable to that on low data volumes, the system is said to have scaled up well against the increasing data volumes.
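The growth estimate can be checked with a few lines of plain JavaScript. This is a back-of-the-envelope sketch that assumes a decimal terabyte (10^12 bytes) and counts only the raw record bytes; storage overhead and indexes are ignored.

```javascript
// Figures taken from the example above.
const bytesPerRecord = 250;
const requestsPerDay = 3000000;

// Raw data written per day, in bytes.
const bytesPerDay = bytesPerRecord * requestsPerDay;

// Assumption: 1 TB is taken as 10^12 bytes.
const oneTerabyte = 1e12;

const daysToOneTB = oneTerabyte / bytesPerDay;
const yearsToOneTB = daysToOneTB / 365;

console.log(bytesPerDay, Math.round(daysToOneTB), yearsToOneTB.toFixed(2));
```

With a binary terabyte (2^40 bytes) the same calculation gives a figure of about four years.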

Now that we have seen in brief what scalability is, let me tell you that sharding is a mechanism that lets a system scale to increasing demands. The crux lies in the fact that the entire data is partitioned into smaller segments and distributed across various nodes called shards. Let’s assume that we have a total of 10 million documents in a Mongo collection. If we shard this collection across 10 shards, we will ideally have 10,000,000/10 = 1,000,000 documents on each shard. At a given point of time, one document will only reside on one shard (which, by itself, will be a replica set in a production system). There is, however, some magic involved that keeps this concept hidden from the developer querying the collection, who gets one unified view of the collection irrespective of the number of shards. Based on the query, it is Mongo that decides which shard or shards to query for the data and returns the entire result set. With this background, let’s set up a simple shard and take a closer look at it.

Getting ready

Apart from the MongoDB server already installed, there are no prerequisites from a software perspective. We will create two data directories, one for each shard, a data directory for the config server, and one directory for logs.

How to do it…

Let’s take a look at the steps in detail:

1. We will start by creating directories for logs and data. Create the /data/s1/db, /data/s2/db, and /logs directories. On Windows, we can have c:\data\s1\db, and so on for the data and log directories. There is also a config server that is used in a sharded environment to store some metadata. We will use /data/con1/db as the data directory for the config server.

2. Start the following mongod processes, one for each of the two shards and one for the config database, and one mongos process (we will see what this process does). For the Windows platform, skip the --fork parameter as it is not supported:

$ mongod --shardsvr --dbpath /data/s1/db --port 27000 --logpath /logs/s1.log --smallfiles --oplogSize 128 --fork

$ mongod --shardsvr --dbpath /data/s2/db --port 27001 --logpath /logs/s2.log --smallfiles --oplogSize 128 --fork

$ mongod --configsvr --dbpath /data/con1/db --port 25000 --logpath /logs/config.log --fork

$ mongos --configdb localhost:25000 --logpath /logs/mongos.log --fork

3. In the command prompt, execute the following command. This will show a mongos prompt:

$ mongo
MongoDB shell version: 2.4.6
connecting to: test
mongos>

4. Finally, we set up the shard. From the mongos shell, execute the following two commands:

mongos> sh.addShard("localhost:27000")

mongos> sh.addShard("localhost:27001")

5. On the addition of each shard, we will get an ok reply. Something like the following JSON message will be seen giving the unique ID for each shard that is added:

{ "shardAdded" : "shard0000", "ok" : 1 }

Note

We have used localhost everywhere to refer to the locally running servers. It is not a recommended approach and is discouraged. A better approach will be to use hostnames even if they are local processes.

How it works

Let’s see what we did in the process. We created three directories for data (two for the shards and one for the config database) and one directory for logs. We can have a shell script or a batch file to create the directories as well. In fact, in large production deployments, setting up shards manually is not only time-consuming but also error-prone.

Let’s try to get a picture of what exactly we have done and what we are trying to achieve.

The following diagram shows the shard setup we just built:

If we look at the preceding diagram and the servers started in step 2, we will see that we have shard servers that will store the actual data in the collections. These were the first two of the four processes we started, listening on ports 27000 and 27001. Next, we started a config server, which is seen on the left-hand side in the preceding diagram. It is the third of the four servers started in step 2, and it listens on port 25000 for incoming connections. The sole purpose of this database is to maintain the metadata of the shard servers. Ideally, only the mongos process or drivers connect to this server for the shard details/metadata and the shard key information. We will see what a shard key is in the next recipe, where we will play around with a sharded collection and see the shards we created in action.

Finally, we have a mongos process. This is a lightweight process that doesn’t do any persistence of data and just accepts connections from clients. This is the layer that acts as a gatekeeper and abstracts the client from the concept of shards. For now, we can view it as a router that consults the config server and takes the decision to route the client’s query to the appropriate shard server for execution. It then aggregates the result from various shards if applicable and returns the result to the client. It is safe to say that no client directly connects to the config or the shard servers; in fact, ideally, no one should connect to these processes directly, except for some administration operations. Clients simply connect to the mongos process and execute their queries, or insert or update operations.
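The routing role described above can be illustrated with a deliberately simplified sketch in plain JavaScript. Everything in it is hypothetical: the in-memory shards object, the routingTable that maps a shard key value straight to a shard, and the route function. Real routing metadata is richer than a value-to-shard map, so treat this only as a picture of targeted versus scatter-gather queries.

```javascript
// Hypothetical in-memory shards, keyed by shard ID.
const shards = {
  shard0000: [
    { _id: 1, name: "James Smith" },
    { _id: 3, name: "James Smith" },
  ],
  shard0001: [{ _id: 2, name: "Robert Johnson" }],
};

// Hypothetical routing metadata of the kind a router would read from the
// config server. Simplification: shard key value -> shard ID.
const routingTable = {
  "James Smith": "shard0000",
  "Robert Johnson": "shard0001",
};

// Routes a simple equality query. If the query contains a known shard key
// value, only one shard is queried; otherwise every shard is queried and
// the results are merged.
function route(query) {
  const matches = (doc) =>
    Object.keys(query).every((key) => doc[key] === query[key]);
  const target = routingTable[query.name];
  const shardsQueried = target !== undefined ? [target] : Object.keys(shards);
  const docs = [];
  for (const shardId of shardsQueried) {
    docs.push(...shards[shardId].filter(matches));
  }
  return { shardsQueried, docs };
}

console.log(route({ name: "Robert Johnson" }).shardsQueried);
console.log(route({ _id: 3 }).shardsQueried);
```

The first call is a targeted query because it includes the shard key; the second has to visit both shards.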

Just starting the shard servers, the config server, and the mongos process doesn’t create a sharded environment. On starting up the mongos process, we provided it with the details of the config server. What about the two shards that will be storing the actual data? The two mongod processes that started as shard servers are not yet declared anywhere as shard servers in the configuration. That is exactly what we do in the final step by invoking sh.addShard() for both the shard servers. Adding shards from the shell stores the metadata about the shards in the config database; the mongos processes then query this config database for the shards’ information. On executing all the steps of this recipe, we will have an operational sharded setup. Before we conclude, note that the setup we built here is far from ideal and is not how it will be done in a production environment. The following diagram gives us an idea of how a typical sharded setup will look in a production environment:

The number of shards will not be two but many more. Also, each shard will be a replica set to ensure high availability. There will be three config servers to ensure the availability of the config servers too. Similarly, there can be any number of mongos processes listening for client connections. In some cases, a mongos process might even be started on a client application’s server.

There’s more…

What good is a shard unless we put it to action and see what happens from the shell on inserting and querying the data? In the next recipe, we will make use of the shard setup, add some data, and see it in action.

Connecting to a shard from the Mongo shell and performing operations

In this recipe, we will be connecting to a shard from a command prompt; we will also see how to shard a collection and observe the data splitting in action on some test data.

Getting ready

Obviously, we need a sharded mongo server setup that is up and running. See the previous recipe for more details on how to set up a simple shard. The mongos process, as in the previous recipe, should be listening to port number 27017. We have got some names in a JavaScript file called names.js. This file needs to be downloaded from this book’s site and kept on the local filesystem. The file contains a variable called names, and the value is an array with some JSON documents as the values, each one representing a person. The contents look as follows:

names = [
  {name: 'James Smith', age: 30},
  {name: 'Robert Johnson', age: 22},
]

How to do it…

Let’s take a look at the steps in detail:

1. Start the Mongo shell and connect to the default port on the localhost as follows (this will ensure that the names will be available in the current shell):

mongo --shell names.js
MongoDB shell version: 2.4.6
connecting to: test
mongos>

2. Switch to the database that will be used to test sharding as follows (we call it shardDB):

mongos> use shardDB

3. Enable sharding at the database level as follows:

mongos> sh.enableSharding("shardDB")

4. Shard a collection called person as follows:

mongos> sh.shardCollection("shardDB.person", {name: "hashed"}, false)

5. Add test data to the sharded collection as follows:

mongos> for (i = 1; i <= 300000; i++) {
... person = names[Math.round(Math.random() * 100) % 20]
... doc = {_id: i, name: person.name, age: person.age}
... db.person.insert(doc)
... }

6. Execute the following command to get a query plan and the number of documents on each shard:

mongos> db.person.find().explain()

How it works…

This recipe demands some explanation. We have downloaded a JavaScript file that defines an array of 20 people. Each element of the array is a JSON object with a name and age attribute. We started the shell that connects to the mongos process loaded with this JavaScript. We then switched to shardDB, which we will use for the purpose of sharding.

For a collection to be sharded, the database in which it will be created needs to be enabled for sharding first. We do this using sh.enableSharding().

The next step is to enable the collection to be sharded. By default, all the data will be kept on one shard and will not be split across different shards. Think about how Mongo will be able to meaningfully split the data. The whole intention is to split it meaningfully and as evenly as possible so that whenever we query based on a shard key, Mongo will easily be able to determine which shard(s) to query. If a query doesn’t contain a shard key, the execution of the query will happen on all the shards, and the data will then be collated by the mongos process before returning it to the client. Thus, choosing the right shard key is crucial.

Let’s now see how to shard the collection. We will do this by invoking sh.shardCollection("shardDB.person", {name: "hashed"}, false). There are three parameters here.

The first parameter specifies the fully qualified name of the collection in the <dbname>.<collectionname> format. The second parameter specifies the field name to shard upon in the collection. This is the field that will be used to split the documents across the shards. One of the requirements of a good shard key is that it should have high cardinality (the number of possible values should be high). In our test data, the name value has a very low cardinality and thus is not a good choice as a shard key. We thus hash this key when using it as a shard key, by mentioning the key as {name: "hashed"}. Note that hashing spreads the key values over the hash range but does not increase the number of distinct values. The last parameter specifies whether the value used as a shard key is unique or not. The name field is definitely not unique; thus, it will be false. If the field was, say, the person’s social security number, it could have been set as true. Also, SSN is a good choice for a shard key due to its high cardinality. Remember though, for the query to be efficient, the shard key has to be present in it.
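The effect of hashing a low-cardinality key can be explored with the following plain JavaScript sketch. It rests on loud assumptions: FNV-1a is used purely as a stand-in hash (it is not the function MongoDB uses for hashed keys), placement is reduced to hash modulo shard count, and only the two names shown in names.js earlier in this recipe are used.

```javascript
// FNV-1a 32-bit hash, used here only as a stand-in hash function.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

const names = ["James Smith", "Robert Johnson"];
const shardCount = 2;
const totalDocs = 300000;

const docsPerShard = new Array(shardCount).fill(0);
const distinctHashes = new Set();

for (let i = 1; i <= totalDocs; i++) {
  const name = names[i % names.length];
  const hashed = fnv1a(name);
  distinctHashes.add(hashed);
  // Simplistic placement rule assumed for illustration only.
  docsPerShard[hashed % shardCount] += 1;
}

// The number of distinct hashed keys can never exceed the number of names.
console.log(distinctHashes.size, docsPerShard);
```

However many documents are inserted, the number of distinct hashed values stays bounded by the number of distinct names, which is why a high-cardinality field remains the better starting point for a shard key.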

The last step is to see the execution plan to find all the data. The intent of this operation is to see how the data is being split across two shards. With 300,000 documents, we expect something around 150,000 documents on each shard. In the explain plan’s output, the shards attribute has an entry for each shard in the cluster. In our case, we have two shards; thus, we have two entries that give the query plan for each shard. In each of them, the value of n is something to look at. It should give us the number of documents that reside on each shard. The following code snippet is the relevant JSON document we see from the console. The number of documents on shards one and two is 164938 and 135062, respectively:

"shards":{

"localhost:27000":[

{

"cursor":"BasicCursor",

"isMultiKey":false,

"n":164938,

"nscannedObjects":164938,

"nscanned":164938,

"nscannedObjectsAllPlans":164938,

"nscannedAllPlans":164938,

"scanAndOrder":false,

"indexOnly":false,

"nYields":1,

"nChunkSkips":0,

"millis":974,

"indexBounds":{

},

"server":"Amol-PC:27000"

}

],

"localhost:27001":[

{

"cursor":"BasicCursor",

"isMultiKey":false,

"n":135062,

"nscannedObjects":135062,

"nscanned":135062,

"nscannedObjectsAllPlans":135062,

"nscannedAllPlans":135062,

"scanAndOrder":false,

"indexOnly":false,

"nYields":0,

"nChunkSkips":0,

"millis":863,

"indexBounds":{

},

"server":"Amol-PC:27001"

}

]

}
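As a quick sanity check, the per-shard n values from this output can be added up in plain JavaScript. The object below copies only the n fields from the plan shown above; the variable names are introduced just for this sketch.

```javascript
// Only the n values from the explain output above are reproduced here.
const explainShards = {
  "localhost:27000": [{ n: 164938 }],
  "localhost:27001": [{ n: 135062 }],
};

// Add up n across all shards and compute each shard's share of the data.
let totalDocuments = 0;
for (const plans of Object.values(explainShards)) {
  for (const plan of plans) {
    totalDocuments += plan.n;
  }
}

const shares = {};
for (const [shard, plans] of Object.entries(explainShards)) {
  shares[shard] = plans[0].n / totalDocuments;
}

console.log(totalDocuments, shares);
```

The total matches the number of documents inserted in step 5, and the split is close to, but not exactly, half and half.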

There are a couple of additional things that I recommend you all to do.

Connect to the individual shards from the Mongo shell and execute queries on the person collection. See that the counts in these collections are similar to what we see in the preceding plan. Also, one can find out that no document exists on both the shards at the same time.

We discussed in brief how cardinality affects the way the data is split across shards. Let’s do a simple exercise. We will first drop the person collection and execute the shardCollection operation again but, this time, with the {name: 1} shard key instead of {name: "hashed"}. This ensures that the shard key is not hashed and is stored as is. Now, load the data using the JavaScript function we used earlier in step 5 and then execute explain on the collection once the data is loaded. Observe how the data is now split (or not) across the shards.

There’s more…

A lot of questions might now come up, such as what are the best practices, what are some tips and tricks, how is the sharding thing pulled off by MongoDB behind the scenes in a way transparent to the end user, and so on.

This recipe only explained the basics. All these questions will be answered in Chapter 4, Administration.

Chapter 2. Command-line Operations and Indexes

In this chapter, we will cover the following recipes:

Creating test data
Performing simple querying, projections, and pagination from the Mongo shell
Updating and deleting data from the shell
Creating an index and viewing plans of queries
Background and foreground index creation from the shell
Creating unique indexes on collection and deleting the existing duplicate data automatically
Creating and understanding sparse indexes
Expiring documents after a fixed interval using the TTL index
Expiring documents at a given time using the TTL index

Creating test data

This recipe is about creating test data for some of the recipes in this chapter and also for the later chapters in this book. We will demonstrate how to load a CSV file into a Mongo database using the import utility. This is a basic recipe; if readers are aware of the data-import process, they might just download the CSV file (pincodes.csv) from the book’s site, load it in the collection by themselves, and skip the rest of the recipe. We will use the default database test, and the collection will be named postalCodes.

Getting ready

The data used here is for postal codes in India. Download the pincodes.csv file from the book’s website. The file is a CSV file with 39,732 records; it should create 39,732 documents upon successful import. We need to have the Mongo server up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for instructions on how to start the server. The server should begin listening for connections on the default port 27017.

How to do it…

1. Execute the following command from the shell with the file to be imported in the current directory:

$ mongoimport --type csv -d test -c postalCodes --headerline --drop pincodes.csv

2. Start the Mongo shell by typing in mongo in the command prompt.

3. In the shell, execute the following command:

> db.postalCodes.count()

How it works…

Assuming that the server is up and running, the CSV file is downloaded and kept in a local directory where we can execute the import utility with the file in the current directory. Let’s look at the options given to the mongoimport utility and their meanings:

Command-line option    Description

--type          This specifies that the type of input file is CSV. It defaults to JSON, the other possible value being TSV.

-d              This is the target database into which the data will be loaded.

-c              This is the collection in the preceding database into which the data will be loaded.

--headerline    This is relevant only in the case of TSV or CSV files. It indicates that the first line of the file is the header. The same name will be used as the name of the field in the document.

--drop          This indicates that we need to drop the collection before the data gets loaded in it.

After all the options are given, the final value in the command prompt is the name of the file, pincodes.csv.
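What --headerline does conceptually can be sketched in plain JavaScript: the first line supplies the field names, and every following line becomes one document. The csvToDocuments helper, the column names, and the sample rows below are hypothetical, and the sketch is naive (no quoted fields, and every value stays a string), so it only illustrates the idea rather than mongoimport's actual parsing.

```javascript
// Turns CSV text into an array of documents, using the first line as field names.
function csvToDocuments(csvText) {
  const lines = csvText.split("\n").filter((line) => line.trim() !== "");
  const fields = lines[0].split(",");
  return lines.slice(1).map((line) => {
    const values = line.split(",");
    const doc = {};
    fields.forEach((field, i) => {
      doc[field] = values[i];
    });
    return doc;
  });
}

// Hypothetical sample data; the real pincodes.csv has its own columns and rows.
const sampleCsv = [
  "pincode,city,state",
  "100001,CityA,StateX",
  "200001,CityB,StateY",
].join("\n");

const docs = csvToDocuments(sampleCsv);
console.log(docs);
```

Note how the header line itself does not become a document, which is why a file with one header line and N data lines yields N documents.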

If the import goes through successfully, you will see something like the following output on the console:

connected to: 127.0.0.1
Mon Dec 9 23:29:13.004 Progress: 1593394/2286080 69%
Mon Dec 9 23:29:13.014 28000 9333/second
Mon Dec 9 23:29:14.116 check 9 39733
Mon Dec 9 23:29:14.116 imported 39732 objects

Finally, we will start the Mongo shell and find the count of the documents in the collection. It should indeed be 39,732, as seen in the preceding import log.

Note

The postal code data is taken from https://github.com/kishorek/India-Codes/. This data is not taken from an official source and might not be accurate as it is being compiled manually for free public use. Thanks to Kishore for compiling the data and sharing it.

See also

Refer to the Performing simple querying, projections, and pagination from the Mongo shell recipe for some basic queries on the imported data.

Performing simple querying, projections, and pagination from the Mongo shell

In this recipe, we will get our hands dirty with a bit of querying to select documents from the test data we set up in the previous recipe. There is nothing extravagant in this recipe, and someone well versed with query language basics can skip this recipe comfortably. Others who aren’t too comfortable with basic querying or those who want to get a small refresher can continue to read the next section of the recipe. Additionally, this recipe is intended to give a feel of the test data set up in the previous recipe.

Getting ready

To execute simple queries, we need to have a server up and running. A simple single node is what we will need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will be operating needs to be imported into the database. The steps to import the data are given in the previous recipe. You also need to start the Mongo shell and connect to the server that runs on the localhost. Once we have these prerequisites, we are good to go.

How to do it…

1. Let’s first find a count of documents in the collection:

> db.postalCodes.count()

2. Let’s find just one document from the postalCodes collection:

> db.postalCodes.findOne()

3. Now, we need to find multiple documents in the collection:

> db.postalCodes.find().pretty()

4. The preceding query retrieved all the keys of the first 20 documents and displayed them on the shell. Let’s do a couple of things now; we will just display the city, state, and pincode fields. Additionally, we want to display the documents numbered 91 to 100 in the collection. To do this, execute the following command:

> db.postalCodes.find({}, {_id: 0, city: 1, state: 1, pincode: 1}).skip(90).limit(10)

5. Let’s move a step ahead and write a slightly complex query where we will find the top 10 cities in the state of Gujarat sorted by the name of the city. Like the last query, we will just select the city, state, and pincode fields:

> db.postalCodes.find({state: 'Gujarat'}, {_id: 0, city: 1, state: 1, pincode: 1}).sort({city: 1}).limit(10)

How it works…

The recipe is pretty simple and allows us to get a feel of the test data we set up in the previous recipe. Nevertheless, as with the other recipes, I owe you all some explanation about what we did here.

We first found the count of the documents in the collection using db.postalCodes.count(), and it should give us 39,732 documents. This should be in sync with the logs we saw while importing the data into the postalCodes collection. Next, we queried for one document from the collection using findOne. This method returned the first document in the result set of the query. In the absence of a query or sort order, as in this case, it will be the first document in the collection sorted by its natural order.

Next, we used find rather than findOne. The difference is that the find operation returns an iterator for the result set that we can use to traverse through the results of the find operation, whereas findOne returns a document. Adding a pretty method call to the find operation will print the result in a pretty or formatted way.

Note

Note that the pretty method makes sense and works only with the find method and not with findOne. This is because the return value of findOne is a document, and there is no pretty operation on the returned document.

We will now execute the following query on the Mongo shell:

> db.postalCodes.find({}, {_id: 0, city: 1, state: 1, pincode: 1}).skip(90).limit(10)

Here, we will pass two parameters to the find method:

The first one is {}, which is the query to select the documents; in this case, we will ask Mongo to select all the documents.

The second parameter is the set of fields that we want in the result documents. Remember that the _id field is present by default, unless we explicitly say _id: 0. For all the other fields, we need to say <field_name>: 1 or <field_name>: true. The find method with projections is the same as saying select field1, field2 from the table in the relational world, and not specifying the fields to be selected in the find method is like saying select * from the table in the relational world.

Moving on, we just need to look at what skip and limit do. The skip function skips the given number of documents from the start of the result set. The limit function then limits the result to the given number of documents.

Let’s see what all this means with an example. By executing .skip(90).limit(10), we say that we want to skip the first 90 documents from the result set and start returning from the ninety-first document. The limit, however, says that we will return only 10 documents, starting from the ninety-first document.

Now, there are some border conditions that we need to know here. What if skip is being provided with a value more than the total number of documents in the collection? Well, in this case, no documents will be returned. Also, if the number provided to the limit function is more than the actual number of documents that remain in the collection, the number of documents returned will be the same as the remaining documents in the collection, and no exception will be thrown in either case.
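These border conditions can be mimicked on a plain JavaScript array. The skipLimit helper below is a hypothetical stand-in that models the result set as an in-memory array; it is not how the server implements cursors, but it shows the same observable behavior.

```javascript
// Models skip(n).limit(m) over an in-memory result set.
function skipLimit(results, skip, limit) {
  return results.slice(skip, skip + limit);
}

// A stand-in result set of 100 "documents", numbered 1 to 100.
const results = Array.from({ length: 100 }, (_, i) => i + 1);

const page = skipLimit(results, 90, 10);       // documents 91 to 100
const pastTheEnd = skipLimit(results, 150, 10); // skip beyond the end: empty
const shortPage = skipLimit(results, 95, 10);   // only 5 documents remain

console.log(page, pastTheEnd, shortPage);
```

Neither of the last two calls fails; they simply return fewer documents than the limit asks for.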

Updating and deleting data from the shell

This, again, will be a simple recipe that will look at executing deletes and updates on a test collection. We won’t deal with the same test data we imported, as we don’t want to update/delete any of this; instead, we will work on a test collection created only for this recipe.

Getting ready

For this recipe, we will create a collection called updAndDelTest. We will need the server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the UpdAndDelTest.js script loaded. This script will be available on the book’s website for download. To know how to start the shell with a preloaded script, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.

How to do it…

1. With the shell started and the script loaded, execute the following command in the shell:

> prepareTestData()

If all goes well, you should see 20 documents inserted in updAndDelTest and printed on the console.

2. To get a feel of the collection, let’s query it as follows:

> db.updAndDelTest.find({}, {_id: 0})

3. We will see that for each value of x (1 and 2), y increments from 1 to 10.

4. We will first update some documents and observe the results. Execute the following update command:

> db.updAndDelTest.update({x: 1}, {$set: {y: 0}})

5. Execute the following find command and observe the results; we will get 10 documents (for each of them, note the value of y):

> db.updAndDelTest.find({x: 1}, {_id: 0})

6. We will now execute the following update command:

> db.updAndDelTest.update({x: 1}, {$set: {y: 0}}, false, true)

7. Executing the query given in step 5 again to view the updated documents will show the same documents we saw earlier. Take a note of the values of y again and compare them to the results we saw when we executed this query the last time around before executing the update command given in step 6.

8. We will now see how the delete operation works. We will again choose the documents where x is 1 for the deletion test. Let’s delete all the documents where x is 1 from the collection:

> db.updAndDelTest.remove({x: 1})

9. Execute the following find command and observe the results. We will not get any results. It seems that the remove operation has removed all the documents with x as 1:

> db.updAndDelTest.find({x: 1}, {_id: 0})

How it works…

First, we set up the data that we would be using for updates and deletion. We have already seen the data and know what it is. An interesting thing to observe is that, when we execute an update such as db.updAndDelTest.update({x: 1}, {$set: {y: 0}}), it only updates the first document that matches the query provided as the first parameter. This is something we will observe when we query the collection after this update. The update function has the db.<collectionname>.update(query, update object, isUpsert, isMulti) format.

We will see what Upsert is in the Atomic find and modify operations recipe in Chapter 5, Advanced Operations. The fourth parameter (isMulti) is by default false, and this means that multiple documents will not be updated by the update call. So, only the first matching document will be updated by default. However, when we execute db.updAndDelTest.update({x: 1}, {$set: {y: 0}}, false, true) with the fourth parameter set to true, all the documents in the collection that match the given query get updated. This is something we can verify after querying the collection.

Removals, on the other hand, behave differently. By default, the remove operation deletes all the documents that match the provided query. However, if we wish to delete only one document, we will explicitly pass the second parameter (justOne) as true.

Note

The default behavior of update and remove is different. An update call by default updates only the first matching document, whereas remove deletes all the documents that match the query.
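The contrast between the two defaults can be reproduced with an in-memory model in plain JavaScript. The makeTestDocs, update, and remove functions below are hypothetical stand-ins written for this sketch (they only support equality queries and a $set-style field overwrite); they are not the shell's implementation.

```javascript
// Builds 20 documents: for x = 1 and x = 2, y runs from 1 to 10.
function makeTestDocs() {
  const docs = [];
  for (let x = 1; x <= 2; x++) {
    for (let y = 1; y <= 10; y++) {
      docs.push({ x, y });
    }
  }
  return docs;
}

// Equality-only query matching.
function matches(doc, query) {
  return Object.keys(query).every((key) => doc[key] === query[key]);
}

// Models update(query, {$set: fields}, false, multi): one document by default.
function update(docs, query, fields, multi = false) {
  let touched = 0;
  for (const doc of docs) {
    if (matches(doc, query)) {
      Object.assign(doc, fields);
      touched += 1;
      if (!multi) break;
    }
  }
  return touched;
}

// Models remove(query, justOne): all matching documents by default.
function remove(docs, query, justOne = false) {
  let removed = 0;
  for (let i = 0; i < docs.length; ) {
    const stop = justOne && removed === 1;
    if (!stop && matches(docs[i], query)) {
      docs.splice(i, 1);
      removed += 1;
    } else {
      i += 1;
    }
  }
  return removed;
}

const docs = makeTestDocs();
const touchedByDefault = update(docs, { x: 1 }, { y: 0 });
const zeroedAfterDefault = docs.filter((d) => d.x === 1 && d.y === 0).length;
const touchedWithMulti = update(docs, { x: 1 }, { y: 0 }, true);
const zeroedAfterMulti = docs.filter((d) => d.x === 1 && d.y === 0).length;
const removedByDefault = remove(docs, { x: 1 });
console.log(zeroedAfterDefault, zeroedAfterMulti, removedByDefault, docs.length);
```

After the default update only one document has y set to 0; after the multi update all ten do; the default remove then deletes all ten.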

Creating an index and viewing plans of queries

In this recipe, we will look at querying data, analyzing its performance by explaining the query plan, and then optimizing it by creating indexes.

Getting ready

For the creation of indexes, we need to have a server up and running. A simple single node is what we will need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data with which we will be operating needs to be imported into the database. The steps to import the data are given in the Creating test data recipe. Once we have this prerequisite, we are good to go.

How to do it…

We will try to write a query that will find all the zip codes in a given state. To do this, perform the following steps:

1. Execute the following query to view the plan of a query:

> db.postalCodes.find({state: 'Maharashtra'}).explain()

Take a note of the cursor, n, nscannedObjects, and millis fields in the result of the explain plan operation.

2. Let’s execute the same query again; this time, however, we will limit the results to only 100 results:

> db.postalCodes.find({state: 'Maharashtra'}).limit(100).explain()

Again, take a note of the cursor, n, nscannedObjects, and millis fields in the result.

3. We will now create an index on the state and pincode fields as follows:

> db.postalCodes.ensureIndex({state: 1, pincode: 1})

4. Execute the following query:

> db.postalCodes.find({state: 'Maharashtra'}).explain()

Again, take a note of the cursor, n, nscannedObjects, millis, and indexOnly fields in the result.

5. Since we want only the pincodes, we will modify the query as follows and view its plan:

> db.postalCodes.find({state: 'Maharashtra'}, {pincode: 1, _id: 0}).explain()

Take a note of the cursor, n, nscannedObjects, nscanned, millis, and indexOnly fields in the result.

How it works…

There is a lot to explain here. We will first discuss what we just did and how to analyze the stats. Next, we will discuss some points to be kept in mind for index creation and some gotchas.

Analyzing the plan

Let’s look at the first step and analyze the output of the query we executed:

> db.postalCodes.find({state: 'Maharashtra'}).explain()

The output on my machine is as follows (I am skipping the nonrelevant fields for now):

{

"cursor":"BasicCursor",

"n":6446,

"nscannedObjects":39732,

"nscanned":39732,

"millis":55,

}

The value of the cursor field in the result is BasicCursor, which means a full collection scan (all the documents are scanned one after another) has happened to search the matching documents in the entire collection. The value of n is 6446, which is the number of results that matched the query. The nscanned and nscannedObjects fields have values of 39,732, which is the number of documents in the collection that are scanned to retrieve the results. This is also the total number of documents present in the collection, and all were scanned for the result. Finally, millis is the number of milliseconds taken to retrieve the result.

Improving the query execution time

So far, the query doesn’t look too good in terms of performance, and there is great scope for improvement. To demonstrate how the limit applied to the query affects the query plan, we can find the query plan again without the index but with the limit clause:

> db.postalCodes.find({state: 'Maharashtra'}).limit(100).explain()

{

"cursor":"BasicCursor",…

"n":100,

"nscannedObjects":19951,

"nscanned":19951,

"millis":30,

}

The query plan this time around is interesting. Though we still haven’t created an index, we saw an improvement in the time the query took for execution and the number of objects scanned to retrieve the results. This is due to the fact that Mongo does not scan the remaining documents once the number of documents specified in the limit function is reached. We can thus conclude that it is recommended that you use the limit function to limit your number of results when the maximum number of documents needed is known up front. This might give better query performance. The word “might” is important as, in the absence of an index, the collection might still be completely scanned if the number of matches does not reach the limit.
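The early-termination behavior can be modeled with a short plain JavaScript sketch. The scanWithLimit function and the ten alternating sample documents are hypothetical; the point is only to show how the scanned count depends on the limit during an unindexed, in-order scan.

```javascript
// Scans documents in order, stopping as soon as `limit` matches are collected.
// Returns counters named after the explain fields discussed above.
function scanWithLimit(docs, predicate, limit = Infinity) {
  let nscanned = 0;
  const matched = [];
  for (const doc of docs) {
    if (matched.length >= limit) break; // limit reached: stop scanning
    nscanned += 1;
    if (predicate(doc)) matched.push(doc);
  }
  return { n: matched.length, nscanned };
}

// Ten hypothetical documents whose state alternates between "A" and "B".
const sampleDocs = Array.from({ length: 10 }, (_, i) => ({
  _id: i,
  state: i % 2 === 0 ? "A" : "B",
}));
const isStateA = (doc) => doc.state === "A";

const fullScan = scanWithLimit(sampleDocs, isStateA);       // scans everything
const limited = scanWithLimit(sampleDocs, isStateA, 2);     // stops after 2 matches
const limitTooBig = scanWithLimit(sampleDocs, isStateA, 100); // limit never reached

console.log(fullScan, limited, limitTooBig);
```

The last case is the "might" from the paragraph above: when fewer documents match than the limit allows, the whole collection is still scanned.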

Improvement using indexes

Moving on, we will create a compound index on state and pincode. The order of the index is ascending in this case (as the value is 1) and is not significant unless we plan to execute a sort on multiple keys. In that case, the order is a deciding factor as to whether the result can be sorted using only the index or whether Mongo needs to sort it in memory later on, before we return the results. As far as the plan of the query is concerned, we can see that there is a significant improvement:

{

"cursor":"BtreeCursor state_1_pincode_1",

"n":6446,

"nscannedObjects":6446,

"nscanned":6446,

"indexOnly":false,

"millis":16,

}

The cursor field now has the BtreeCursor state_1_pincode_1 value, which shows that the index is indeed used now. As expected, the number of results stays the same at 6446. The number of objects scanned in the index and documents scanned in the collection have now reduced to the same number of documents as in the result. This is because we have now used an index that gave us the starting document from which we could scan; then, only the required number of documents was scanned. This is similar to using the book’s index to find a word rather than scanning the entire book to search for the word. The time, millis, has come down too, as expected.

Improvement using covered indexes

This leaves us with one field, indexOnly, and we will see what this means. To know what this value is, we need to look briefly at how indexes operate.

Indexes store a subset of fields of the original document in the collection. The fields present in the index are the same as those on which the index is created. The fields, however, are kept sorted in the index in an order specified during the creation of the index. Apart from the fields, there is an additional value stored in the index; this acts as a pointer to the original document in the collection. Thus, whenever the user executes a query, if the query contains fields on which an index is present, the index is consulted to get a set of matches. The pointer stored with the index entries that match the query is then used to make another I/O operation to fetch the complete document from the collection; this document is then returned to the user.

The value of indexOnly, which is false, indicates that the data requested by the user in the query is not entirely present in the index; an additional I/O operation is needed to retrieve the entire document from the collection by following the pointer from the index. Had the requested data been present in the index itself, an additional operation to retrieve the document from the collection would not be necessary, and the data from the index would be returned. This is called a covered index, and the value of indexOnly, in this case, will be true.

In our case, we just need the pincodes, so why not use projection in our queries to retrieve just what we need? This will also make the index covered: the index entry holds the state’s name and pincode, so the required data can be served completely from the index without retrieving the original document from the collection. The plan of the query in this case is interesting too. Execute the following query to view its plan:

> db.postalCodes.find({state: 'Maharashtra'}, {pincode: 1, _id: 0}).explain()

{

"cursor":"BtreeCursor state_1_pincode_1",

"n":6446,

"nscannedObjects":0,

"nscanned":6446,

"indexOnly":true,

"millis":15,

}

The values of the nscannedObjects and indexOnly fields are worth observing. As expected, since the data we requested in the projection of the find query is the pincode only, which can be served from the index alone, the value of indexOnly is true. In this case, we scanned 6,446 entries in the index; thus, the nscanned value is 6446. We, however, didn’t reach out to any document in the collection on the disk, as this query was covered by the index alone, and no additional I/O was needed to retrieve the entire document. Hence, the value of nscannedObjects is 0.

As this collection in our case is small, we do not see a significant difference in the execution time of the query. This will be more evident on larger collections. Making use of indexes is great and gives good performance. Making use of covered indexes gives even better performance.
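To make the counters in the plan concrete, here is a minimal sketch in plain JavaScript (it runs in Node.js and needs no server). It is a toy model and not MongoDB’s implementation: the index is a sorted array, the pointer is an array position, and the sample documents, the city field, and the function names buildIndex and findByState are assumptions made for illustration. The counters only mimic the explain() fields discussed above.

```javascript
// Toy model of an index on {state: 1, pincode: 1}; not MongoDB's real B-tree.
// Each entry holds the indexed fields plus a pointer (array position) to the
// full document in the collection.
function buildIndex(collection) {
  return collection
    .map((doc, pos) => ({ state: doc.state, pincode: doc.pincode, pos }))
    .sort((a, b) =>
      a.state < b.state ? -1 : a.state > b.state ? 1 : a.pincode - b.pincode);
}

// Answers a query on state and reports counters named after the explain()
// fields. projectedFields lists the fields the caller wants back.
function findByState(collection, index, state, projectedFields) {
  const indexedFields = ['state', 'pincode'];
  const covered = projectedFields.every(f => indexedFields.includes(f));
  const stats = { n: 0, nscanned: 0, nscannedObjects: 0, indexOnly: covered };
  const results = [];
  for (const entry of index) {
    if (entry.state !== state) continue; // a real index would seek instead
    stats.nscanned++;
    // Covered: answer from the index entry. Otherwise follow the pointer.
    const source = covered ? entry : collection[entry.pos];
    if (!covered) stats.nscannedObjects++;
    const out = {};
    for (const f of projectedFields) out[f] = source[f];
    results.push(out);
    stats.n++;
  }
  return { results, stats };
}

// Hypothetical sample documents; only state and pincode mirror the recipe.
const postalCodes = [
  { _id: 1, state: 'Maharashtra', pincode: 400001, city: 'Mumbai' },
  { _id: 2, state: 'Gujarat', pincode: 380001, city: 'Ahmedabad' },
  { _id: 3, state: 'Maharashtra', pincode: 411001, city: 'Pune' },
];
const idx = buildIndex(postalCodes);
const coveredRun = findByState(postalCodes, idx, 'Maharashtra', ['pincode']);
const fetchRun = findByState(postalCodes, idx, 'Maharashtra', ['pincode', 'city']);
console.log(coveredRun.stats);
console.log(fetchRun.stats);
```

With only pincode projected, the sketch reports nscannedObjects as 0 and indexOnly as true; adding city to the projection forces one document fetch per matching index entry.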

Note

Another thing to remember is that, wherever possible, we should use projection to retrieve only the fields we need. The _id field is retrieved every time by default, so if it is not part of the index, we need to set _id:0 explicitly to exclude it and keep the query covered. Executing a covered query is the most efficient way to query a collection.

Some gotchas of index creation

We will now see some pitfalls in index creation and some facts about array fields used in an index.

Some of the operators that do not use the index efficiently are the $where, $nin, and $exists operators. Whenever these operators are used in a query, one should bear in mind a possible performance bottleneck as the data size increases. Similarly, the $in operator should be preferred over the $or operator when both can be used to achieve the same result. As an exercise, try to find the pincodes in the states of Maharashtra and Gujarat from the postalCodes collection. Write two queries: one using the $or operator and the other using the $in operator. Explain the plan for both these queries.
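For the exercise above, the two filter documents can be sketched as follows. The helper names inQuery and orQuery are hypothetical; only the $in and $or operators and the state field come from the recipe. The snippet is plain JavaScript, so it runs in Node.js as is, and the resulting objects could be passed to db.postalCodes.find(...).explain() in the Mongo shell.

```javascript
// Builds the $in form of the filter: one condition listing all states.
function inQuery(states) {
  return { state: { $in: states } };
}

// Builds the $or form of the filter: one equality clause per state.
function orQuery(states) {
  return { $or: states.map(s => ({ state: s })) };
}

const states = ['Maharashtra', 'Gujarat'];
console.log(JSON.stringify(inQuery(states)));
console.log(JSON.stringify(orQuery(states)));
// In the Mongo shell (not executed here):
//   db.postalCodes.find(inQuery(states)).explain()
//   db.postalCodes.find(orQuery(states)).explain()
```

Both filters select the same documents; comparing their two plans is the point of the exercise.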

What happens when an array field is used in the index? Mongo creates an index entry for each element present in the array field of a document. So, if there are 10 elements in an array in a document, there will be 10 index entries, one for each element in the array. However, there is a constraint while creating indexes that contain array fields. When creating an index using multiple fields, no more than one field can be of the array type. This is done to prevent a possible explosion in the number of index entries on adding even a single element to an array used in the index. If we think about it carefully, an index entry is created for each element in the array. If multiple fields of the array type were allowed to be part of an index, the number of entries in the index would be the product of the lengths of these array fields. For example, a document with two array fields, each of length 10, would add 100 entries to the index, had it been allowed to create one index using these two array fields.
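The arithmetic behind this constraint can be sketched in a few lines of plain JavaScript. The function entryCount is a hypothetical helper, not a MongoDB API; it only multiplies array lengths to show how the entry count would grow if more than one array field were allowed in a compound index.

```javascript
// Counts the index entries one document would produce if every array field
// in a compound key were expanded element by element. This is a toy model:
// MongoDB rejects a compound index entry with more than one array field.
function entryCount(doc, keyFields) {
  return keyFields.reduce((count, field) => {
    const value = doc[field];
    return count * (Array.isArray(value) ? value.length : 1);
  }, 1);
}

const tenItems = Array.from({ length: 10 }, (_, i) => i);
console.log(entryCount({ a: tenItems, b: 'x' }, ['a', 'b']));      // 10
console.log(entryCount({ a: tenItems, b: tenItems }, ['a', 'b'])); // 100
```

Adding an eleventh element to one of the two arrays would raise the count from 100 to 110, which is the explosion the restriction avoids.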

This should be good enough for now to scratch the surface of a plain vanilla index. We will see more options and types in some of the upcoming recipes.

Background and foreground index creation from the shell

In the previous recipe, we looked at how to analyze queries, how to decide what index needs to be created, and how to create indexes. This, by itself, is straightforward and looks reasonably simple. However, for large collections, things get worse as the index-creation time is large. There are some caveats that we need to keep in mind. The objective of this recipe is to throw some light on these concepts and avoid pitfalls while creating indexes, especially on large collections.

Getting ready

For the creation of indexes, we need to have a server up and running. A simple single node is what we will need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server.

Connect two shells to the server by typing mongo from the operating system shell. Both of them will, by default, connect to the test database.

Our test data for postal codes is too small to demonstrate the problems faced during index creation on large collections. We need more data; thus, we will start by creating some to simulate these problems. The data has no practical meaning but is good enough to test the concepts. Copy the following piece of code into one of the started shells and execute it (it is a pretty easy snippet to type out too):

for (i = 0; i < 5000000; i++) {
  doc = {}
  doc._id = i
  doc.value = 'Some text with no meaning and number ' + i + ' in between'
  db.indexTest.insert(doc)
}

A document in this collection will be as follows:

{ _id : 0, value : "Some text with no meaning and number 0 in between" }

Execution will take quite a lot of time, so we need to be patient. Once the execution is over, we are all set for the action.

Note

If you are keen to know the current number of documents loaded in the collection, evaluate the following command from the second shell periodically:

>db.indexTest.count()

How to do it…

1. Create an index on the value field of the document:

>db.indexTest.ensureIndex({value:1})

2. While the index creation is in progress, which will take quite some time, switch over to the second console and execute the following command:

>db.indexTest.findOne()

Both the index-creation shell and the one where we executed findOne will be blocked, and the prompt will not be shown on either of them until the index creation is complete.

3. Now, this was foreground index creation, which is the default. We want to see the behavior of background index creation. Drop the created index:

>db.indexTest.dropIndex({value:1})

4. Create the index again but, this time, in the background:

>db.indexTest.ensureIndex({value:1},{background:true})

5. In the second Mongo shell, execute the findOne query again:

>db.indexTest.findOne()

This should return one document this time around, unlike the first instance where the operation was blocked until index creation completed in the foreground.

6. In the second shell, also execute the following explain operation repeatedly, with an interval of about 4 to 5 seconds between invocations, until the index-creation process is complete:

>db.indexTest.find({value:"Some text with no meaning and number 0 in between"}).explain()

How it works…

Let’s now analyze what we just did. We created about 5 million documents of no practical importance; we just wanted some data that takes a significant amount of time to index.

Indexes can be built in two ways: in the foreground and in the background. In either case, the shell that issued the ensureIndex operation is blocked and doesn’t show the prompt until the index is created. You might then be wondering what difference it makes to create an index in the background or foreground.

That is exactly where the second shell we started comes into the picture. This is where we demonstrated the difference between the background and foreground index-creation processes. We first created the index in the foreground, which is the default behavior. This index build didn’t allow us to query the collection (from the second shell) until the index was constructed. The findOne operation was blocked until the entire index was built (from the first shell) before returning the result. On the other hand, the index that was built in the background didn’t block the findOne operation. If you want to try inserting new documents into the collection while the index build is on, this too should work well. Feel free to drop the index and recreate it in the background while simultaneously inserting a document in the indexTest collection; you will notice that it works smoothly.

Well, what is the difference between the two approaches, and why not always build the index in the background? Apart from an extra option, {background:true} (which can also be {background:1}), passed as the second parameter to the ensureIndex call, there are few differences. Index creation in the background will be slightly slower than index creation in the foreground. Furthermore, internally, though it is not relevant to the end user, the index created in the foreground will be more compact than the one created in the background.

Other than that, there will be no significant difference. In fact, if a system is running and an index needs to be created while it is serving the end users (not recommended, but there can be a situation that demands index creation on a live system), then creating the index in the background is the only way we can do it. There are other strategies for performing such administrative activities that we will see in some recipes in Chapter 4, Administration.

To make things worse for foreground index creation, the lock acquired by Mongo during index creation is not at the collection level but at the database level. To see what this means, we will drop the index on the indexTest collection and perform a small exercise as follows:

1. Start by creating the index in the foreground from the shell by executing the following command:

>db.indexTest.ensureIndex({value:1})

2. Now, while the index is being built, switch to the second shell and insert a document in the person collection, which might or might not exist at this point in the test database:

>db.person.insert({name:'Amol'})

We will see that this insert operation on the person collection blocks while the index creation on the indexTest collection is in progress. If, however, the insert operation is done on a collection in a different database during the index build (you can try this out too), it will execute normally without blocking. This clearly shows that the lock is acquired at the database level and not at the collection level or global level.

Note

Prior to version 2.2 of Mongo, locks were at the global level, that is, at the mongod process level and not at the database level as we saw earlier. You need to remember this fact when dealing with a distribution of Mongo that is older than version 2.2.

Creating unique indexes on collection and deleting the existing duplicate data automatically

In this recipe, we will look at creating unique indexes on a collection. A unique index, as the name suggests, requires that the values of the field on which the index is created be unique. What if the collection already has data and we want to create a unique index on a field whose value is not unique in the existing data?

Obviously, index creation will fail. There is, however, a way to drop the duplicates and create the index. Curious how this can be achieved? Keep reading this recipe.

Getting ready

For this recipe, we will create a collection called userDetails. We will need the server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the UniqueIndexData.js script loaded. This script will be available on the book’s website for download. To find out how to start the shell with a script preloaded, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.

How to do it…

1. Load the required data in the collection using the loadUserDetailsData method.

2. Execute the following command on the Mongo shell:

>loadUserDetailsData()

3. See the count of the documents in the collection using the following query (it should be 100):

>db.userDetails.count()

4. Now, try to create a unique index on the login field of the userDetails collection:

>db.userDetails.ensureIndex({login:1},{unique:true})

5. This will not succeed, and an error similar to the following will be seen on the console:

{
  "err" : "E11000 duplicate key error index: test.userDetails.$login_1 dup key: { : \"bander\" }",
  "code" : 11000,
  "n" : 0,
  "connectionId" : 6,
  "ok" : 1
}

6. Next, we will try to create an index on this collection by eliminating the duplicates:

>db.userDetails.ensureIndex({login:1},{unique:true,dropDups:true})

7. This will throw no errors. Find the count in the collection again (take note of the count and compare it with the count seen earlier, prior to index creation):

>db.userDetails.count()

8. Check whether the index is being used by viewing the plan of the query:

>db.userDetails.find({login:'mtaylo'}).explain()

How it works…

We initially loaded our collection with 100 documents using the loadUserDetailsData function from the UniqueIndexData.js file. We looped 100 times and loaded the same data over and over again. Thus, we got duplicate documents.

We then tried to create a unique index on the login field in the userDetails collection as follows:

>db.userDetails.ensureIndex({login:1},{unique:true})

This creation fails and reports the first duplicate key it encountered during index creation. It is bander in this case. Can you guess why an error was first encountered for this user ID? This is not even the first ID we saw in the loaded data.

Tip

When specifying 1 in index creation, we mean to convey that the order of the values is ascending. Try creating a unique index using {login:-1} and see if the user ID for which the error is encountered is different.

In such a scenario, we are left with two options:

Manually pick the data to be deleted or fixed, and ensure that the field on which the index is to be created has unique data across the collection. This can be done either manually or programmatically, but it is outside the scope of Mongo and is done by the end user on a case-by-case basis.

Alternatively, if we don’t care much about the data as it is genuinely duplicated and we need to retain just one copy of it, Mongo provides a convenient way to handle this. Apart from the regular {unique:true} option used to create a unique index, we provide an additional dropDups:true option (or dropDups:1 if you wish) that will blindly delete all the duplicate data it encounters during index creation. Note that there is no guarantee of which document will be retained and which ones will be deleted, but just one will be retained. In this case, there are 20 unique login IDs. On unique index creation, if the value of the login ID is not already present in the index, it is added. Subsequently, when the login ID encountered is already present in the index, the corresponding document is deleted from the collection; this explains why we were left with just 20 documents in the userDetails collection.
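The add-or-delete behavior described above can be sketched as a small simulation in plain JavaScript. It is a toy model under stated assumptions: the login values are made up, the function name buildUniqueIndexDroppingDups is hypothetical, and keeping the first document seen is a choice of this sketch, since Mongo gives no guarantee about which copy survives.

```javascript
// Toy model of {unique: true, dropDups: true}: walk the documents once, add
// unseen keys to the "index", and drop any document whose key is already
// there. Keeping the first copy is an assumption of this sketch only.
function buildUniqueIndexDroppingDups(docs, field) {
  const seen = new Set();
  const kept = [];
  let dropped = 0;
  for (const doc of docs) {
    if (seen.has(doc[field])) {
      dropped++;
      continue;
    }
    seen.add(doc[field]);
    kept.push(doc);
  }
  return { kept, dropped };
}

// 100 documents spread over 20 distinct (hypothetical) login values.
const users = Array.from({ length: 100 }, (_, i) => ({ _id: i, login: 'user' + (i % 20) }));
const { kept, dropped } = buildUniqueIndexDroppingDups(users, 'login');
console.log(kept.length, dropped); // 20 80
```

The sketch ends with 20 documents, one per distinct login, which matches the count observed after the index was created with dropDups.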

Creating and understanding sparse indexes

Schema-free design is one of the fundamental features of Mongo. This allows documents in a collection to have disparate fields, with some fields present in some documents and absent in the others. In other words, these fields might be sparse; this might have already given you a clue to what sparse indexes are. In this recipe, we will create some random test data and see how sparse indexes behave against a normal index. We will see the advantage of using a sparse index and one major pitfall in its usage.

Getting ready

For this recipe, we need to create a collection called sparseTest. We will require a server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the SparseIndexData.js script loaded. This script will be available on the book’s website for download. To know how to start the shell with a script preloaded, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.

How to do it…

1. Load the data in the collection by invoking the following command (this should import 100 documents in the sparseTest collection):

>createSparseIndexData()

2. Now, take a look at the data by executing the following query, taking note of the y field in the top few results:

>db.sparseTest.find({},{_id:0})

3. We can see that the y field is either absent, or it is unique if it is present. Let’s then execute the following query:

>db.sparseTest.find({y:{$ne:2}},{_id:0}).limit(15)

4. Take note of the result; it contains the documents whose y value is not 2 as well as the documents that do not contain the y field at all.

5. Since the value of y seems unique, let’s create a new unique index on the y field:

>db.sparseTest.ensureIndex({y:1},{unique:1})

6. This throws an error; it complains that the value is not unique and that the offending value is the null value.

7. We will fix this by making this index sparse as follows:

>db.sparseTest.ensureIndex({y:1},{unique:1,sparse:1})

8. This should fix our problem. To confirm that the index got created, execute the following command on the shell:

>db.sparseTest.getIndexes()

9. This should show two indexes: the default one on _id and the one we just created in the preceding step.

10. Now, execute the query we executed in step 3 again, and see the result.

11. Look at the result and compare it with what we saw before the index was created. Re-execute the query but with the following hint, forcing a full table scan:

>db.sparseTest.find({y:{$ne:2}},{_id:0}).limit(15).hint({$natural:1})

12. Observe the result again.

How it works…

Those were a lot of steps. We will now dig deeper and explain the internals and the reasoning for the weird behavior we saw while querying the collection that used sparse indexes.

The test data that we created using the JavaScript method consists of documents with an x key whose value is a number starting from 1 and going up to 100. The value of y is set only when x is a multiple of 3; its value too is a running number starting from 1 and going up to a maximum of 33 when x is 99.

We then executed the following query and saw the following result, as expected:

>db.sparseTest.find({y:{$ne:2}},{_id:0}).limit(15)

{"x":1}

{"x":2}

{"x":3,"y":1}

{"x":4}

{"x":5}

{"x":7}

{"x":8}

{"x":9,"y":3}

{"x":10}

{"x":11}

{"x":12,"y":4}

{"x":13}

{"x":14}

{"x":15,"y":5}

{"x":16}

The document where y is 2 is missing in the result, and this is what we intended. Note that the documents where y isn’t present are still seen in the result. We now plan to create an index on the field y. As the field is either not present or has a value that is unique, it seems natural that a unique index should work.

Internally, indexes, by default, add an entry in the index even if the field is absent in the original document in the collection. The value that goes in the index will, however, be null. This means that there will be the same number of entries in the index as the number of documents in the collection. For a unique index, the value (including null values) should be unique across the collection; this explains why we got an exception during index creation, as the field is sparse (not present in all documents) and multiple documents therefore share the null value.

A solution for this problem is to make the index sparse, and all we did was add sparse:1 to the options along with unique:1. This does not put an entry in the index if the field doesn’t exist in the document. Thus, the index will now contain fewer entries. It will only contain those entries where the field is present in the document. This not only makes the index smaller, making it easier to fit in memory, but also solves our problem of adding a unique constraint. The last thing we want is an index with millions of entries on a collection with millions of documents where only a few hundred have the field defined.
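The difference between the two index builds can be sketched as a toy simulation in plain JavaScript. The function names createData and buildIndex are hypothetical, and the data generator only follows the description of the test data given in this recipe (x from 1 to 100, y set for multiples of 3); it is not the content of SparseIndexData.js, and the index is a plain array rather than MongoDB’s real structure.

```javascript
// Test data as described in this recipe: x runs from 1 to 100 and y is a
// running number set only when x is a multiple of 3.
function createData() {
  const docs = [];
  for (let x = 1, y = 1; x <= 100; x++) {
    docs.push(x % 3 === 0 ? { x, y: y++ } : { x });
  }
  return docs;
}

// Toy index build. A regular index stores null for a missing field; a sparse
// index skips such documents. A unique index rejects a repeated key.
function buildIndex(docs, field, { unique = false, sparse = false } = {}) {
  const entries = [];
  const seen = new Set();
  for (const doc of docs) {
    const present = field in doc;
    if (!present && sparse) continue;
    const key = present ? doc[field] : null;
    if (unique && seen.has(key)) throw new Error('duplicate key: ' + key);
    seen.add(key);
    entries.push({ key, doc });
  }
  return entries;
}

const data = createData();
try {
  buildIndex(data, 'y', { unique: true });
} catch (e) {
  console.log('non-sparse unique index failed: ' + e.message);
}
console.log(buildIndex(data, 'y').length);                                 // 100
console.log(buildIndex(data, 'y', { unique: true, sparse: true }).length); // 33
```

The non-sparse unique build fails on the second document without y, because both it and the first one map to null, while the sparse build succeeds with 33 entries instead of 100.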

Though we saw that creating a sparse index made the index efficient, it introduced a new problem where some query results are not consistent. The same query that we executed earlier now yields different results. Execute the following command:

>db.sparseTest.find({y:{$ne:2}},{_id:0}).limit(15)

{"x":3,"y":1}

{"x":9,"y":3}

{"x":12,"y":4}

{"x":15,"y":5}

{"x":18,"y":6}

{"x":21,"y":7}

{"x":24,"y":8}

{"x":27,"y":9}

{"x":30,"y":10}

{"x":33,"y":11}

{"x":36,"y":12}

{"x":39,"y":13}

{"x":42,"y":14}

{"x":45,"y":15}

{"x":48,"y":16}

Why did this happen? The answer lies in the query plan for this query. Execute the following command to view the plan of this query:

>db.sparseTest.find({y:{$ne:2}},{_id:0}).limit(15).explain()

The plan shows that it used the index to fetch the matching results. As this is a sparse index, all the documents that didn’t have the field y are not present in it; they didn’t show up in the result though they should have. This is the pitfall we need to be careful of when querying a collection with a sparse index and the query happens to use the index. It will yield unexpected results. One solution is to force a full table scan, where we provide the query analyzer with a hint, using the hint function.

Hints are used to force the query analyzer to use a user-specified index. Though this is usually not recommended as you really need to know what you are doing, this is one of the scenarios where it is really needed. So, how do we force a full table scan? All we need to do is provide {$natural:1} in the hint function. The natural ordering of a collection is the order in which its documents are stored on the disk. This hint forces a full table scan; now, we will get the results as we did earlier. The query performance will, however, degrade for large collections, as it is now using a full table scan.
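The two result sets can be reproduced with a toy model in plain JavaScript. Everything here is an illustration under assumptions: the data is generated from the description in this recipe rather than from the actual script, the sparse index is just a filtered and sorted array, and the predicate notTwo stands in for the {y:{$ne:2}} condition.

```javascript
// Data as described in this recipe: x from 1 to 100, y set for multiples of 3.
const docs = [];
for (let x = 1, y = 1; x <= 100; x++) {
  docs.push(x % 3 === 0 ? { x, y: y++ } : { x });
}

// Toy sparse index on y: only documents that have the field, ordered by y.
const sparseIndex = docs.filter(d => 'y' in d).sort((a, b) => a.y - b.y);

// Stands in for {y: {$ne: 2}}; a missing y also satisfies it.
const notTwo = d => d.y !== 2;

// Answering from the sparse index never sees documents without y.
const viaIndex = sparseIndex.filter(notTwo).slice(0, 15);
// A scan in natural order (the effect of the $natural hint) sees them all.
const viaNatural = docs.filter(notTwo).slice(0, 15);

console.log(viaIndex.map(d => d.x));
console.log(viaNatural.map(d => d.x));
```

The first list starts at x values 3, 9, 12 and contains only documents with y, while the second starts at 1, 2, 3, 4, 5, 7, mirroring the two outputs shown in this recipe.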

If the field is present in a lot of documents (there is no formal cutoff for what is a lot; it can be 50 percent for some or 75 percent for others) and is not really sparse, making the index sparse doesn’t make much sense, apart from when we want to make it unique.

Note

Remember that a field with a null value and a field that is not present in the document are different. If two documents have a null value for the same field, unique index creation will fail, and creating it as a sparse index will not help either.

Expiring documents after a fixed interval using the TTL index

One of the nice and interesting features in Mongo is automatically expiring data in the collection after a predetermined amount of time. This is a very useful tool when we desire to purge some data older than a particular time frame. For a relational database, it is common for folks to set up a batch job that runs every night to perform this operation.

With the Time To Live (TTL) feature of Mongo, we need not worry about this as the database takes care of it out-of-the-box. Let’s see how we can achieve this.

Getting ready

Let’s create some data in Mongo that we want to play with using the TTL indexes. We will create a collection called ttlTest for this purpose. We will require a server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the TTLData.js script loaded. This script will be available on the book’s website for download. To know how to start the shell with a script preloaded, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.

How to do it…

1. Assuming that the server is started and the script provided is loaded on the shell, invoke the following method from the Mongo shell:

>addTTLTestData()

2. Create a TTL index on the createDate field:

>db.ttlTest.ensureIndex({createDate:1},{expireAfterSeconds:300})

3. Now, query the collection:

>db.ttlTest.find()

4. This should give three documents. Repeat the find query every 30 to 40 seconds to see the three documents getting deleted, until the collection has no documents left in it.

How it works…

Let’s start by opening the TTLData.js file and seeing what is going on in it. The code is pretty simple; it gets the current date using new Date(). It then creates three documents with createDate values that are 4, 3, and 2 minutes behind the current time. So, on execution of the addTTLTestData() method in this script, we will have three documents in the ttlTest collection, with creation times 1 minute apart.

The next step is the core of the TTL feature: the creation of the TTL index. It is similar to the creation of any other index using the ensureIndex method, except that we also pass a second parameter, a JSON object. Let’s see what these two parameters are:

The first parameter is {createDate:1}; this tells Mongo to create an index on the createDate field, and the order of the index is ascending as the value is 1 (-1 would have been descending).

The second parameter, {expireAfterSeconds:300}, is what makes this index a TTL index; it tells Mongo to automatically expire the documents after 300 seconds (5 minutes).

OK, but 5 minutes since when? Since the time they were inserted in the collection, or is it some other timestamp? In this case, it considers the createDate field as the base, as this was the field on which we created the index.

This now raises a question: if a field is being used as the base for the computation of time, there has to be some restriction on its type. It just doesn’t make sense to create a TTL index, as we created earlier, on a string field that holds, say, the name of a person.

As we guessed, the type of the field can be a BSON date or an array of dates. What will happen in the case where an array has multiple dates? Which one will be considered in this case?

It turns out that Mongo uses the minimum of the dates available in the array. Try out this scenario as an exercise.

Store two dates about 5 minutes apart as an array in the updateField field of a document, and then create a TTL index on this field, as you did earlier, to expire the document after 10 minutes (600 seconds). Query the collection and see when the document gets deleted from the collection. It should get deleted roughly 10 minutes after the minimum time value present in the updateField array.
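The rule for picking the base time can be sketched in plain JavaScript. The function expiryTime is a hypothetical helper that only restates the rule described in this recipe (the date itself, or the earliest date in an array, plus expireAfterSeconds); it is not how the server is implemented, and the sample dates are made up.

```javascript
// Expiry instant of a document under the TTL rule described in this recipe.
// The base time is the date held in the field, or the earliest date when the
// field holds an array of dates. Returns null when the field is missing or
// holds no dates, meaning the document never expires.
function expiryTime(doc, field, expireAfterSeconds) {
  const value = doc[field];
  const dates = (Array.isArray(value) ? value : [value]).filter(v => v instanceof Date);
  if (dates.length === 0) return null;
  const base = Math.min(...dates.map(d => d.getTime()));
  return new Date(base + expireAfterSeconds * 1000);
}

// Two hypothetical dates 5 minutes apart, stored as an array.
const earlier = new Date('2014-01-01T10:00:00Z');
const later = new Date('2014-01-01T10:05:00Z');
const sample = { _id: 1, updateField: [later, earlier] };

console.log(expiryTime(sample, 'updateField', 600).toISOString()); // 2014-01-01T10:10:00.000Z
console.log(expiryTime({ _id: 2 }, 'updateField', 600));           // null
```

The sample document becomes eligible for deletion 10 minutes after the earlier of its two dates, regardless of the order in which the dates appear in the array.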

Apart from the constraint on the type of the field, there are a few more constraints:

If a field already has an index on it, you cannot create a TTL index on it. As the _id field of the collection already has an index by default, it effectively means you cannot create a TTL index on the _id field.

A TTL index cannot be a compound index that involves multiple fields.

If the indexed field doesn’t exist in a document, the document will never expire (this is pretty logical, I guess).

A TTL index cannot be created on capped collections. In case you are not aware of capped collections, they are special collections in Mongo with a size limit on them and a first in, first out (FIFO) insertion order; they delete old documents to make place for new documents, if needed.

Note

TTL indexes are supported only on Mongo version 2.2 and above. Also note that a document will not be deleted at exactly the time computed from the field. The deletion cycle has a granularity of 1 minute; each run deletes all the documents that have become eligible for deletion since the last time the cycle ran.
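The effect of this 1-minute granularity can be sketched with a little arithmetic in plain JavaScript. The function firstSweepAtOrAfter is hypothetical, and the assumption that the deletion cycle ticks at fixed multiples of the period from some start time is made only for illustration; the server’s actual scheduling is internal to mongod.

```javascript
// Earliest tick of a periodic cleanup cycle at which an expired document
// could be removed. Assumes ticks at sweepStartMs + k * periodMs, which is
// a simplification for illustration, not the server's scheduling.
function firstSweepAtOrAfter(expiryMs, sweepStartMs, periodMs) {
  if (expiryMs <= sweepStartMs) return sweepStartMs;
  const ticks = Math.ceil((expiryMs - sweepStartMs) / periodMs);
  return sweepStartMs + ticks * periodMs;
}

const minute = 60 * 1000;
// A document expiring 90 seconds in is removed at the 120-second tick.
console.log(firstSweepAtOrAfter(90 * 1000, 0, minute) / 1000);  // 120
// A document expiring exactly on a tick is removed at that tick.
console.log(firstSweepAtOrAfter(120 * 1000, 0, minute) / 1000); // 120
```

Under this assumption a document can outlive its expiry time by up to one full period, which is why the deletions in this recipe are observed approximately, not exactly, on schedule.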

There’s more…

A use case might not demand the deletion of all the documents after a fixed interval has elapsed. What if we want to customize the point until which a document stays in the collection? This too can be achieved, and will be demonstrated in the next recipe.

Expiring documents at a given time using the TTL index

In the previous recipe, we saw how documents can be expired after a fixed time period. However, there can be some cases where we might want to have documents that expire at different times. This is not what we saw in the previous recipe. In this recipe, we will see how we can specify the time at which a document expires (it might be different for different documents).

Getting ready

For this recipe, we will create a collection called ttlTest2. We will require a server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the TTLData.js script loaded. This script will be available on the book’s website for download. To know how to start the shell with a script preloaded, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.

How to do it…

1. Load the required data in the collection using the addTTLTestData2 method. Execute the following command on the Mongo shell:

>addTTLTestData2()

2. Now, create the TTL index on the ttlTest2 collection:

>db.ttlTest2.ensureIndex({expiryDate:1},{expireAfterSeconds:0})

3. Execute the following find query to view the three documents in the collection:

>db.ttlTest2.find()

4. Now, after approximately 4, 5, and 7 minutes, see the documents with IDs 2, 1, and 3, respectively, getting deleted.

How it works…

Let’s start by opening the TTLData.js file and seeing what is going on in it. Our method for this recipe is addTTLTestData2. This method simply creates three documents in the ttlTest2 collection with _id of 1, 2, and 3, with their expiryDate field set to 5, 4, and 7 minutes, respectively, after the current time. Note that this field holds a future date, unlike the field in the previous recipe, which held a creation date.

Next, we will create an index:

>db.ttlTest2.ensureIndex({expiryDate:1},{expireAfterSeconds:0})

This is different from the way we created the index for the previous recipe, where the expireAfterSeconds field of the object was set to a non-zero value. This is how the value of the expireAfterSeconds attribute is interpreted. If the value is non-zero, it is the time in seconds that must elapse after a base time before the document is deleted from the collection by Mongo. This base time is the value held in the field on which the index is created (createDate, as in the previous recipe). If this value is 0, the date value on which the index is created (expiryDate in this case) will be the time when the document will expire.
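The interpretation of expireAfterSeconds can be sketched as a small simulation in plain JavaScript. The function remainingAt and the fixed start time are assumptions made for illustration; the three documents only mirror the expiry offsets of 5, 4, and 7 minutes described for this recipe, and the sketch ignores the granularity of the server’s deletion cycle.

```javascript
const MINUTE = 60 * 1000;

// Documents still alive at nowMs: a document is eligible for deletion once
// nowMs reaches the value of its expiryDate plus expireAfterSeconds.
function remainingAt(docs, nowMs, expireAfterSeconds) {
  return docs.filter(d => d.expiryDate.getTime() + expireAfterSeconds * 1000 > nowMs);
}

// Hypothetical start time; offsets follow the description of this recipe.
const start = Date.UTC(2014, 0, 1, 10, 0, 0);
const ttlDocs = [
  { _id: 1, expiryDate: new Date(start + 5 * MINUTE) },
  { _id: 2, expiryDate: new Date(start + 4 * MINUTE) },
  { _id: 3, expiryDate: new Date(start + 7 * MINUTE) },
];

for (const elapsed of [0, 4, 5, 7]) {
  const ids = remainingAt(ttlDocs, start + elapsed * MINUTE, 0).map(d => d._id);
  console.log(elapsed + ' min: ' + JSON.stringify(ids));
}
// 0 min: [1,2,3]
// 4 min: [1,3]
// 5 min: [3]
// 7 min: []
```

With expireAfterSeconds set to 0, each document disappears at its own expiryDate, so the IDs vanish in the order 2, 1, 3, as observed in the recipe.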

To conclude, TTL indexes work well if you want to delete the document upon expiry. There are quite a few cases where we might want to move the document to an archive collection instead, where the archive collection might be created based on, say, the year and month. In any such scenario, the TTL index is not helpful, and we might have to write an external job ourselves that reads the collection for a range of documents, adds them to the target collection, and deletes them from the source collection. A JIRA ticket (https://jira.mongodb.org/browse/SERVER-6895) is already open to address this issue. You might want to keep an eye on it for further developments.
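The planning part of such an external job can be sketched in plain JavaScript. Everything here is hypothetical: the function names archiveCollectionName and planArchive, the events prefix, the naming scheme based on year and month, and the sample documents. The sketch only decides which documents would move to which archive collection; the actual reads, inserts, and deletes against Mongo are left out.

```javascript
// Name of a hypothetical archive collection for a date, for example
// events_2014_09 for a date in September 2014 (UTC).
function archiveCollectionName(prefix, date) {
  const month = String(date.getUTCMonth() + 1).padStart(2, '0');
  return prefix + '_' + date.getUTCFullYear() + '_' + month;
}

// Splits documents into those to keep and those to archive, grouping the
// latter by target collection. Documents older than cutoff are archived.
function planArchive(docs, dateField, cutoff, prefix) {
  const plan = { keep: [], archive: {} };
  for (const doc of docs) {
    if (doc[dateField].getTime() >= cutoff.getTime()) {
      plan.keep.push(doc);
      continue;
    }
    const name = archiveCollectionName(prefix, doc[dateField]);
    if (!plan.archive[name]) plan.archive[name] = [];
    plan.archive[name].push(doc);
  }
  return plan;
}

// Hypothetical documents and cutoff date.
const events = [
  { _id: 1, expiryDate: new Date('2014-08-15T00:00:00Z') },
  { _id: 2, expiryDate: new Date('2014-09-01T00:00:00Z') },
  { _id: 3, expiryDate: new Date('2014-10-20T00:00:00Z') },
];
const plan = planArchive(events, 'expiryDate', new Date('2014-10-01T00:00:00Z'), 'events');
console.log(Object.keys(plan.archive)); // [ 'events_2014_08', 'events_2014_09' ]
console.log(plan.keep.map(d => d._id)); // [ 3 ]
```

A real job would then insert each group into its archive collection and remove those documents from the source, ideally only after the inserts have been acknowledged.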

There’s more…

In this and the previous recipe, we looked at what TTL indexes are and how to use them. However, what if, after creating a TTL index, we want to change the value of expireAfterSeconds? It is possible using the collMod command. See more on this in Chapter 4, Administration.

Chapter 3. Programming Language Drivers

In this chapter, we will cover the following recipes:

Installing PyMongo
Executing query and insert operations using PyMongo
Executing update and delete operations using PyMongo
Aggregation in Mongo using PyMongo
MapReduce in Mongo using PyMongo
Executing query and insert operations using a Java client
Executing update and delete operations using a Java client
Aggregation in Mongo using a Java client
MapReduce in Mongo using a Java client

Introduction

What we have seen so far using MongoDB is that we execute the majority of operations from the shell. The Mongo shell is a great tool for administrators to perform administrative tasks and for developers who would like to quickly test things by querying the data before coding the logic in the application. However, how do we write application code that will allow us to query, insert, update, and delete (among other things) the data in MongoDB? There has to be a library for the programming language in which we write our application.

We should be able to instantiate something or invoke methods from the program to perform some operations on the remote Mongo process. How will this happen unless there is a bridge that understands the protocol of communication with the remote server, transmits the operation we want executed over the wire to the MongoDB server process, and gets the result back to the client? This bridge, simply put, is called the driver, which is also referred to as a client library. Drivers form the backbone of Mongo’s programming language interface. In the absence of drivers, it would have been the responsibility of the application to communicate with the MongoDB server using a protocol that the server understands. This would have been a lot of work not only to develop but also to test and maintain. Though the communication protocol is standard, there cannot be one implementation that works for all languages. Each programming language needs its own implementation that exposes, more or less, the same sort of programming interface. The core concepts of the client APIs that we will see in this chapter hold good for all languages.

Note

Drivers for all the major programming languages are supported by MongoDB, Inc. There is also a huge array of programming languages supported by the community. You can take a look at the various platforms supported by Mongo by visiting http://docs.mongodb.org/ecosystem/drivers/community-supported-drivers/.

To know more about the underlying protocol used by MongoDB for communication between the client and the server, and to see what goes over the wire, refer to my blog at http://amolnayak.blogspot.in/2014/09/mongodb-wire-protocol-analysis_14.html.

Installing PyMongo

This recipe is about setting up PyMongo, which is the Python driver for MongoDB. In this recipe, we will demonstrate the installation of PyMongo on both the Windows and Linux platforms.

Getting ready

A simple single node is what we will need for the sanity testing of the driver once the installation is complete. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. We will also require an Internet connection to download Python and PyMongo. Once these prerequisites are met, we are ready to begin.

The first step is to install Python on the computer if it is not already there. Visit http://www.python.org/getit/, download the latest version of Python for your platform, and install it. The steps for the installation of Python are not covered in this recipe. However, before you proceed to the next section, Python should be available on the host operating system.

How to do it…

1. We will first set up PyMongo on the Windows platform. Visit https://pypi.python.org/pypi/pymongo and download the MS Windows Installer that is appropriate for the version of Python installed. My Python version is 2.7, and hence, this was the version downloaded.

2. Double-click on the downloaded installer and click on Next, as shown in the following screenshot:

3. On clicking Next, if the right version of the Python installation is found, we can go ahead with the installation. With a couple more clicks on the Next button, we should have PyMongo installed.

Let’s do a sanity test of the installation as follows:

1. From the command prompt, start the Python shell by typing in python as follows:

C:\Users\Amol>python
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

2. We will then import PyMongo; this should happen without any error. If we don’t see any import error, our installation has gone through successfully:

>>> import pymongo

>>>

In this section, we will see how to set up PyMongo on a Linux system. We will install it on a Debian flavor, Ubuntu. We will not use Ubuntu’s Advanced Packaging Tool (apt) to install PyMongo for a couple of reasons:

Ubuntu’s default repository might not have the latest release of the driver
The apt tool is specific to Debian Linux and its variants

Therefore, we will use pip, a tool to manage Python packages. This tool uses the Python Package Index (PyPI) to retrieve the dependencies; this is the official repository for third-party libraries in Python.

So, our installation is split into sections as follows:

Installing pip, if it is not already installed on Ubuntu, using apt (it will be done in a different way on non-Debian variants)
Using pip to install PyMongo; this step is the same, irrespective of the flavor of Linux you are using

Let’s start by installing pip on Ubuntu as follows:

1. Execute the following command:

sudo apt-get install python-pip

2. Type in y to confirm the installation, and it will download the package.

3. Now, install the package by executing the following command:

amol@Amol-PC:~$ sudo apt-get install python-pip

4. Once the setup is complete, execute the following command from the shell to install PyMongo:

$ pip install pymongo

5. This will install PyMongo. My Python version is the 2.7.x release. The same pymongo package also supports the Python 3.x release; there, run pip through the Python 3 interpreter:

$ python3 -m pip install pymongo

6. Once the installation of PyMongo is complete, we will do a quick sanity test. From the command prompt of the operating system, start the Python shell by typing in python:

amol@Amol-PC:~$ python
Python 2.7.3 (default, Apr 10 2013, 05:46:21)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

7. We will then import PyMongo; this should happen without any error:

>>> import pymongo

>>>

8. WearenowdonewiththeinstallationofthePyMongosetup.

There’s more…

Installation of PyMongo is just a prerequisite for running Python code that can connect to Mongo to perform the operations. The next couple of recipes, Executing query and insert operations using PyMongo and Executing update and delete operations using PyMongo, are all about demonstrating these basic operations in Python programming using PyMongo to connect to the database and execute them.

Executing query and insert operations using PyMongo

This recipe is all about executing basic query and insert operations using PyMongo. This is similar to what we did with the Mongo shell earlier in the book.

Getting ready

To execute simple queries, we need to have a server up and running. A simple single node is what we will need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported in the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Python is expected to be installed on the host operating system and Mongo’s client for Python, PyMongo, needs to be installed. Look at the previous recipe to know how to install PyMongo for your host operating system. Also, in this recipe, we will execute insert operations and provide a write concern to use.

How to do it…

Let’s start with some querying for Mongo from the Python shell. This will be identical to what we do from the Mongo shell, except that this is in the Python programming language as opposed to JavaScript that we have in the Mongo shell. We can use the basics that we will see here to develop large-scale production systems that run on Python and use MongoDB as a data store.

Let’s get started by first starting the Python shell from the operating system’s command prompt. The following steps are independent of the host operating system:

1. Type in the following command in the shell, and the Python shell will start:

$ python

Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32

Type "help", "copyright", "credits" or "license" for more information.

>>>

2. Then, import the pymongo package and create the client as follows:

>>> import pymongo

>>> client = pymongo.MongoClient('localhost', 27017)

An alternative way to connect is as follows:

>>> client = pymongo.MongoClient('mongodb://localhost:27017')

3. This works well too and achieves the same result. Now that we have the client, our next step is to get the database on which we will perform the operations. Now, unlike some programming languages where we have a getDatabase() method to get an instance of the database, we will get a reference to the database object on which we will perform the operations (test in this case). We will do this in the following way:

>>> db = client.test

Another alternative way is as follows:

>>> db = client['test']

4. We will query the postalCodes collection. We will limit our results to 10 items as follows:

>>> postCodes = db.postalCodes.find().limit(10)

5. Iterate over the results as follows. Watch out for the indentation of the print after the for statement. The following fragment should print 10 documents that are returned:

>>> for postCode in postCodes:
...     print 'City: ', postCode['city'], ', State: ', postCode['state'], ', Pin Code: ', postCode['pincode']
...

6. To find one document, execute the following command:

>>> postCode = db.postalCodes.find_one()

7. Print the state and city of the returned result as follows:

>>> print 'City: ', postCode['city'], ', State: ', postCode['state'], ', Pin Code: ', postCode['pincode']

8. Let’s query the top 10 cities in the state of Gujarat sorted by the name of the city, and we will just select the city, state, and the pincode. Execute the following query from the Python shell:

>>> cursor = db.postalCodes.find({'state': 'Gujarat'}, {'_id': 0, 'city': 1, 'state': 1, 'pincode': 1}).sort('city', pymongo.ASCENDING).limit(10)

The preceding cursor’s results can be printed in the same way in which we printed the results in step 5.

9. Let’s sort the data we query. We want to sort by the descending order of the state and then by the ascending order of the city. We will write the query as follows:

>>> city = db.postalCodes.find().sort([('state', pymongo.DESCENDING), ('city', pymongo.ASCENDING)]).limit(5)

10. Iterate through this cursor; this should print out five results on the console. Refer to step 5 for how we iterate over a cursor to print the results.

11. So far, we have covered the basic operations for querying MongoDB from Python. Now, let’s look at the insert operation. We will use a test collection to perform these operations so that we do not disturb our postal codes test data. We will use a pymongoTest collection for this purpose and add documents to it in a loop as follows:

>>> for i in range(1, 21): db.pymongoTest.insert({'i': i})

12. The insert operation can take a list of dictionary objects and perform a bulk insert. So now, something like the following insert query is perfectly valid:

>>> db.pythonTest.insert([{'name': 'John'}, {'name': 'Mark'}])

Any guesses on the return value? In the case of a single document insert, the return value is the value of _id for the newly created document. In this case, it is a list of IDs.

13. Let’s execute an insert query again, this time with a write concern provided. Execute the following insert with the write concern w=1 and j=True:

>>> db.pymongoTest.insert({'name': 'Jones'}, w=1, j=True)

How it works…

We instantiated the client and then, in step 3, got the reference to the object that will be used to access the database on which we wish to perform operations. There are a couple of ways to get this reference. The first option (db = client.test) is more convenient, unless your database name has a special character, such as a hyphen (-). For example, if the name is db-test, we would have no option other than to use the [] operator to access the database. Using either of the alternatives, we now have an object for the test database in the db variable.

After we got the client and the db instance in Python, we queried to find the top 10 documents in the natural order from the collection in step 4. The syntax is almost identical to how this query would have been executed from the shell. Step 5 simply printed out the results, 10 of them in this case.

Generally, if you need instant help on a particular class from the Python interpreter, simply execute dir(<class_name>) or dir(<object of a class>), which gives a listing of the attributes and functions defined on the object passed. For example, dir(pymongo.MongoClient) or dir(client), where client is the variable that holds the reference to an instance of pymongo.MongoClient, can be used to get the listing of all the supported attributes and functions. Note that the class itself is passed and not its name in quotes; dir('pymongo.MongoClient') would list the attributes of a Python string. The help function is more informative and prints out the documentation, which is a great source of reference just in case you need instant help. Try typing in help('pymongo.MongoClient') or help(client).
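The difference between passing an object and passing its name as a string to dir is easy to verify without a database. The following is a minimal sketch that uses collections.OrderedDict from the standard library as a stand-in for pymongo.MongoClient (the stand-in is our choice for illustration, so that the snippet runs without PyMongo installed):

```python
import collections

# Passing a string inspects the str type, not the class named in the string.
from_string = dir('collections.OrderedDict')

# Passing the class itself lists the attributes of that class.
from_class = dir(collections.OrderedDict)

# move_to_end is a method of OrderedDict; upper is a method of str.
print('move_to_end' in from_class)
print('move_to_end' in from_string)
print('upper' in from_string)
```

The same holds for any class, which is why dir(pymongo.MongoClient) is the call to use when exploring the driver.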

In steps 4 and 5, we queried the postalCodes collection, limited the result to the top 10 results, and printed them. The returned object is an instance of the pymongo.cursor.Cursor class. Step 6 got just one document from the collection using the find_one() function. This is equivalent to the findOne() method on the collection invoked from the shell. The value returned by this function is a built-in dict object.

In step 8, we executed another find to query the data. However, this time around, we passed two parameters to it. The first one was the query, which looks similar to what we execute from the Mongo shell. However, the type of the parameter in Python is dict. The second parameter was another object of type dict. This dictionary is used to provide the fields to be returned in the result. A value of 1 for a field indicates that the field is to be selected and returned in the result. This is comparable to a select in a relational database with the columns to be selected listed explicitly. The _id field is selected by default, unless it is explicitly set to 0 in the selector dict object. The selector provided here is {'_id': 0, 'city': 1, 'state': 1, 'pincode': 1}, which selects the city, state, and pincode and suppresses the _id field. We have the sort method too. This method has two formats: sort(sort_field, sort_direction) and sort([(sort_field, sort_direction) … (sort_field, sort_direction)]).

The first one is used when we want to sort by one field only. The second representation accepts a list of pairs of sort fields and sort directions and is used when we want to sort by multiple fields. We used the first format in the query in step 8 and the second format in our query in step 9, as we sorted first by state name and then by city.
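To make the selector and the multi-field sort concrete, here is a small in-memory sketch. It is not how PyMongo or the server implements them; project and multi_sort are hypothetical helper names, the sample documents are made up, and the directions 1 and -1 correspond to the values of pymongo.ASCENDING and pymongo.DESCENDING. The snippet is written for Python 3 and needs no server:

```python
def project(doc, fields):
    """Apply an inclusion selector such as {'_id': 0, 'city': 1}."""
    wanted = {name for name, flag in fields.items() if flag}
    # _id is returned unless the selector sets it to 0 explicitly.
    if fields.get('_id', 1):
        wanted.add('_id')
    return {name: value for name, value in doc.items() if name in wanted}


def multi_sort(docs, spec):
    """Sort by a list of (field, direction) pairs; direction is 1 or -1."""
    result = list(docs)
    # Python's sort is stable, so sorting by the least significant field
    # first and the most significant field last gives the combined order.
    for field, direction in reversed(spec):
        result.sort(key=lambda d: d[field], reverse=(direction == -1))
    return result


# Made-up sample documents shaped like those in postalCodes.
docs = [
    {'_id': 1, 'city': 'Surat', 'state': 'Gujarat', 'pincode': 395001},
    {'_id': 2, 'city': 'Pune', 'state': 'Maharashtra', 'pincode': 411001},
    {'_id': 3, 'city': 'Anand', 'state': 'Gujarat', 'pincode': 388001},
    {'_id': 4, 'city': 'Mumbai', 'state': 'Maharashtra', 'pincode': 400001},
]

ordered = multi_sort(docs, [('state', -1), ('city', 1)])
cities = [d['city'] for d in ordered]
without_id = project(docs[0], {'_id': 0, 'city': 1, 'state': 1, 'pincode': 1})
with_id = project(docs[0], {'city': 1})
```

Here cities comes out with the Maharashtra cities first (state descending) and each state's cities in alphabetical order, while with_id still carries _id because the selector did not suppress it.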

If we look at the way we invoked sort, it was invoked on the cursor instance. Similarly, the limit function was also on the Cursor class. The evaluation is lazy and is deferred until the iteration is performed to retrieve the results from the cursor. Until that point, the cursor object is not evaluated on the server.
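The idea of deferred evaluation can be sketched with a toy class. LazyCursor and fake_server below are hypothetical names used only for this illustration; a real cursor sends the sort and limit options to the server, whereas this sketch applies them locally to keep the example self-contained:

```python
class LazyCursor:
    """Toy cursor: records sort/limit options, runs only when iterated."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable standing in for the server round trip
        self._sort = None
        self._limit = None

    def sort(self, field, direction=1):
        self._sort = (field, direction)
        return self              # returning self is what makes chaining work

    def limit(self, n):
        self._limit = n
        return self

    def __iter__(self):
        docs = self._fetch()     # the "query" is executed here, not earlier
        if self._sort:
            field, direction = self._sort
            docs = sorted(docs, key=lambda d: d[field],
                          reverse=(direction == -1))
        if self._limit is not None:
            docs = docs[:self._limit]
        return iter(docs)


calls = []

def fake_server():
    calls.append('query sent')
    return [{'city': 'Surat'}, {'city': 'Anand'}, {'city': 'Rajkot'}]

cursor = LazyCursor(fake_server).sort('city').limit(2)
calls_before_iteration = len(calls)     # nothing has been fetched yet
first_two = [d['city'] for d in cursor]
calls_after_iteration = len(calls)
```

Building the cursor and chaining sort and limit does not call fake_server at all; the single call happens when the list comprehension starts iterating.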

In step 11, we inserted 20 documents in a collection, one per iteration of the loop. Each insert, as we see in the Python shell, returns the generated _id value. In terms of syntax, insert looks just like the operation we perform from the shell. The parameter passed to the insert operation is again an object of type dict.

In step 12, we passed a list of documents to insert in the collection. This inserts multiple documents in one call to the server; this is a bulk insert. The return value in this case is a list of IDs, one for each document inserted and in the same order as passed in the input list. However, a bulk insert is not executed as a transaction; all inserts are independent of each other, and a failure of one insert doesn’t automatically roll back the entire operation.

Inserting multiple documents in one call needs another parameter to control the behavior on failure. When one of the inserts in the given list fails, should the remaining inserts continue, or should the insertion stop as soon as the first error is encountered? The name of the parameter that controls this behavior is continue_on_error, and its default value is False, that is, stop as soon as the first error is encountered. If this value is True and multiple errors occur during insertion, only the latest error will be available, which is why False is the sensible default. Let’s take a look at a couple of examples. In the Python shell, execute the following commands:

>>> db.contOnError.drop()

>>> db.contOnError.insert([{'_id': 1}, {'_id': 1}, {'_id': 2}, {'_id': 2}])

>>> db.contOnError.count()

The count we will get is 1, which is for the first document with the _id field as 1. The moment another document with the same value of the _id field is found, 1 in this case, an error is thrown, and the bulk insert stops. Now, execute the following insert operation:

>>> db.contOnError.drop()

>>> db.contOnError.insert([{'_id': 1}, {'_id': 1}, {'_id': 2}, {'_id': 2}], continue_on_error=True)

>>> db.contOnError.count()

Here, we passed an additional parameter, continue_on_error, whose value is True. As a result of this parameter, the insert operation will continue with the next document even if an intermediate insert operation failed. The second insert with _id: 1 fails; yet, the next insert goes through before another insert with _id: 2 fails (as one document with this _id is already present). Also, the error reported is for the last failure, the one with _id: 2 in this case.
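The two behaviors can be mimicked in plain Python to see why the counts differ. The following is only a sketch: bulk_insert is a hypothetical helper, a dict keyed by _id stands in for the collection, and the error is returned as a string instead of being raised as the driver would do:

```python
def bulk_insert(collection, docs, continue_on_error=False):
    """Insert docs into a dict keyed by _id; return the last error, if any."""
    last_error = None
    for doc in docs:
        if doc['_id'] in collection:
            last_error = 'duplicate key: _id %r' % doc['_id']
            if not continue_on_error:
                break            # default: stop at the first failure
            continue             # otherwise skip this document and carry on
        collection[doc['_id']] = doc
    return last_error


batch = [{'_id': 1}, {'_id': 1}, {'_id': 2}, {'_id': 2}]

stop_first = {}
error_a = bulk_insert(stop_first, batch)

keep_going = {}
error_b = bulk_insert(keep_going, batch, continue_on_error=True)
```

With the default, stop_first ends up with one document and the error refers to _id 1. With continue_on_error=True, keep_going holds two documents and only the last error, the one for _id 2, is available.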

Another parameter is check_keys, which checks for key names that start with $ and the existence of . in the key. If one is found, it will raise bson.errors.InvalidDocument. Thus, the following insert operation will fail:

>>> db.pymongoTest.insert({'a.b': 1})

By default, the check will take place, unless you explicitly disable it by setting the value of this parameter to False. Thus, the following query will pass and return an object ID of the inserted document:

>>> db.pymongoTest.insert({'a.b': 1}, check_keys=False)
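The kind of check described above can be sketched as follows. validate_keys is a hypothetical function written for this illustration, it raises ValueError as a stand-in for bson.errors.InvalidDocument so that the snippet runs without PyMongo, and the recursion into nested documents is an assumption of this sketch:

```python
def validate_keys(doc):
    """Raise ValueError for keys that start with '$' or contain '.'."""
    for key, value in doc.items():
        if key.startswith('$') or '.' in key:
            raise ValueError('invalid key: %r' % key)
        if isinstance(value, dict):
            validate_keys(value)   # nested documents are checked as well


def is_valid(doc):
    """Convenience wrapper that reports the outcome as a Boolean."""
    try:
        validate_keys(doc)
        return True
    except ValueError:
        return False


nested_ok = is_valid({'a': {'b': 1}})       # nesting is the supported form
dotted_bad = is_valid({'a.b': 1})           # dot inside a key name
dollar_bad = is_valid({'$set': {'a': 1}})   # key starting with $
```

Storing the value as a nested document, {'a': {'b': 1}}, passes the check and is usually what is intended when a dotted key is written by mistake.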

Step 13 executed the insert operation but provided a write concern to be used for the insert.

See also

The Executing update and delete operations using PyMongo recipe

Executing update and delete operations using PyMongo

In the previous recipe, we saw how to execute find and insert operations in MongoDB using PyMongo. In this recipe, we will see how updates and deletions work from Python. We will also see what atomic find and update/delete is and how to execute these operations. We will then conclude by revisiting find operations and looking at some interesting functions of the cursor object.

Getting ready

If you have already seen and completed the previous recipe, you are all set to go. If not, it is recommended that you first complete the previous recipe before going ahead with this recipe.

Before we get started, let’s define a small function that iterates through the cursor and shows the results of a cursor on the console. We will use this function whenever we want to display the results of a query on the pymongoTest collection. The following is the function’s body:

>>> def showResults(cursor):
...     if cursor.count() != 0:
...         for e in cursor:
...             print e
...     else:
...         print 'No documents found'
...

Also, refer to steps 1 to 3 in the previous recipe to learn how to create a connection to the MongoDB server and create the db object used to perform the CRUD operations on this database. Also, refer to step 11 in the previous recipe to learn how to insert the required test data in the pymongoTest collection. Once the data is present, you can confirm the contents of this collection by executing the following command from the Python shell:

>>> showResults(db.pymongoTest.find())

For a part of the recipe, you are also expected to know how to start a replica set instance. Refer to the Starting multiple instances as part of a replica set and Connecting to the replica set from the shell to query and insert data recipes in Chapter 1, Installing and Starting the MongoDB Server.

How to do it…

1. We will set a field named gtTen, specifying with a Boolean value True if the field i has a value greater than 10. Let’s execute the following update command:

>>> db.pymongoTest.update({'i': {'$gt': 10}}, {'$set': {'gtTen': True}})

{u'updatedExisting': True, u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}

2. Query the collection and view its data by executing the following command, and check the data that got updated:

>>> showResults(db.pymongoTest.find())

3. The results displayed confirm that only one document got updated. We will now execute the same update again but, this time around, we will update all the documents that match the provided query. Execute the following update operation from the Python shell. Note that this update is identical to the one we performed in step 1, except for the additional parameter called multi whose value is given as True. Also, note the value of n in the response; it is 10 this time:

>>> db.pymongoTest.update({'i': {'$gt': 10}}, {'$set': {'gtTen': True}}, multi=True)

{u'updatedExisting': True, u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 10}

4. Execute the operation we performed in step 2 again to view the contents in the pymongoTest collection and verify the documents updated.

5. Let’s take a look at how upsert operations can be performed. Upserts are updates plus inserts. They update a document if one exists, just as an update will do; otherwise, they insert a new document. Let’s take a look at an example. Consider the following command on a document that doesn’t exist in the collection:

>>> db.pymongoTest.update({'i': 21}, {'$set': {'gtTen': True}})

6. The update here will not update anything and will return the number of updated documents as 0. However, let’s consider that we want to update a document if it exists or insert a new document and apply the update on it atomically and then perform an upsert operation. In this case, the upsert operation is executed as follows (note the response that mentions upsert, ObjectId of the newly inserted document, and the updatedExisting value, which is False):

>>> db.pymongoTest.update({'i': 21}, {'$set': {'gtTen': True}}, upsert=True)

{u'ok': 1.0, u'upserted': ObjectId('52a8b2f47a809beb067ecd8a'), u'err': None, u'connectionId': 8, u'n': 1, u'updatedExisting': False}

7. Let’s see how to delete documents from the collection using the remove method:

>>> db.pymongoTest.remove({'i': 21})

{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}

8. If we look at the value of n in the preceding response, we will see that it is 1. This means that one document got removed. There is another way to remove the document by _id. Let’s insert one document in the collection and later remove it. Insert the document as follows:

>>> db.pymongoTest.insert({'i': 23, '_id': 23})

9. Now, remove this document from the collection as follows:

>>> db.pymongoTest.remove(23)

{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}

10. We will look at the find and modify operations now. We can look at this operation as a way to find a document and then update/remove it; both of these operations are performed atomically. Once the operation is performed, the document returned is either the one before or after the update operation was done (in the case of remove, there will be no document after the operation). Without this operation, we cannot guarantee that finding a document, updating it, and returning the document as it was before or after the update happen atomically in scenarios where multiple client connections could be performing similar operations on the same document. The following is an example of how to perform these find and modify operations in Python:

>>> db.pymongoTest.find_and_modify({'i': 20}, {'$set': {'inWords': 'Twenty'}})

{u'i': 20, u'gtTen': True, u'_id': ObjectId('52a8a1eb072f651578ed98b2')}

The preceding result shows us that the resulting document returned is the one before the update was applied.

11. Execute the following find operation to query and view the document that we updated in the previous step. The resulting document will contain the newly added inWords field:

>>> db.pymongoTest.find_one({'i': 20})

{u'i': 20, u'_id': ObjectId('52aa0cfe072f651578ed98b7'), u'inWords': u'Twenty'}

12. We will execute the find and modify operations again but, this time around, we will return the updated document rather than the document before the update, which we saw in step 10. Execute the following command from the Python shell:

>>> db.pymongoTest.find_and_modify({'i': 19}, {'$set': {'inWords': 'Nineteen'}}, new=True)

{u'i': 19, u'gtTen': True, u'_id': ObjectId('52a8a1eb072f651578ed98b1'), u'inWords': u'Nineteen'}

13. We saw how to query using PyMongo in the previous recipe. Here, we will continue with the query operation. We saw how the sort and limit functions were chained to the find operation. The prototype of the call on the postalCodes collection is as follows:

db.postalCodes.find(..).limit(..).sort(..)

14. There is an alternate way that achieves the same result as the one achieved earlier. Execute the following query in the Python shell to achieve the same result:

>>> cursor = db.postalCodes.find({'state': 'Gujarat'}, {'_id': 0, 'city': 1, 'state': 1, 'pincode': 1}, limit=10, sort=[('city', pymongo.ASCENDING)])

15. Print the preceding cursor using the showResults function that is already defined.

16. To restrict a full table scan on the collection by queries without indexes, there is a parameter called max_scan, which takes an integer value. The value of the max_scan parameter ensures that a query doesn’t scan more than the given number of documents. For instance, the following query ensures that no more than 50 documents are scanned to get the results. Again, use the showResults function to display the results in the cursor:

>>> showResults(db.postalCodes.find({'state': 'Andhra Pradesh'}, max_scan=50))

How it works…

Let’s take a look at what we did in this recipe. We started by updating the documents in a collection in step 1. The update, however, updated only the first matching document by default and the rest of the matching documents were not updated. In step 3, we added a parameter called multi with a value True to update multiple documents as part of the same update operation. Note that all these documents are not updated atomically as part of one transaction. If we look at the update operation from the Python shell, we will see a striking resemblance to what we would have done from the Mongo shell. If we want to name the arguments of the update operation, the parameters are named spec and document: spec is the query used to select the documents, and document describes the update to apply. For instance, the following update operation is valid:

>>> db.pymongoTest.update(spec={'i': {'$gt': 10}}, document={'$set': {'gtTen': True}})

There are some more arguments that an update function takes, with most of them carrying the same meaning as the insert function we saw in the previous recipe. These parameters are w, wtimeout, j, fsync, and check_keys. Refer to the previous recipe for the explanation given for these parameters used with the insert function.

In step 6, we did an upsert (update plus insert). All we had was an additional upsert parameter with the value as True. However, what exactly happens in the case of an upsert? Mongo tries to update the document that matches the provided condition; if it finds one, this will have been a regular update. However, in this case (upsert in step 6), the document was not found. The server inserted the document given as spec (the first parameter) in the collection and then applied an update operation on it with both these operations taking place atomically.
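The decision logic behind the default update, multi, and upsert can be sketched on a plain list of dictionaries. This is only an illustration: the update function below is a hypothetical helper, it matches by simple equality, it treats its third argument as the fields of a $set, and it makes no attempt at the atomicity the server provides:

```python
def update(collection, spec, changes, multi=False, upsert=False):
    """Toy update: spec matches by equality, changes are the $set fields."""
    matched = [d for d in collection
               if all(d.get(k) == v for k, v in spec.items())]
    if matched:
        # Only the first matching document is touched unless multi is True.
        targets = matched if multi else matched[:1]
        for doc in targets:
            doc.update(changes)
        return {'n': len(targets), 'updatedExisting': True}
    if upsert:
        new_doc = dict(spec)       # start from the query document ...
        new_doc.update(changes)    # ... then apply the update to it
        collection.append(new_doc)
        return {'n': 1, 'updatedExisting': False}
    return {'n': 0, 'updatedExisting': False}


coll = [{'i': 1, 'even': False}, {'i': 2, 'even': True}, {'i': 4, 'even': True}]

first_only = update(coll, {'even': True}, {'seen': True})
all_matches = update(coll, {'even': True}, {'seen': True}, multi=True)
no_match = update(coll, {'i': 21}, {'gtTen': True})
upserted = update(coll, {'i': 21}, {'gtTen': True}, upsert=True)
```

After the last call, the list ends with {'i': 21, 'gtTen': True}: the query document with the update applied on top of it, which mirrors the description of step 6.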

In steps 7 to 9, we saw the remove operation. The first variant accepted a query and all the matching documents were removed. The second variant, in step 9, accepted one integer, which is the value of the _id field of the document to be deleted. This variant is useful whenever we plan to delete by the _id field’s value. Similar to update, the remove function too accepts other parameters for the write concern. The w, wtimeout, j, and fsync parameters have meanings similar to what we discussed in the previous recipe when we inserted the documents. Refer to the previous recipe for a detailed description of these parameters. A call to the remove method on the collection without any parameter will remove all the documents in the collection.

In steps 10 to 12, we executed the find and modify operations. Information on these operations is provided in the previous section. What we didn’t see is that this operation can also be used to find and remove documents from the collection. An additional parameter called remove needs to be added with the value as True. In the following operation, we will remove the document whose _id equals 31 and return the document as it was before the deletion:

>>> db.pymongoTest.find_and_modify(query={'_id': 31}, remove=True)

Note that, with the remove option provided, the parameter named new is not supported, as there is nothing to return after the document is deleted.
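The return-value rules of find and modify (document before the change by default, document after the change with new=True, and only the earlier document when removing) can be sketched in memory. The function below is a hypothetical stand-in, not the driver’s implementation; a threading.Lock plays the role of the server-side atomicity, and matching is by simple equality:

```python
import copy
import threading

_lock = threading.Lock()   # stands in for the server's atomic execution


def find_and_modify(collection, query, update=None, new=False, remove=False):
    """Toy find-and-modify on a list of dicts; update holds the $set fields."""
    with _lock:
        for index, doc in enumerate(collection):
            if all(doc.get(k) == v for k, v in query.items()):
                before = copy.deepcopy(doc)
                if remove:
                    del collection[index]
                    return before          # nothing exists after a removal
                doc.update(update or {})
                return copy.deepcopy(doc) if new else before
    return None                            # no document matched


coll = [{'i': 19}, {'i': 20}]

old_doc = find_and_modify(coll, {'i': 20}, {'inWords': 'Twenty'})
new_doc = find_and_modify(coll, {'i': 19}, {'inWords': 'Nineteen'}, new=True)
removed = find_and_modify(coll, {'i': 20}, remove=True)
```

old_doc lacks the inWords field because it is the document before the update, new_doc contains it, and removed is the last stored state of the deleted document.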

All the operations we saw in this recipe were for clients connected to a standalone instance. If, however, you are connected to a replica set, the client is instantiated in a different way. Also, we are aware of the fact that, by default, we are not allowed to query the secondary nodes for data. We need to explicitly execute rs.slaveOk() from the Mongo shell connected to a secondary node to query it. This is done in a similar way from a Python client as well. If we are connected to a secondary node, we cannot query it by default, but the way in which we specify that we are OK to query a secondary node is slightly different. There is a parameter called slave_okay that lets us query a secondary node; its value is False by default. If the value is True, the query will go through successfully and return results from a secondary node. If the parameter is not set to True, querying the secondary node will throw an exception that states that the node queried is not a master. For instance, if our client is connected to a secondary instance and we want to query it based on the name of the state, we will execute the following query:

>>> cursor = db.postalCodes.find({'state': 'Maharashtra'}, slave_okay=True)

We will get the cursor for the results successfully if the collection does indeed have documents with the name of the state, Maharashtra.

Another parameter that is better left untouched and has a sensible default is called timeout, and its value by default is True. Note that this value is not a number for some sort of timeout but a Boolean value. If the value is True, the cursor opened by a query on the server will be auto-closed after 10 minutes of inactivity on it. We can think of this as garbage collection of server-side resources. However, if this is set to False, it is no longer the responsibility of the server to clean it up, but the responsibility of the client to close it.

Another parameter called tailable is used to denote that the cursor returned by find is a tailable cursor. Explaining what tailable cursors are and giving more details is not in the scope of this recipe; this is explained in the Creating and tailing capped collection cursors in MongoDB recipe in Chapter 5, Advanced Operations.

So far in the recipe, we connected to a single node using pymongo.MongoClient. However, we cannot use the same class to connect to a replica set because of the following reasons:

We will just be connected to one instance

To allow us to perform write operations, we will have to connect to the primary instance

If the primary instance goes down, there has to be an automatic failover to the new primary instance

Therefore, to connect to a replica set and address the preceding three points, we will use pymongo.MongoReplicaSetClient. The following is the way in which we can initiate the client:

>>> client = pymongo.MongoReplicaSetClient('mongodb://localhost:27000', replicaSet='replSetTest')

>>>

As we can see, we just provided one host from the replica set and the name of the replica set we used when starting it. The client will automatically discover the remaining hosts from the replica set configuration. The hostname(s) that we provided is known as the seed list, using which we can provide multiple instances in the replica set. The name of the parameter that gives the hostnames is hosts_or_uri.

However, what about read preferences and how do we specify them? There are some more parameters that we will need to look at while initiating the client.

>>> from pymongo.read_preferences import ReadPreference

>>> from pymongo import MongoReplicaSetClient

>>> client = MongoReplicaSetClient('mongodb://localhost:27000', replicaSet='replSetTest', read_preference=ReadPreference.NEAREST)

>>> client.read_preference

4

The preceding steps initialized a replica set client with the read preference NEAREST. There is an additional parameter, secondary_acceptable_latency_ms, which gives a time in milliseconds. This time is used by the client to decide whether a member of the replica set is a contender for selection when the read preference NEAREST is specified. The latency from the driver to each replica set instance is measured and the minimum is found first; all the instances whose latency exceeds this minimum by no more than the provided value are then added to the list of contender instances for selection as the nearest instance to the driver. There was a fairly long discussion on this behavior in the read preference recipe, and some code snippets from a Java client were used to explain the internals. The default value for this parameter is 15 milliseconds.
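The latency window can be worked through with a few numbers. The following sketch is not the driver’s code; nearest_candidates is a hypothetical function, and the hostnames and ping times are made-up sample values:

```python
def nearest_candidates(latencies_ms, acceptable_latency_ms=15):
    """Return hosts whose latency is within the window of the fastest host."""
    fastest = min(latencies_ms.values())
    return sorted(host for host, ms in latencies_ms.items()
                  if ms - fastest <= acceptable_latency_ms)


# Hypothetical measured round-trip times from the driver to each member.
pings = {
    'localhost:27000': 5,
    'localhost:27001': 18,
    'localhost:27002': 40,
}

default_window = nearest_candidates(pings)
wide_window = nearest_candidates(pings, acceptable_latency_ms=50)
```

With the default of 15 milliseconds, the member at 18 ms qualifies (13 ms slower than the fastest one) while the member at 40 ms does not. One of the contenders is then picked for the read; widening the window to 50 ms makes all three eligible.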

As we know, read preference can be provided at the client level, at the database level that gets inherited from the client, and also at the cursor level. By default, read_preference for a client initialized without an explicit read preference is PRIMARY (with the value 0). However, if we now get the database object from the client initialized earlier, the read preference will be NEAREST (with the value 4).

>>> db = client.test

>>> db.read_preference

4

>>>

Setting the read preference is as simple as executing the following command:

>>> db.read_preference = ReadPreference.PRIMARY_PREFERRED

Again, as the read preference gets inherited from the client to the database object, it gets inherited from the database object to the collection object, and it will be used as the default value for all the queries executed against that collection, unless read preference is specified explicitly in the find operation.

Thus, db.pymongoTest.find() will have a cursor, which uses the read preference as PRIMARY_PREFERRED (we just set it earlier to PRIMARY_PREFERRED at the database-object level) whereas db.pymongoTest.find(read_preference=ReadPreference.NEAREST) will use the read preference as NEAREST.
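The inheritance chain from client to database to collection to cursor can be pictured with a small class. Scope is a hypothetical name, the string constants stand in for the ReadPreference values, and copying the parent’s value at creation time is an assumption of this sketch rather than a statement about the driver’s internals:

```python
PRIMARY = 'PRIMARY'
PRIMARY_PREFERRED = 'PRIMARY_PREFERRED'
NEAREST = 'NEAREST'


class Scope:
    """One level of the chain: client, database, collection, or cursor."""

    def __init__(self, parent=None, read_preference=None):
        if read_preference is None:
            # Inherit from the parent, or fall back to PRIMARY at the top.
            read_preference = parent.read_preference if parent else PRIMARY
        self.read_preference = read_preference

    def child(self, read_preference=None):
        return Scope(self, read_preference)


client = Scope(read_preference=NEAREST)
db = client.child()
inherited_by_db = db.read_preference

db.read_preference = PRIMARY_PREFERRED      # override at the database level
collection = db.child()
default_cursor = collection.child()         # like find() with no preference
explicit_cursor = collection.child(NEAREST) # like find(read_preference=...)
plain_client = Scope()
```

The database first inherits NEAREST from the client; after the override, the collection and the default cursor see PRIMARY_PREFERRED, and only the cursor created with an explicit value uses NEAREST.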

We will now wrap up the basic operations from a Python driver by trying to do some common operations that we do from the Mongo shell, such as getting all the database names, getting a list of collections in a database, and creating an index on a collection.

From the shell, we will execute show dbs to show all the database names in the Mongo instance that is connected. From the Python client, we will execute the following command on the client instance:

>>> client.database_names()

[u'local', u'test']

Similarly, to see the list of collections, we will type show collections in the Mongo shell. In Python, all that we will do on the database object is as follows:

>>> db.collection_names()

[u'system.indexes', u'writeConcernTest', u'pymongoTest']

Now, for index operations, we will first see what indexes are present in the pymongoTest collection. Execute the following command from the Python shell to view the indexes on a collection:

>>> db.pymongoTest.index_information()

{u'_id_': {u'key': [(u'_id', 1)], u'v': 1}}

We now will create an index on key x, which is sorted in ascending order on the pymongoTest collection as follows:

>>> db.pymongoTest.ensure_index([('x', pymongo.ASCENDING)])

u'x_1'

We can again list the indexes as follows to confirm the creation of the index:

>>> db.pymongoTest.index_information()

{u'_id_': {u'key': [(u'_id', 1)], u'v': 1}, u'x_1': {u'key': [(u'x', 1)], u'v': 1}}

We can see that the index got created. Generally speaking, the format of the ensure_index method is as follows:

>>> db.<collection name>.ensure_index([(<field name 1>, <order of field 1>) … (<field name n>, <order of field n>)])

Aggregation in Mongo using PyMongo

We already saw PyMongo, the Python client interface for MongoDB, in the Executing query and insert operations using PyMongo and Executing update and delete operations using PyMongo recipes. In this recipe, we will use the postal code collection and run an aggregation example using PyMongo. The intention of this recipe is not to explain aggregation but to show how aggregation can be implemented using PyMongo. In this recipe, we will aggregate the data based on the state names and get the top five state names by the number of documents they appear in. We will make use of the $project, $group, $sort, and $limit operators for the process.

Getting ready

To execute the aggregation operations, we need to have a server up and running. A simple single node is what we will need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported in the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Python and PyMongo are expected to be installed. Look at the Installing PyMongo recipe to know how to install PyMongo for your host operating system. Since this recipe only shows how to run an aggregation from Python, the reader is expected to be aware of the aggregation framework in MongoDB.

How to do it…

Let’s take a look at the steps in detail:

1. Open the Python terminal by typing the following command:

$ python

2. Once the Python shell opens, import PyMongo as follows:

>>> import pymongo

3. Create an instance of MongoClient as follows:

>>> client = pymongo.MongoClient('mongodb://localhost:27017')

4. Get the test database’s object as follows:

>>> db = client.test

5. Now, we will execute the aggregation operation on the postalCodes collection as follows:

>>> result = db.postalCodes.aggregate(
...     [
...         {'$project': {'state': 1, '_id': 0}},
...         {'$group': {'_id': '$state', 'count': {'$sum': 1}}},
...         {'$sort': {'count': -1}},
...         {'$limit': 5}
...     ]
... )

6. Type the following command to view the results:

>>> result['result']

How it works…

The steps are pretty straightforward. We connected to the database that runs on the localhost and created a database object. The aggregation operation we invoked on the collection using the aggregate function is very similar to how we will invoke aggregation from the shell. The object in the return value, result, is an object of type dict; it has two keys of interest. One of the keys is called ok, whose value will be 1 if the aggregation operation executed successfully. The other key is called result and its type is a list. In our case, it will contain five documents that contain the name of the state and the count of the number of their occurrences.
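To see what the four pipeline stages compute, the same grouping can be reproduced in plain Python on a handful of documents. This is only an in-memory illustration, not a replacement for running the pipeline on the server; top_states is a hypothetical function and the sample documents are made up:

```python
from collections import Counter


def top_states(docs, n=5):
    """Count documents per state and return the n most frequent states."""
    # Equivalent of $project (keep only state) and $group with {'$sum': 1}.
    counts = Counter(doc['state'] for doc in docs)
    # Equivalent of $sort on count descending followed by $limit.
    return [{'_id': state, 'count': count}
            for state, count in counts.most_common(n)]


# Made-up sample standing in for the postalCodes collection.
sample = (
    [{'state': 'Maharashtra', 'city': 'c%d' % i} for i in range(3)]
    + [{'state': 'Kerala', 'city': 'k%d' % i} for i in range(2)]
    + [{'state': 'Gujarat', 'city': 'g0'}]
)

top_two = top_states(sample, n=2)
```

The documents in top_two have the same shape as the entries of result['result'] in step 6: an _id holding the state name and a count field.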

MapReduce in Mongo using PyMongo

In the previous recipe, we saw how to execute aggregation operations in Mongo using PyMongo. In this recipe, we will work on the same use case as we did for the aggregation operation, but using MapReduce. The intent is to aggregate the data based on state names and get the top five state names by the number of documents they appear in.

Programming language drivers provide us with an interface to let us invoke MapReduce jobs written in JavaScript on the server.

Getting ready

To execute the MapReduce operations, we need to have a server up and running. A simple single node is what we will need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will be operating needs to be imported in the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Python is expected to be installed on the host operating system, and PyMongo also needs to be installed. Take a look at the Installing PyMongo recipe to know how to install PyMongo for your host operating system.

How to do it…

Let’s take a look at the steps in detail:

1. Open the Python terminal by typing in the following command from the operating system’s command prompt:

$ python

2. Once the Python shell opens, import the bson package as follows:

>>> import bson

3. Now, import the pymongo package as follows:

>>> import pymongo

4. Create an instance of MongoClient as follows:

>>> client = pymongo.MongoClient('mongodb://localhost:27017')

5. Get the test database’s object as follows:

>>> db = client.test

6. Write the mapper function as follows:

>>> map = bson.Code('''function() {emit(this.state, 1)}''')

7. Write the reduce function as follows:

>>> reduce = bson.Code('''function(key, values) {return Array.sum(values)}''')

8. Invoke MapReduce as follows (note that the result will be sent to the pymr_out collection):

>>> db.postalCodes.map_reduce(map=map, reduce=reduce, out='pymr_out')

9. Verify the result as follows:

>>> c = db.pymr_out.find(sort=[('value', pymongo.DESCENDING)], limit=5)

>>> for elem in c:
...     print elem
...

{u'_id': u'Maharashtra', u'value': 6446.0}

{u'_id': u'Kerala', u'value': 4684.0}

{u'_id': u'Tamil Nadu', u'value': 3784.0}

{u'_id': u'Andhra Pradesh', u'value': 3550.0}

{u'_id': u'Karnataka', u'value': 3204.0}

>>>

How it works…

Apart from the regular import for PyMongo, here we imported the bson package too. This is where we have the Code class that we use for writing the JavaScript map and reduce functions. It is instantiated by passing the JavaScript function body as a constructor argument.

Once two instances of the Code class are instantiated, one for map and one for reduce, all we need to do is invoke the map_reduce function on the collection. In this case, we passed three parameters: the two Code instances for the map and reduce functions with parameter names map and reduce, respectively, and one string value, used to provide the name of the output collection into which the results are written.

We won’t be explaining the MapReduce JavaScript functions here, but they are pretty simple: the map function emits each state’s name as the key with a value of 1, and the reduce function sums these values, giving the number of times each state name occurs. Each resulting document has the state’s name as the _id field and another field called value, which is the number of times that state’s name appeared in the collection. These documents are added to the output collection, pymr_out in this case. For example, in the entire collection, the state Maharashtra appeared 6446 times. Thus, the document for the state of Maharashtra is {u'_id': u'Maharashtra', u'value': 6446.0}. To confirm whether this is the true value or not, you can execute the following query from the Mongo shell and see that the result is indeed 6446:

> db.postalCodes.count({state: 'Maharashtra'})

6446

We are still not done, as the requirement is to find the top five states by their occurrence in the collection. So far we have just the states and their occurrences, so the final step is to sort the documents by the value field, which is the number of times the state’s name occurred, in descending order, and limit the result to five documents.
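The counting logic described above can be imitated in plain Python without a server. This is only a sketch of the idea, not of how MongoDB runs MapReduce: the map_reduce helper, the mapper and reducer functions, and the sample data are all made up for illustration, and the real server may call the reduce function several times per key.

```python
from collections import defaultdict


def map_reduce(documents, mapper, reducer):
    """Group the values emitted by mapper per key, then reduce each group."""
    emitted = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            emitted[key].append(value)
    return [{"_id": key, "value": reducer(key, values)}
            for key, values in emitted.items()]


def mapper(doc):
    # Plays the role of emit(this.state, 1)
    yield doc["state"], 1


def reducer(key, values):
    # Plays the role of Array.sum(values)
    return sum(values)


# Made-up sample, not the postalCodes data used in the recipe
sample = [{"state": s}
          for s in ["Kerala", "Goa", "Kerala", "Goa", "Kerala", "Sikkim"]]

output = map_reduce(sample, mapper, reducer)

# Final step: sort by value in descending order and keep the top entries
top_two = sorted(output, key=lambda d: d["value"], reverse=True)[:2]
print(top_two)
```

With this sample, Kerala (3 occurrences) and Goa (2 occurrences) come out on top, in that order.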

See also

Chapter 8, Integration with Hadoop, for different recipes on executing MapReduce jobs on MongoDB using the Hadoop connector, which allows us to write the map and reduce functions in languages such as Java and Python.

Executing query and insert operations using a Java client

In this recipe, we will look at executing the query and insert operations using a Java client for MongoDB. Unlike the Python programming language, Java code snippets cannot be executed from an interactive interpreter. Thus, we will have some unit test cases already implemented; their relevant code snippets will be shown and explained.

Getting ready

For this recipe, we will start a standalone instance. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server.

The next step is to download the mongo-cookbook-javadriver Java project from the book’s website. This recipe uses a JUnit test case to test various features of the Java client. In this whole process, we will make use of some of the most common API calls and, thus, learn to use them.

How to do it…

To execute the test case, one can either import the project in an IDE such as Eclipse and execute the test case, or execute the test case from the command prompt using Maven.

The test case we are going to execute for this recipe is com.packtpub.mongo.cookbook.MongoDriverQueryAndInsertTest.

If you are using an IDE, open this test class and execute it as a JUnit test case. If you are planning to use Maven to execute this test case, go to the command prompt, change the directory to the root of the project, and execute the following command to run this single test case:

$ mvn -Dtest=com.packtpub.mongo.cookbook.MongoDriverQueryAndInsertTest test

Everything should execute fine, and the test case should succeed if the Java SDK and Maven are properly set up and the MongoDB server is up and running and listening on port 27017 for incoming connections.

How it works…

We will now open the test class we executed and see some of the important API calls in the test method. The superclass of our test class is com.packtpub.mongo.cookbook.AbstractMongoTest.

We will start by looking at the getClient method in this class. The client instance that gets created is an instance of type com.mongodb.MongoClient. There are several overloaded constructors for this class; however, we will use the following constructor to instantiate the client:

MongoClient client = new MongoClient("localhost:27017");

Another method to look at is getJavaDriverTestDatabase in the same abstract class, which gets us the database instance. This instance is synonymous with the implicit variable db in the shell. Here, in Java, this object is an instance of type com.mongodb.DB. We get an instance of this DB class by invoking the getDB() method on the client instance. In our case, we want the DB instance for the javaDriverTest database, which is obtained as follows:

getClient().getDB("javaDriverTest");

Once we get the instance of com.mongodb.DB, we will use it to get the instance of com.mongodb.DBCollection, which will be used to perform various operations, find and insert in our case, on the collection. The getJavaTestCollection method in the abstract test class returns one instance of DBCollection. We will get an instance of the DBCollection class for the javaTest collection by invoking the getCollection() method on com.mongodb.DB as follows:

getJavaDriverTestDatabase().getCollection("javaTest")

Once we get an instance of DBCollection, we are ready to perform operations on it. In the scope of this recipe, these are limited to find and insert operations.

Now, we will open the main test case class, com.packtpub.mongo.cookbook.MongoDriverQueryAndInsertTest. Open this class in an IDE or text editor. We will look at the methods of this class. The first method that we will look at is findOneDocument. Here, the line of interest is collection.findOne(new BasicDBObject("_id", 3)); this queries the document whose _id value is 3.

This method returns an instance of com.mongodb.DBObject, which is a key-value map whose keys are the field names of the document and whose values are the corresponding field values. For instance, to get the value of _id from the returned DBObject instance, we invoke result.get("_id") on the returned result.

Our next method to inspect is getDocumentsFromTestCollection. This test case executes a find operation on the collection and gets all the documents in it. The collection.find() call executes the find operation on the DBCollection instance. The return value of the find operation is com.mongodb.DBCursor. An important point to note is that invoking the find operation doesn’t itself execute the query but just returns the DBCursor instance. This is an inexpensive operation and doesn’t consume server-side resources. The actual query gets executed on the server side only when the hasNext or next method is invoked on the DBCursor instance. The hasNext() method is used to check if there are more results, and the next() method is used to navigate to the next DBObject in the result. An example usage of the returned DBCursor instance to iterate through the results is as follows:

while (cursor.hasNext()) {
    DBObject object = cursor.next();
    // Some operation on the returned object to get the fields and
    // values in the document
}

We will now look at two methods: withLimitAndSkip and withQueryProjectionAndSort. These methods show us how to sort, limit the number of results, and skip a number of initial results. As we can see in the following code snippet, the sort, limit, and skip methods chain to each other:

DBCursor cursor = collection
    .find(null)
    .sort(new BasicDBObject("_id", -1))
    .limit(2)
    .skip(1);

All these methods return the DBCursor instance itself, which allows us to chain the calls. These methods are defined in the DBCursor class; each one changes certain state in the instance according to the operation it performs and has return this at the end to return the same instance.

Remember that the actual operation is invoked on the server only upon invoking the hasNext or next method on DBCursor. Invoking any method such as sort, limit, or skip after the execution of the query on the server will throw java.lang.IllegalStateException.
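The deferred execution and the chaining described above can be illustrated with a small stand-alone sketch. It is written in Python so that it runs without a server, and it is not the driver's implementation: the LazyCursor class and its behavior are assumptions made for illustration, with RuntimeError standing in for java.lang.IllegalStateException. As with MongoDB cursors, skip is applied before limit regardless of the order of the calls.

```python
class LazyCursor:
    """Toy cursor: options are only recorded; work happens on first iteration."""

    def __init__(self, documents):
        self._documents = documents
        self._sort_key = None
        self._descending = False
        self._limit = 0       # 0 means "no limit" in this sketch
        self._skip = 0
        self._results = None  # filled in lazily, on first iteration

    def _check_not_executed(self):
        if self._results is not None:
            raise RuntimeError("cursor has already been executed")

    def sort(self, key, direction=1):
        self._check_not_executed()
        self._sort_key = key
        self._descending = direction < 0
        return self  # returning the same instance is what allows chaining

    def limit(self, n):
        self._check_not_executed()
        self._limit = n
        return self

    def skip(self, n):
        self._check_not_executed()
        self._skip = n
        return self

    def __iter__(self):
        if self._results is None:
            docs = list(self._documents)
            if self._sort_key is not None:
                docs.sort(key=lambda d: d[self._sort_key],
                          reverse=self._descending)
            docs = docs[self._skip:]
            if self._limit:
                docs = docs[:self._limit]
            self._results = docs
        return iter(self._results)


docs = [{"_id": i} for i in range(1, 6)]
cursor = LazyCursor(docs).sort("_id", -1).limit(2).skip(1)
ids = [d["_id"] for d in cursor]  # the "query" runs here, not earlier
print(ids)
```

Sorting the five documents in descending order, skipping one, and keeping two yields the documents with _id 4 and 3. Calling cursor.limit(1) after the iteration raises the error, mirroring the behavior described for DBCursor.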

We used two variants of the find method: one that accepts one parameter for the query to be executed and another one that has two parameters; the first one is for the query and the second one is another DBObject, which is used for the projection that will return only a selected set of fields from the document in the result.

The following query, for instance, from the withQueryProjectionAndSort method of the test case, selects all the documents, as the first argument is null, and the returned DBCursor instance will have documents that contain just one field called value:

DBCursor cursor = collection
    .find(null, new BasicDBObject("value", 1).append("_id", 0))
    .sort(new BasicDBObject("_id", 1));

The _id field is to be explicitly set to 0; otherwise, it will be returned by default.
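The effect of such a projection on a single document can be sketched in plain Python. The project function below is a hypothetical helper written for illustration; it handles only inclusion-style projections and is not part of any MongoDB driver.

```python
def project(document, projection):
    """Apply an inclusion-style projection such as {"value": 1, "_id": 0}."""
    included = {field for field, flag in projection.items()
                if flag and field != "_id"}
    # _id is returned unless it is explicitly set to 0
    keep_id = projection.get("_id", 1) != 0
    result = {}
    if keep_id and "_id" in document:
        result["_id"] = document["_id"]
    for field in included:
        if field in document:
            result[field] = document[field]
    return result


# Made-up document for illustration
doc = {"_id": 3, "value": "Hello World", "other": 42}

without_id = project(doc, {"value": 1, "_id": 0})
with_id = project(doc, {"value": 1})
print(without_id)
print(with_id)
```

The first call returns only the value field, whereas the second also carries _id because it was not suppressed.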

Finally, we will look at two more methods in the test case: insertDataTest and insertTestDataWithWriteConcern. We will use a couple of variants of the insert method in these two methods. All the insert methods are invoked on the DBCollection instance and return a com.mongodb.WriteResult instance. Among a small number of other operations, the result can be used to get the error that occurred during the write operation by invoking the getLastError() method, to get the number of documents inserted using the getN() method, and to get the write concern of the operation. Refer to the Javadoc of the MongoDB API at https://api.mongodb.org/java/current/ for more details on the methods. The two insert operations that we performed are as follows:

collection.insert(new BasicDBObject("value", "Hello World"));

collection.insert(new BasicDBObject("value", "Hello World"),
    WriteConcern.JOURNALED);

Both of these accept a DBObject instance for the document to be inserted as the first parameter. The second method allows us to provide the write concern to be used for the write operation. There are insert methods in the DBCollection class that allow bulk inserts too. Refer to the Javadoc at https://api.mongodb.org/java/current/ for more details on the various overloaded versions of the insert method.

Executing update and delete operations using a Java client

In the previous recipe, we saw how to execute the find and insert operations in MongoDB using a Java client. In this recipe, we will see how the update and delete operations work from a Java client.

Getting ready

For this recipe, we will start a standalone instance. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server.

The next step is to download the Java project mongo-cookbook-javadriver from the book’s website. This recipe uses a JUnit test case to test out various features of the Java client. In this whole process, we will make use of some of the most common API calls and, thus, learn how to use them.

How to do it…

To execute the test case, one can either import the project in an IDE such as Eclipse and execute the test case or execute the test case from the command prompt using Maven.

The test case we are going to execute for this recipe is com.packtpub.mongo.cookbook.MongoDriverUpdateAndDeleteTest.

If you are using an IDE, open this test class and execute it as a JUnit test case. If you plan to use Maven to execute this test case, go to the command prompt, change the directory to the root of the project, and execute the following command to run this single test case:

$ mvn -Dtest=com.packtpub.mongo.cookbook.MongoDriverUpdateAndDeleteTest test

Everything should execute fine if the Java SDK and Maven are properly set up and the MongoDB server is up and running and listening on port 27017 for incoming connections.

How it works…

We create test data for the recipe using the setupUpdateTestData() method. Here, we simply put documents in the javaTest collection in the javaDriverTest database. We add 20 documents to this collection with the value of i ranging from 1 to 20. This method is called from the different test case methods to create their test data.

Let’s now take a look at the methods in this class. We will first look at basicUpdateTest(). In this method, we will first create test data and then execute the following update method:

collection.update(
    new BasicDBObject("i", new BasicDBObject("$gt", 10)),
    new BasicDBObject("$set", new BasicDBObject("gtTen", true)));

The update method here takes two arguments: the first one is the query that will be used to select the eligible documents for the update, and the second parameter is the actual update. The first parameter looks confusing due to the nested BasicDBObject instances; however, it is the {'i': {'$gt': 10}} condition, and the second parameter is the {'$set': {'gtTen': true}} update. The result of the update is an instance of com.mongodb.WriteResult. The instance of WriteResult tells us about the number of documents that got updated, the error that occurred while executing the write operation, and the write concern used for the update. Refer to the Javadocs of the WriteConcern class for more details. This update, by default, updates only the first matching document, even if multiple documents match the query.

The next method that we will look at is multiUpdateTest, which will update all the matching documents for the given query instead of the first matching document. The method we used on the collection instance is updateMulti. The updateMulti method is just a convenience method to update multiple documents. The following is the call that we will make to update multiple documents:

collection.updateMulti(
    new BasicDBObject("i", new BasicDBObject("$gt", 10)),
    new BasicDBObject("$set", new BasicDBObject("gtTen", true)));

The next operation that we will perform is to remove documents. The test case method to remove documents is deleteTest(). The documents are removed as follows:

collection.remove(
    new BasicDBObject("i", new BasicDBObject("$gt", 10)),
    WriteConcern.JOURNALED);

We have two parameters here. The first one is the query for which matching documents will be removed from the collection. Note that all the matching documents will be removed by default, unlike in update, where only the first matching document will be updated by default. The second parameter is the write concern to be used for the remove operation.
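The difference between the default behavior of update, updateMulti, and remove can be illustrated on an in-memory list. This is a plain-Python sketch under the assumption that a collection is just a list of dictionaries; the update and remove functions here are hypothetical stand-ins, not driver calls.

```python
def update(collection, predicate, changes, multi=False):
    """Set fields on matching documents; stop after the first unless multi."""
    updated = 0
    for doc in collection:
        if predicate(doc):
            doc.update(changes)
            updated += 1
            if not multi:
                break
    return updated


def remove(collection, predicate):
    """Remove every matching document and return how many were removed."""
    before = len(collection)
    collection[:] = [doc for doc in collection if not predicate(doc)]
    return before - len(collection)


def test_data():
    # 20 documents with i ranging from 1 to 20, as in the recipe
    return [{"i": i} for i in range(1, 21)]


def gt_ten(doc):
    # Plays the role of the {'i': {'$gt': 10}} condition
    return doc["i"] > 10


single = test_data()
n_single = update(single, gt_ten, {"gtTen": True})

multi = test_data()
n_multi = update(multi, gt_ten, {"gtTen": True}, multi=True)

n_removed = remove(multi, gt_ten)
print(n_single, n_multi, n_removed)
```

Only one document is touched by the default update, all ten matching documents are touched by the multi update, and the remove deletes all ten matches.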

Note that, when the server is started on a 32-bit machine, journaling is disabled by default.

Using the journaling write concern on such machines causes the operation to fail with the following exception:

com.mongodb.CommandFailureException: { "serverUsed" : "localhost/127.0.0.1:27017" , "connectionId" : 5 , "n" : 0 , "badGLE" : { "getlasterror" : 1 , "j" : true} , "ok" : 0.0 , "errmsg" : "cannot use 'j' option when a host does not have journaling enabled" , "code" : 2}

To avoid this, the server needs to be started with the --journal option. On 64-bit machines, this is not necessary, as journaling is enabled by default.

We will look at the findAndModify operation next. The test case method to perform this operation is findAndModifyTest. The following lines of code are used to perform this operation:

DBObject old = collection.findAndModify(
    new BasicDBObject("i", 10),
    new BasicDBObject("i", 100));

The first argument is the query that finds the matching document, and the second is the update to be applied to it. The operation returns a DBObject instance holding the document as it was before the update was applied. One important feature of the findAndModify operation is that the find and update operations are performed atomically.

The preceding method is a simple version of the findAndModify operation. There is an overloaded version of this method with the following signature:

DBObject findAndModify(DBObject query, DBObject fields, DBObject sort,
    boolean remove, DBObject update, boolean returnNew, boolean upsert)

Let’s see what these parameters are in the following table:

Parameter     Description

query         The findAndModify operation first has to find the document that it will modify. The value of this parameter is the query used to select the document that will later be modified.

fields        The find method supports a projection of the fields that need to be selected in the result document(s). This parameter does the same job of selecting a fixed set of fields from the resulting document.

sort          The findAndModify operation acts atomically on only one document and also returns only one document. The sort parameter is useful in cases where the query selects multiple documents and only the first gets chosen for the operation. The sort is applied to the result before the first document is picked for the update.

remove        This is a Boolean flag that indicates whether to remove or update the document. If this value is true, the document will be removed.

update        This, unlike the remove parameter, is not a Boolean value but a DBObject instance that tells what the update needs to be. Note that the remove Boolean flag takes precedence over this parameter. If remove is true, the update will not happen even if one is provided.

returnNew     The operation returns a document, but which one? The one before the update was executed or the one after the update gets executed? When this Boolean flag is given as true, it returns the document after the update is executed.

upsert        This is again a Boolean flag that, when true, executes the upsert operation. It is relevant only when the intended operation is update.

There are more overloaded methods of this operation. Refer to the Javadocs of com.mongodb.DBCollection for more methods. The findAndModify method we used ultimately invokes the method we discussed earlier with the fields and sort parameters as null and the remaining remove, returnNew, and upsert parameters being false.
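How the parameters in the preceding table interact can be sketched on an in-memory list. The find_and_modify function below is a hypothetical, simplified model: it supports only equality queries and replacement-style updates, omits the fields projection, and makes no claim about atomicity or about how the server implements the operation.

```python
import copy


def find_and_modify(collection, query, sort=None, remove=False,
                    update=None, return_new=False, upsert=False):
    """Toy findAndModify on a list of dicts; query is an equality match only."""
    candidates = [d for d in collection
                  if all(d.get(k) == v for k, v in query.items())]
    if sort:
        field, direction = sort
        candidates.sort(key=lambda d: d[field], reverse=direction < 0)
    if not candidates:
        # upsert matters only when the intended operation is an update
        if upsert and update is not None and not remove:
            new_doc = dict(update)
            collection.append(new_doc)
            return copy.deepcopy(new_doc) if return_new else None
        return None
    target = candidates[0]  # only one document is ever acted upon
    old = copy.deepcopy(target)
    if remove:
        # remove takes precedence over update
        collection[:] = [d for d in collection if d is not target]
        return old
    if update is not None:
        doc_id = target.get("_id")
        target.clear()
        target.update(update)  # replacement-style update
        if doc_id is not None:
            target["_id"] = doc_id
    return copy.deepcopy(target) if return_new else old


# Made-up test data: 20 documents with i from 1 to 20
coll = [{"_id": n, "i": n} for n in range(1, 21)]

old = find_and_modify(coll, {"i": 10}, update={"i": 100})
new = find_and_modify(coll, {"i": 100}, update={"i": 7}, return_new=True)
# Two documents now have i == 7; the sort decides which one is picked
removed = find_and_modify(coll, {"i": 7}, sort=("_id", -1),
                          remove=True, update={"i": 0})
print(old, new, removed)
```

The first call returns the document as it was before the update, the second returns it after the update because return_new is true, and the third removes the document with the highest _id among the matches while ignoring the update that was also passed.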

Finally, we will look at the query builder support in the MongoDB Java API.

All the queries in Mongo are DBObject instances with possibly more nested DBObject instances in them. Things are simple for small queries, but they start getting ugly for more complicated queries. Consider a relatively simple query where we want to query for documents with i > 10 and i < 15. The Mongo query for this is {$and: [{i: {$gt: 10}}, {i: {$lt: 15}}]}. Writing this in Java using BasicDBObject instances is painful, and it looks as follows:

DBObject query = new BasicDBObject("$and",
    new BasicDBObject[] {
        new BasicDBObject("i", new BasicDBObject("$gt", 10)),
        new BasicDBObject("i", new BasicDBObject("$lt", 15))
    }
);

Thankfully, however, there is a utility class called com.mongodb.QueryBuilder for building complex queries. The preceding query is built using the query builder as follows:

DBObject query =
    QueryBuilder.start("i").greaterThan(10).and("i").lessThan(15).get();

This is less error-prone when writing a query and is easy to read as well. There are a lot of methods in the com.mongodb.QueryBuilder class, and I encourage you to go through the Javadocs of this class. The basic idea is to start construction using the start method and the key. We then chain method calls to add different conditions and, when all the conditions have been added, the query is constructed using the get() method, which returns a DBObject. Refer to the queryBuilderSample method in the test class for a sample usage of the query builder support of the MongoDB Java API.
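The start, chain, and get pattern described above can be imitated in a few lines of plain Python. The QueryBuilderSketch class is a hypothetical illustration of the builder idea, not the com.mongodb.QueryBuilder class, and its method names are assumptions. It merges both conditions on i into one sub-document, which is another way of writing the same range condition as the $and form.

```python
class QueryBuilderSketch:
    """Hypothetical fluent builder that accumulates conditions into a dict."""

    def __init__(self, key):
        self._query = {}
        self._current = key

    @classmethod
    def start(cls, key):
        return cls(key)

    def _add(self, operator, value):
        # Attach the operator to the key currently being built
        self._query.setdefault(self._current, {})[operator] = value
        return self  # returning self keeps the chain going

    def greater_than(self, value):
        return self._add("$gt", value)

    def less_than(self, value):
        return self._add("$lt", value)

    def and_(self, key):
        # Switch to another (or the same) key for the next condition
        self._current = key
        return self

    def get(self):
        return dict(self._query)


query = (QueryBuilderSketch.start("i")
         .greater_than(10)
         .and_("i")
         .less_than(15)
         .get())
print(query)
```

The resulting dictionary is {'i': {'$gt': 10, '$lt': 15}}.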

See also

Chapter 5, Advanced Operations, to learn about more operations using GridFS and geospatial indexes, and how to use them from a Java application with a small sample

The Javadocs for the current version of the MongoDB driver at https://api.mongodb.org/java/current/

Aggregation in Mongo using a Java client

The intention of this recipe is not to explain aggregation but to show how aggregation can be implemented using a Java client from a Java program. In this recipe, we will aggregate the data based on the state names and get the top five state names by the number of documents they appear in. We will make use of the $project, $group, $sort, and $limit operators for the process.

Getting ready

The test class used for this recipe is com.packtpub.mongo.cookbook.MongoAggregationTest. To execute the aggregation operations, we need to have a server up and running. A simple single node is what we will need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported in the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. The next step is to download the mongo-cookbook-javadriver Java project from the book’s website. Though Maven can be used to execute the test case, it is convenient to import the project in an IDE and execute the test case class. It is assumed that you are familiar with the Java programming language and comfortable using the IDE into which the project will be imported.

How to do it…

To execute the test case, one can either import the project in an IDE such as Eclipse and execute the test case or execute the test case from the command prompt using Maven.

If you are using an IDE, open the test class and execute it as a JUnit test case. If you plan to use Maven to execute this test case, go to the command prompt, change the directory to the root of the project, and execute the following command to run this single test case:

$ mvn -Dtest=com.packtpub.mongo.cookbook.MongoAggregationTest test

Everything should execute fine if the Java SDK and Maven are properly set up and the MongoDB server is up and running and listening on port 27017 for incoming connections.

How it works…

The method used to look at the aggregation functionality is aggregationTest() in our test class. The aggregation operation is performed on MongoDB from a Java client using the aggregate() method defined in the DBCollection class. The method has the following signature:

AggregationOutput aggregate(firstOp, additionalOps)

Only the first argument is mandatory; this forms the first operation in the pipeline. The second argument is a varargs argument (a variable number of arguments with zero or more values) that allows more pipeline operators. All these arguments are of type com.mongodb.DBObject. If any exception occurs during the execution of the aggregation command, the aggregation operation will throw com.mongodb.MongoException with the cause of the exception.

The return type, com.mongodb.AggregationOutput, is used to get the result of the aggregation operation. From a developer’s perspective, we are more interested in the results field of this instance, which can be accessed using the results() method of the returned object. The results() method returns an object of type Iterable<DBObject>, which one can iterate to get the results of the aggregation.

Let’s look at how we implemented the aggregation pipeline in our test class:

AggregationOutput output = collection.aggregate(
    // {'$project': {'state': 1, '_id': 0}},
    new BasicDBObject("$project", new BasicDBObject("state", 1).append("_id", 0)),
    // {'$group': {'_id': '$state', 'count': {'$sum': 1}}}
    new BasicDBObject("$group", new BasicDBObject("_id", "$state")
        .append("count", new BasicDBObject("$sum", 1))),
    // {'$sort': {'count': -1}}
    new BasicDBObject("$sort", new BasicDBObject("count", -1)),
    // {'$limit': 5}
    new BasicDBObject("$limit", 5)
);

There are four operations in the pipeline, in the following order: a $project operation, followed by $group, $sort, and then $limit.

The last two operations look inefficient; using them, we will sort everything but then just take the top five elements. The MongoDB server in such scenarios is intelligent enough to consider the limit operation while sorting; as a result, only the top five results need to be maintained rather than sorting all the results.

From Version 2.6 of MongoDB, the aggregation result can be returned as a cursor. Though the preceding code snippet is still valid, the AggregationOutput object is no longer the only way to get the results of the operation; we can also use com.mongodb.Cursor to iterate the results. Also, the preceding format is now deprecated in favor of the format that accepts a list of pipeline operators rather than varargs for the operators to be used. Refer to the Javadocs of the com.mongodb.DBCollection class and look for the various overloaded aggregate() methods.
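The four pipeline stages, and the idea of keeping only the top results while sorting, can be sketched in plain Python. This is an illustration of the concept only: the top_states function and the sample data are made up, and heapq.nlargest is used here as a stand-in for a sort that is aware of the limit, not as a description of what the server does internally.

```python
import heapq
from collections import Counter


def top_states(documents, limit):
    """Imitate $project, $group, $sort, and $limit on a list of dicts."""
    # $project: keep only the state field
    projected = ({"state": d["state"]} for d in documents)
    # $group: count the documents per state
    counts = Counter(d["state"] for d in projected)
    grouped = [{"_id": state, "count": n} for state, n in counts.items()]
    # $sort followed by $limit: only `limit` entries need to be tracked
    return heapq.nlargest(limit, grouped, key=lambda d: d["count"])


# Made-up sample, not the postalCodes data used in the recipe
sample = [{"state": s, "pincode": i}
          for i, s in enumerate(["Kerala"] * 3 + ["Goa"] * 2 + ["Sikkim"])]

result = top_states(sample, 2)
print(result)
```

For this sample, the result lists Kerala with a count of 3 followed by Goa with a count of 2.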

MapReduce in Mongo using a Java client

In the previous recipe, we saw how to execute aggregation operations in Mongo using the Java client. In this recipe, we will work on the same use case as we did for the aggregation operation, but using MapReduce. The intent is to aggregate the data based on the state names and get the top five state names by the number of documents they appear in.

If you are not aware of how to write MapReduce code for Mongo from a programming language client and are seeing it for the first time, you might be surprised to see how it is actually done. You might have imagined that you would write the map and reduce functions in the programming language in which you are writing the code, Java in this case, and then use them to execute MapReduce. However, you need to bear in mind that MapReduce jobs run on the Mongo servers, and they execute JavaScript functions. Hence, irrespective of the programming language driver, the MapReduce functions are written in JavaScript. The programming language drivers just act as a means of letting us invoke and execute the MapReduce functions (written in JavaScript) on the server.

Getting ready

The test class used for this recipe is com.packtpub.mongo.cookbook.MongoMapReduceTest. To execute the MapReduce operations, we need to have a server up and running. A simple single node is what we will need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported in the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. The next step is to download the mongo-cookbook-javadriver Java project from the book’s website. Though Maven can be used to execute the test case, it is convenient to import the project in an IDE and execute the test case class. It is assumed that you are familiar with the Java programming language and comfortable using the IDE into which the project will be imported.

How to do it…

To execute the test case, one can either import the project in an IDE such as Eclipse and execute the test case or execute the test case from the command prompt using Maven.

If you are using an IDE, open the test class and execute it as a JUnit test case. If you plan to use Maven to execute this test case, go to the command prompt, change the directory to the root of the project, and execute the following command to run this single test case:

$ mvn -Dtest=com.packtpub.mongo.cookbook.MongoMapReduceTest test

Everything should execute fine if the Java SDK and Maven are properly set up and the MongoDB server is up and running and listening on port 27017 for incoming connections.

How it works…

The test case method for our MapReduce test is mapReduceTest().

MapReduce operations can be done in Mongo from a Java client using the mapReduce() method defined in the DBCollection class. There are a lot of overloaded versions, and you might refer to the Javadocs of the com.mongodb.DBCollection class for more details on the various flavors of this method, but the one we used is as follows:

collection.mapReduce(mapper, reducer, output collection, query)

The method accepts four parameters:

The first one is the mapper function, which is of type string and is JavaScript code that will be executed on the Mongo database server

The second one is the reducer function, which is of type string and is JavaScript code that will be executed on the Mongo database server

The third one is the name of the collection to which the output of the MapReduce execution will be written

Finally, the fourth one is the query that will be executed by the server; the result of this query will be the input to the MapReduce job execution

Since the assumption is that the reader is well versed in MapReduce operations from the shell, we won’t explain the MapReduce JavaScript functions that we have in the test case method. However, they are pretty simple: the map function emits each state’s name as the key with a value of 1, and the reduce function sums these values, giving the number of times each state name occurs. Each resulting document has the state’s name as the _id field and another field called value, which is the number of times that state’s name appeared in the collection. These documents are added to the output collection, javaMROutput in this case. For example, in the entire collection, the state Maharashtra appeared 6446 times. Thus, the document for the state of Maharashtra is {'_id': 'Maharashtra', 'value': 6446}. To confirm whether this is the true value or not, you can execute the following query from the Mongo shell and see that the result is indeed 6446:

> db.postalCodes.count({state: 'Maharashtra'})

6446

We are still not done, as the requirement is to find the top five states by their occurrence in the collection. So far we have just the states and their occurrences, so the final step is to sort the documents by the value field, which is the number of times the state’s name occurred, in descending order, and limit the result to five documents.

See also

Chapter 8, Integration with Hadoop, for different recipes on executing MapReduce jobs on MongoDB using the Hadoop connector. This allows us to write the map and reduce functions in languages such as Java and Python.

Chapter 4. Administration

In this chapter, we will cover the following recipes related to MongoDB administration:

Renaming a collection
Viewing collection stats
Viewing database stats
Disabling the preallocation of data files
Manually padding a document
Understanding the mongostat and mongotop utilities
Estimating the working set
Viewing and killing the currently executing operations
Using profiler to profile operations
Setting up users in MongoDB
Understanding interprocess security in MongoDB
Modifying collection behavior using the collMod command
Setting up MongoDB as a Windows Service
Configuring a replica set
Stepping down as a primary instance from the replica set
Exploring the local database of a replica set
Understanding and analyzing oplogs
Building tagged replica sets
Configuring the default shard for nonsharded collections
Manually splitting and migrating chunks
Performing domain-driven sharding using tags
Exploring the config database in a sharded setup

Renaming a collection

Have you ever come across a scenario where you have named a table in a relational database and, at a later point in time, felt that the name could have been better? Or perhaps the organization you work for was late in realizing that the table names are getting really messy and wants to enforce some standards on the names? Relational databases do have some proprietary ways to rename tables, and a database admin can do that for you.

This raises a question though. In the Mongo world, where collections are synonymous with tables, is there a way to rename a collection after it is created? In this recipe, we will explore this feature of Mongo, where we rename an existing collection with some data in it.

Getting ready

A running MongoDB instance is what we will need for performing this collection renaming experiment. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server. The operations will be performed from the Mongo shell.

How to do it…

Let’s take a look at the steps in detail:

1. Once the server is started, and assuming it is listening for client connections on the default port 27017, execute the following command to connect to it from the shell:

$ mongo

2. Once connected, using the default test database, let us create a collection with some test data. The collection we will be using is named sloppyNamedCollection:

> for (i = 0; i < 10; i++) db.sloppyNamedCollection.insert({'i': i})

3. The test data will now be created (we may verify the data by querying the sloppyNamedCollection collection).

4. Rename the collection as neatNamedCollection using the following command:

> db.sloppyNamedCollection.renameCollection('neatNamedCollection')

{ "ok" : 1 }

5. Verify that the sloppyNamedCollection collection is no longer present, by executing the following command:

> show collections

6. Finally, query the neatNamedCollection collection to verify that the data that was originally in sloppyNamedCollection is indeed present in it. Simply execute the following command on the Mongo shell:

> db.neatNamedCollection.find()

How it works…

Renaming a collection is pretty simple. It is accomplished with the renameCollection method, which takes two arguments. Generally, the function signature is as follows:

> db.<collection to rename>.renameCollection('<target name of the collection>', <drop target if exists>)

The first argument is the new name to be given to the collection.

The second parameter, which we didn’t use, is a Boolean value that tells the command whether to drop the target collection (if it exists) or not. This value defaults to false, which means that an existing target is not dropped and the command gives an error instead. This is a sensible default; otherwise, the results would be ghastly if we accidentally gave a collection name that exists and didn’t wish to drop it. If, however, you know what you are doing and want the target to be dropped while renaming the collection, pass the second parameter as true. The name of this parameter is dropTarget. In our case, the call would have been:

> db.sloppyNamedCollection.renameCollection('neatNamedCollection', true)

Now, as an exercise, try creating sloppyNamedCollection again and rename it without the second parameter (or with false as its value). You should see Mongo complaining that the target namespace exists. Then, rename it again with the second parameter as true. This time, the renaming operation executes successfully.

Note that the rename operation keeps the renamed collection in the same database as the original. The renameCollection method is not enough to move or rename a collection across databases. In such cases, we need to run the renameCollection command, with the shell switched to the admin database, as follows:

> db.runCommand({renameCollection: "<source_namespace>", to: "<target_namespace>", dropTarget: <true|false>});

In our case, suppose we want to move sloppyNamedCollection from the test database to newDatabase, rename it neatNamedCollection, and drop the target collection if it exists; we will execute the following command:

> db.runCommand({renameCollection: "test.sloppyNamedCollection", to: "newDatabase.neatNamedCollection", dropTarget: true});

Also, note that the rename collection operation doesn’t work on sharded collections.
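The role of the dropTarget flag discussed in this recipe can be modeled with a dictionary that maps collection names to their documents. This is a plain-Python sketch for illustration; the rename_collection function and the error types are assumptions and do not reflect the server's actual error messages or codes.

```python
def rename_collection(database, source, target, drop_target=False):
    """Rename a collection in a dict-based 'database'."""
    if source not in database:
        raise KeyError("source namespace does not exist")
    if target in database and not drop_target:
        # Sensible default: refuse to silently overwrite existing data
        raise ValueError("target namespace exists")
    database[target] = database.pop(source)
    return {"ok": 1}


# Made-up database with an existing target collection
database = {
    "sloppyNamedCollection": [{"i": i} for i in range(10)],
    "neatNamedCollection": [{"stale": True}],
}

try:
    rename_collection(database, "sloppyNamedCollection", "neatNamedCollection")
    refused = False
except ValueError:
    refused = True

result = rename_collection(database, "sloppyNamedCollection",
                           "neatNamedCollection", drop_target=True)
print(refused, result, sorted(database))
```

The first attempt is refused because the target exists, whereas the second one, with drop_target set to true, replaces the old target and leaves only neatNamedCollection in the database.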

Viewing collection stats

When it comes to the usage of storage, one of the interesting statistics from an administrative point of view is perhaps the number of documents in a collection. Together with other high-level statistics of the collection, it can be used to estimate future space and memory requirements based on the growth of the data.

Getting ready

To find the stats of the collection, we need to have a server up and running, and a single node should be OK. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server. The data on which we will be operating needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Once these steps are completed, we are all set to go ahead with this recipe.

How to do it…

We will be using the postalCodes collection to view the stats. Let’s take a look at the steps in detail:

1. Open the Mongo shell and connect it to the running MongoDB instance. In this case, with the server started on the default port 27017, execute the following command:

$ mongo

2. With the data imported, create an index on the pincode field, if one doesn’t exist, as follows:

> db.postalCodes.ensureIndex({'pincode': 1})

3. On the Mongo terminal, execute the following command:

> db.postalCodes.stats()

4. Observe the output. Now execute the following command on the shell:

> db.postalCodes.stats(1024)

{

"ns":"test.postalCodes",

"count":39732,

"size":5561,

"avgObjSize":0.1399627504278667,

"storageSize":16380,

"numExtents":6,

"nindexes":2,

"lastExtentSize":12288,

"paddingFactor":1,

"systemFlags":1,

"userFlags":0,

"totalIndexSize":2243,

"indexSizes":{

"_id_":1261,

"pincode_1":982

},

"ok":1

}

Again, observe the output. We will now see what these values mean to us in the next section.

How it works…

If we observe the output for the db.postalCodes.stats() and db.postalCodes.stats(1024) commands, we see that the second one has all the figures in KB whereas the first one is in bytes. The parameter provided is known as scale, and all the figures indicating size are divided by this scale. In this case, as we gave the value as 1024, we get all the values in KB; whereas if 1024*1024 is passed as the value of the scale, the sizes shown will be in MB. For our analysis, we will use the one that shows the sizes in KB:

> db.postalCodes.stats(1024)

{

"ns":"test.postalCodes",

"count":39732,

"size":5561,

"avgObjSize":0.1399627504278667,

"storageSize":16380,

"numExtents":6,

"nindexes":2,

"lastExtentSize":12288,

"paddingFactor":1,

"systemFlags":1,

"userFlags":0,

"totalIndexSize":2243,

"indexSizes":{

"_id_":1261,

"pincode_1":982

},

"ok":1

}
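The division by the scale parameter can be sketched as follows. The byte values below are hypothetical figures chosen so that dividing by 1024 reproduces the size and storageSize shown above; the scale_sizes function is a made-up helper, and its use of integer division is an assumption rather than a statement about how the server rounds.

```python
def scale_sizes(stats_in_bytes, scale=1):
    """Divide every size figure by scale (1024 for KB, 1024*1024 for MB)."""
    size_fields = ("size", "storageSize", "lastExtentSize", "totalIndexSize")
    scaled = dict(stats_in_bytes)
    for field in size_fields:
        if field in scaled:
            scaled[field] = scaled[field] // scale
    if "indexSizes" in scaled:
        scaled["indexSizes"] = {name: size // scale
                                for name, size in scaled["indexSizes"].items()}
    return scaled


# Hypothetical byte figures, not taken from a real stats() call
raw = {"count": 39732, "size": 5694464, "storageSize": 16773120}

in_kb = scale_sizes(raw, 1024)
in_mb = scale_sizes(raw, 1024 * 1024)
print(in_kb["size"], in_kb["storageSize"], in_mb["size"])
```

With these inputs, the size comes out as 5561 and the storage size as 16380 when the scale is 1024, and the size comes out as 5 when the scale is 1024*1024. The count field is not a size and is left untouched.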

The following table shows the meaning of the important fields:

Field Description

ns This is the fully qualified name of the collection, in the <database>.<collection name> format.

count This is the number of documents in the collection.

size This is the actual storage size occupied by the documents in the collection. Adding, deleting, or updating documents in the collection can change this figure. The scale parameter affects this value; in our case, the value is in KB because the scale is 1024.

avgObjSize This is the average size of a document in the collection. It is simply the size field divided by the count of documents in the collection. The scale parameter affects this value; in our case, the value is in KB because the scale is 1024.

storageSize Mongo preallocates space on disk so that the documents in a collection are kept in contiguous locations, which gives better disk access performance. This preallocation fills the files with zeros and then allocates space to the inserted documents. This field gives the size of the storage allocated to this collection, which will generally be much more than the actual size of the data in the collection. The scale parameter affects this value; in our case, the value is in KB because the scale is 1024.

numExtents As we saw, Mongo preallocates contiguous disk space to collections for performance purposes. However, as the collection grows, new space needs to be allocated. This field gives the number of such contiguous chunk allocations. Each contiguous chunk is called an extent.

nindexes This field gives the number of indexes present on the collection. The value is 1 even if we do not create an index on the collection, as Mongo implicitly creates an index on the _id field.

lastExtentSize This is the size of the last extent allocated. The scale parameter affects this value; in our case, the value is in KB because the scale is 1024.

paddingFactor We can look at this factor as a multiplier applied to the actual document size in order to compute the storage size. For example, if the document to be inserted is 2 KB, with a paddingFactor of 1, the size allocated to the document is 2 KB; that is, there is no padding. On the other hand, if the paddingFactor is 1.5, the space allocated to the document will be 3 KB (2 * 1.5), which gives a padding of 1 KB. In our case, the paddingFactor is 1 because we loaded the data using mongoimport. We discuss padding and the padding factor later in this section.

totalIndexSize Indexes take up space too. This field gives the total size taken up by the indexes on disk. The scale parameter affects this value; in our case, the value is in KB because the scale is 1024.

indexSizes This field is itself a document, with the name of each index as the key and the size of that index as the value. In our case, we had explicitly created an index on the pincode field. Thus, we see the name of the index as the key and the size of the index on disk as the value. The sum of these values for all the indexes equals the value given earlier, that is, totalIndexSize. The scale parameter affects this value; in our case, the value is in KB because the scale is 1024.
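The two relationships described in the table (avgObjSize is size divided by count, and totalIndexSize is the sum of indexSizes) can be checked with a few lines of plain JavaScript. This is only a sketch that runs in Node.js; the figures are copied from the sample output above, and the helper names averageObjectSize and sumIndexSizes are hypothetical, not part of the Mongo shell.

```javascript
// Figures copied from the db.postalCodes.stats(1024) sample output (sizes in KB).
const collStats = {
  count: 39732,
  size: 5561,
  totalIndexSize: 2243,
  indexSizes: { _id_: 1261, pincode_1: 982 }
};

// Hypothetical helper: avgObjSize is the size field divided by the document count.
function averageObjectSize(stats) {
  return stats.size / stats.count;
}

// Hypothetical helper: add up the per-index sizes to compare with totalIndexSize.
function sumIndexSizes(stats) {
  return Object.values(stats.indexSizes).reduce((sum, size) => sum + size, 0);
}

console.log(averageObjectSize(collStats).toFixed(4)); // prints 0.1400 (KB per document)
console.log(sumIndexSizes(collStats) === collStats.totalIndexSize); // prints true
```

In the Mongo shell, the same checks could be applied to the document returned by db.postalCodes.stats(1024) instead of the hard-coded sample.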

Let's take a look at the paddingFactor field. Documents are placed on the storage device in contiguous locations. If an update causes a document to grow, Mongo cannot enlarge the document in place unless some buffer space was kept after it. The only solution is to copy the entire document, with the necessary updates made to it, towards the end of the collection. This operation is expensive and hurts the performance of such updates. If the paddingFactor field is 1, no padding or buffer space is kept between two consecutive documents, making it impossible for the first of the two to grow in place on update. If the paddingFactor field is more than 1, some buffer space is kept to accommodate small changes in document size. The paddingFactor field, however, is not set by the user; MongoDB calculates it for the collection over a period of time and then uses the calculated value to allocate space for newly inserted documents. To get a feel for how the padding factor changes, let us do a small exercise:

1. Execute the following command in the Mongo shell:

> for (i = 0; i < 10; i++) {
    db.paddingFactorTest.insert({value:'HelloWorld'})
  }

2. Now execute the following command and take note of the paddingFactor value (it would be 1):

> db.paddingFactorTest.stats()

3. We will now make some updates to let the documents grow in size as follows:

> for (i = 0; i < 5; i++) {
    db.paddingFactorTest.update({value:'HelloWorld'}, {$push: {value1:'Value'}}, false, true)
  }

> db.paddingFactorTest.stats()

Query the stats again and observe that the value of paddingFactor has gone slightly over 1, which shows that the MongoDB server adjusted this value and will use it when allocating space for documents inserted at a later point in time.

We saw how paddingFactor affects the storage allocated to a document, but we neither have control over this value nor can we instruct Mongo beforehand how much additional buffer to allocate to each inserted document based on its anticipated growth. There is, however, a technique that lets us achieve this in a way, which we will see in the Manually padding a document recipe.
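The multiplier arithmetic behind paddingFactor can be written down as a short JavaScript sketch. It only reproduces the 2 KB example from the table above; allocatedSize and paddingBytes are hypothetical helper names, and the real server's allocation logic is not shown here.

```javascript
// Hypothetical helper: space allocated to a document for a given padding factor.
function allocatedSize(docSizeBytes, paddingFactor) {
  return docSizeBytes * paddingFactor;
}

// Hypothetical helper: buffer left after the document for in-place growth.
function paddingBytes(docSizeBytes, paddingFactor) {
  return allocatedSize(docSizeBytes, paddingFactor) - docSizeBytes;
}

const twoKB = 2 * 1024;
console.log(allocatedSize(twoKB, 1));   // prints 2048 (no padding)
console.log(allocatedSize(twoKB, 1.5)); // prints 3072 (3 KB allocated)
console.log(paddingBytes(twoKB, 1.5));  // prints 1024 (1 KB of padding)
```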

See also

The Viewing database stats recipe to view the stats at a database level

Viewing database stats

In the previous recipe, we saw how to view some important statistics of a collection from an administrative perspective. In this recipe, we'll get a broader picture by viewing those (or most of those) statistics at the database level.

Getting ready

To find the stats of the database, we need a server up and running; a single node is sufficient. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server. The data on which we will operate needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Once these steps are completed, we are all set to go ahead with this recipe. Refer to the previous recipe, Viewing collection stats, if you need to see how to view stats at the collection level.

How to do it…

We will use the test database for this recipe. It already has the postalCodes collection in it. Let's take a look at the steps in detail:

1. Connect to the server using the Mongo shell by typing the following command in the operating system terminal (it is assumed that the server is listening on port 27017):

$ mongo

2. On the shell, execute the following command and observe the output:

> db.stats()

3. Now, execute the following command, this time with the scale parameter, and observe the output:

> db.stats(1024)

{

"db":"test",

"collections":3,

"objects":39738,

"avgObjSize":143.32699179626553,

"dataSize":5562,

"storageSize":16388,

"numExtents":8,

"indexes":2,

"indexSize":2243,

"fileSize":196608,

"nsSizeMB":16,

"dataFileVersion":{

"major":4,

"minor":5

},

"ok":1

}

How it works…

Let us start by looking at the collections field. If you compare this number with the output of the show collections command in the Mongo shell, you will find one extra collection in the stats. The difference is one hidden collection, whose name is system.namespaces. You may execute db.system.namespaces.find() to view its contents.

Getting back to the output of the stats operation on the database, the objects field has an interesting value too. If we count the documents in the postalCodes collection, we get 39732. The count shown here is 39738, which means there are six more documents. These six documents come from the system.namespaces and system.indexes collections. Executing a count query on these two collections will confirm it. Note that the test database doesn't contain any collection apart from postalCodes. The figures will change if the database contains more collections with documents in them.

The scale parameter, which is a parameter to the stats function, divides the number of bytes by the given scale value. In this case, it is 1024, and hence all the size values will be in KB. Let's analyze the output:

> db.stats(1024)

{

"db":"test",

"collections":3,

"objects":39738,

"avgObjSize":143.32699179626553,

"dataSize":5562,

"storageSize":16388,

"numExtents":8,

"indexes":2,

"indexSize":2243,

"fileSize":196608,

"nsSizeMB":16,

"dataFileVersion":{

"major":4,

"minor":5

},

"ok":1

}

The following table shows the meaning of the important fields:

Field Description

db This is the name of the database whose stats are being viewed.

collections This is the total number of collections in the database.

objects This is the count of documents across all collections in the database. If we find the stats of a collection by executing db.<collection>.stats(), we get the count of documents in that collection. This attribute is the sum of the counts of all the collections in the database.

avgObjSize This is simply the size (in bytes) of all the objects in all the collections in the database, divided by the count of documents across all the collections. This value is not affected by the scale provided, even though it is a size field.

dataSize This is the total size of the data held across all the collections in the database. This value is affected by the scale provided.

storageSize This is the total amount of storage allocated to collections in this database for storing documents. This value is affected by the scale provided.

numExtents This is the count of all the extents in the database across all the collections. It is basically the sum of numExtents in the collection stats for the collections in this database.

indexes This is the total number of indexes across all collections in the database.

indexSize This is the total size (in bytes by default) of all the indexes of all the collections in the database. This value is affected by the scale provided.

fileSize This is simply the sum of the sizes of all the database files you should find on the filesystem for this database. The files will be named test.0, test.1, and so on for the test database. This value is affected by the scale provided.

nsSizeMB This is the size, in MB, of the .ns file of the database.

Another thing to note is the value of avgObjSize, which behaves oddly. In the collection's stats, this field is affected by the scale provided, whereas in the database stats it is always in bytes. This is confusing, and it is not clear why the value is not scaled according to the provided scale.
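The observations above (avgObjSize stays in bytes, and objects exceeds the postalCodes count by six) can be cross-checked with a small JavaScript sketch that runs in Node.js. The figures come from the sample db.stats(1024) output; approxAvgObjSizeBytes and extraDocuments are hypothetical helper names. The recomputed average is only approximate, assuming that the scaled dataSize has lost its fractional part.

```javascript
// Figures copied from the db.stats(1024) sample output (dataSize is in KB).
const dbStats = { objects: 39738, avgObjSize: 143.32699179626553, dataSize: 5562 };

// Hypothetical helper: undo the scale on dataSize and divide by the object count.
function approxAvgObjSizeBytes(stats, scale) {
  return (stats.dataSize * scale) / stats.objects;
}

// Hypothetical helper: documents counted in db.stats() beyond the listed collections.
function extraDocuments(dbObjects, collectionCounts) {
  const counted = collectionCounts.reduce((sum, n) => sum + n, 0);
  return dbObjects - counted;
}

console.log(approxAvgObjSizeBytes(dbStats, 1024).toFixed(1)); // prints 143.3, close to the reported avgObjSize
console.log(extraDocuments(dbStats.objects, [39732]));        // prints 6
```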

See also

Instant MongoDB, Packt Publishing (https://www.packtpub.com/big-data-and-business-intelligence/instant-mongodb-instant)

Disabling the preallocation of data files

Data files are preallocated in Mongo and filled with zeros even before data is inserted into collections, to prevent disk fragmentation. These data files are allocated starting at 64 MB for the first, 128 MB for the second, 256 MB for the third, and so on, up to a maximum size of 2 GB, after which all files are 2 GB. Though preallocated data files prevent disk fragmentation, prepopulating them takes time and consumes disk space. Because allocating such a file only at the moment the data is inserted could take a significant amount of time, Mongo preallocates an additional data file in the background and keeps it ready. If, however, this preallocation is not desired, say for a test database where quick startup and lower disk space consumption are more important, it can be disabled. This, however, should not be done on production systems.

How to do it…

When starting the MongoDB server, we can pass the --noprealloc flag to disable this preallocation. For instance, a server listening on the default port with preallocation disabled is started as follows:

$ mongod --noprealloc
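To make the doubling pattern of preallocated data files concrete, here is a small JavaScript sketch that lists the sizes described above (64 MB, doubling up to a 2 GB cap). The function name dataFileSizesMB is hypothetical, and the code only models the sequence stated in this recipe, not the server's actual allocation code.

```javascript
// Hypothetical helper: sizes (in MB) of the first fileCount data files,
// assuming the first file is 64 MB and each next file doubles up to 2048 MB.
function dataFileSizesMB(fileCount) {
  const sizes = [];
  let size = 64;
  for (let n = 0; n < fileCount; n++) {
    sizes.push(size);
    size = Math.min(size * 2, 2048); // cap at 2 GB
  }
  return sizes;
}

console.log(dataFileSizesMB(8)); // prints [ 64, 128, 256, 512, 1024, 2048, 2048, 2048 ]
```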

Manually padding a document

Without getting too much into the internals of storage, MongoDB uses memory-mapped files, which means that the data is stored in files exactly as it would be in memory, and low-level OS services are used to map these pages to memory. Documents are stored in contiguous locations in Mongo data files, and a problem arises when a document grows and no longer fits in its space. In such scenarios, Mongo rewrites the document towards the end of the collection with the updated data and clears up the space where it was originally placed (note that this space is not released to the OS as free space).

This is not a big problem for applications that don't expect their documents to grow in size; however, it is a big performance hit for those that foresee such growth over a period of time and, potentially, a lot of document movements. The paddingFactor field that we saw in the Viewing collection stats recipe gets updated over a period of time and, to some extent, allocates some buffer for documents to grow. However, this happens only after a lot of documents have already been moved across the collection and the MongoDB server has adjusted the padding size. Moreover, at the time of writing, the padding factor starts at a default value of 1 and cannot be set for the collection beforehand, based on your anticipated increase in document size, to counter these document rewrites by Mongo. However, there is a small trick that does let you do this, and that is what we will see in this recipe. It is a commonly used practice for such requirements.

Getting ready

Nothing in particular is needed for this recipe unless you plan to try out this simple technique, in which case you need a single instance up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server.

How to do it…

The idea of this technique is to add some dummy data to the document that is to be inserted. The size of this dummy data, together with the other data in the document, is approximately the same as the anticipated size of the document.

For example, if the average size of the document is estimated to be around 1200 bytes over a period of time, and 300 bytes of data are present in the document at insertion, we add a dummy field of around 900 bytes so that the total document size sums up to 1200 bytes.

Once the document is inserted, we unset this dummy field, which leaves a hole in the file between two consecutive documents. This empty space will then be used when the document grows over a period of time, minimizing the document's movements. This is not a foolproof method, as any document growing beyond the anticipated size will still have to be copied by the server to the end of the collection. Also, documents that do not grow to the anticipated size waste disk space.

Applications can come up with an intelligent strategy, perhaps adjusting the size of the padding field based on a field in the document, to take care of these shortcomings. However, this is up to the application developers.
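The sizing arithmetic from the 1200-byte example can be sketched in JavaScript as follows. Both helper names are hypothetical, and the sketch makes a simplifying assumption: it treats the pad as a single ASCII string and ignores the few extra bytes BSON needs for the field name and type, so the result is an approximation.

```javascript
// Hypothetical helper: how many bytes of dummy data to add at insert time.
function padBytesNeeded(anticipatedSizeBytes, currentSizeBytes) {
  return Math.max(0, anticipatedSizeBytes - currentSizeBytes);
}

// Hypothetical helper: an ASCII string occupying roughly the requested number of bytes
// (one byte per character in UTF-8; BSON overhead is ignored).
function makePadString(bytes) {
  return 'x'.repeat(bytes);
}

const needed = padBytesNeeded(1200, 300);
console.log(needed);                       // prints 900
console.log(makePadString(needed).length); // prints 900
console.log(padBytesNeeded(1200, 1500));   // prints 0 (document already larger than anticipated)
```

A per-document strategy, as suggested above, could call padBytesNeeded with an anticipated size derived from a field in the document rather than a fixed constant.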

Let us now see a sample of this approach:

1. We define a small function that adds a field called padField, holding an array of string values, to the document as follows:

function padDocument(doc) {
  doc.padField = []
  for (var i = 0; i < 20; i++) {
    doc.padField[i] = 'Dummy'
  }
}

It adds an array called padField containing the string Dummy 20 times. There is no restriction on what type you add to the document or how many times it is added, as long as it consumes the space you desire. The preceding code snippet is just a sample.

2. The next step is to insert a document. We will define another function called insert in the following manner:

function insert(collection, doc) {
  // 1. Pad the document with padField
  padDocument(doc);
  // 2. Create or store the _id field that will be used later
  var _id
  if (typeof(doc._id) == 'undefined') {
    _id = ObjectId()
    doc._id = _id
  }
  else {
    _id = doc._id
  }
  // 3. Insert the document with the padded field
  collection.insert(doc)
  // 4. Remove the padded field. Use the saved _id to find the document to be updated.
  collection.update({'_id': _id}, {$unset: {'padField': 1}})
}

3. We will now put this into action by inserting a document in the testCol collection in the following manner:

> insert(db.testCol, {i:1})

4. You may query the testCol collection using the following query and check whether the inserted document exists:

> db.testCol.findOne({i:1})

Note that on querying, you will not find padField in the document. However, the space once occupied by the array remains between this document and the subsequently inserted documents, even though the field was unset.

How it works…

The insert function is self-explanatory and has comments in it to tell you what it does. An obvious question is, how can we be sure that it does what we intended? To find out, we will do a small exercise on a manualPadTest collection. From the Mongo shell, execute the following commands:

> db.manualPadTest.drop()
> db.manualPadTest.insert({i:1})
> db.manualPadTest.insert({i:2})
> db.manualPadTest.stats()

Take note of the avgObjSize field in the stats. Next, execute the following commands from the Mongo shell:

> db.manualPadTest.drop()
> insert(db.manualPadTest, {i:1})
> insert(db.manualPadTest, {i:2})
> db.manualPadTest.stats()

Take note of the avgObjSize field in the stats again. This figure is much larger than the one we saw earlier for a regular insert without padding. The paddingFactor field, as we see in both cases, is still 1, but the latter case has more buffer for the documents to grow.

One catch in the insert function we used in this recipe is that the insert into the collection and the subsequent update of the document are two separate operations; together, they are not atomic.

Understanding the mongostat and mongotop utilities

Most of you might find these names similar to two popular Unix commands, iostat and top. For MongoDB, mongostat and mongotop are two utilities that do pretty much the same job as those two Unix commands, and there is no prize for guessing that they are used to monitor the Mongo instance.

Getting ready

In this recipe, we will simulate some operations on a standalone Mongo instance by running a script that attempts to keep your server busy; then, in another terminal, we will run these utilities to monitor the db instance.

You need to start a standalone server listening on any port for client connections; in this case, we will stick to the default 27017. In case you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. We also need to download the KeepServerBusy.js script from the book's website and keep it handy on the local drive for execution. Also, it is assumed that the bin directory of your Mongo installation is present in the path variable of your operating system. If not, these commands need to be executed with the absolute path of the executable. The mongostat and mongotop utilities come as standard with the Mongo installation.

How to do it…

Let's take a look at the steps in detail:

1. Start the MongoDB server. Let it listen on the default port for connections.

2. In a separate terminal, execute the KeepServerBusy.js JavaScript as follows:

$ mongo KeepServerBusy.js --quiet

3. Open a new OS terminal and execute the following command:

$ mongostat

4. Capture the output for some time and then hit Ctrl + C to stop the command from capturing more stats. Keep the terminal open or copy the stats to another file.

5. Now execute the following command from the terminal:

$ mongotop

6. Capture the output for some time and then hit Ctrl + C to stop the command from capturing more stats. Keep the terminal open or copy the stats to another file.

7. Hit Ctrl + C in the shell where the KeepServerBusy.js JavaScript was executed to stop the operation that keeps the server busy.

How it works…

Let us see what we have captured from these two utilities. We start by analyzing mongostat. On my laptop, the output of the mongostat command is as follows:

$ mongostat
connected to: 127.0.0.1
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
1000 1 1000 1794 1 1|0 0 320m 808m 54m 37 test:85.7% 0 0|0 0|1 271k 94k 2 23:24:30
2000 1 1326 1206 1 1|0 0 320m 808m 54m 113 test:83.3% 0 0|0 0|1 339k 51k 2 23:24:31
1000 1 952 1000 1 1|0 0 320m 808m 54m 28 test:84.4% 0 0|0 0|1 219k 51k 2 23:24:32
77 1 722 1000 1 1|0 0 320m 808m 54m 87 test:73.0% 0 0|0 0|1 131k 51k 2 23:24:33
923 1 1000 792 1 1|0 0 320m 808m 54m 42 test:83.3% 0 0|0 0|0 206k 51k 2 23:24:34
1000 1 1000 934 1 1|0 0 320m 808m 54m 150 test:84.6% 0 0|0 0|1 220k 51k 2 23:24:35
1000 1 1000 920 1 1|0 0 320m 808m 54m 13 test:84.9% 0 0|0 0|1 219k 51k 2 23:24:36

You may choose to look at what the KeepServerBusy.js script does to keep the server busy. All it does is insert 1000 documents in the monitoringTest collection, update them one by one to set a new key in them, execute a find and iterate through all of them, and finally delete them one by one. Basically, it is a write-intensive operation.

The output is wide and wraps on a narrow terminal, but let us analyze the fields one by one and see what to look out for. The following table gives a description of each column:

Column(s) Description

insert, query, update, and delete These are the first four columns, indicating the number of insert, query, update, and delete operations per second. The figures are per second because consecutive samples are captured 1 second apart, as indicated by the last column.

getmore When a cursor runs out of data for a query, it executes a getmore operation on the server to get more results for the query executed earlier. This column shows the number of getmore operations executed in the given time frame of 1 second. In our case, not many getmore operations are executed.

command This shows the number of commands executed on the server in the given time frame of 1 second. In our case, it was only 1. The number after the | is 0 in our case as the server was running in standalone mode. Try executing mongostat while connected to a replica set primary and secondary. You should see slightly different figures there.

flushes This is the number of times data was flushed to disk in an interval of 1 second.

mapped, vsize, and res Mapped memory is the amount of memory mapped by the Mongo process to the database. This will typically be the same as the size of the database. Virtual memory, on the other hand, is the memory allocated to the entire mongod process. This will typically be more than twice the size of the mapped memory, especially when journaling is enabled. Resident memory is the physical memory used by Mongo. All these figures are given in MB. The total amount of physical memory might be a lot more than what is being used by Mongo, but that is not a concern unless a lot of page faults occur (see the faults column that follows).

faults This is the number of page faults occurring per second. This number should be as low as possible. It indicates the number of times Mongo had to go to disk to obtain a document or index that was missing from main memory. This is less of a problem when using SSDs for persistent storage than when using spinning disk drives.

locked db From version 2.2, all write operations to a collection lock the database in which the collection resides and do not acquire a global-level lock. This field shows the database that was locked for the majority of the time in the given time interval, along with the percentage of time it was locked. In our case, the test database is locked.

idx miss % This field gives the percentage of index accesses for which the required index was not present in memory. Such a miss causes a page fault, and the disk needs to be accessed to get the index. Another disk access might be needed to get the document as well. This figure too should be low. A high percentage of index misses is something that needs attention.

qr|qw These are the queued-up reads and writes that are waiting for a chance to be executed. If this number goes up, it shows that the database is getting overwhelmed by a volume of reads and writes that is more than it can handle. The page faults, the memory stats, and the database lock percentage are some of the stats that need to be examined as well if this figure is high. If the data set is too large, sharding the collection can improve performance significantly.

ar|aw This is the number of active readers and writers (clients). It is not something to worry about even for a large number, as long as the other stats we saw earlier are under control.

netIn and netOut This is the network traffic in and out of the MongoDB server in the given time frame. The figure is in bits. For example, 271k means 271 kilobits.

conn This indicates the number of open connections. Keep a watch on it to see that it doesn't keep getting higher.

time This is the time at which the sample was captured.

Some more fields are shown if mongostat is connected to a replica set primary or secondary. As an assignment, once the stats of a standalone instance are collected, start a replica set and execute the same script to keep the server busy. Use mongostat to connect to the primary and secondary instances and see whether different stats are captured.

Apart from mongostat, we also used the mongotop utility to capture the stats. Let us see its output and make some sense out of it:

$ mongotop
connected to: 127.0.0.1

ns total read write 2014-01-15T17:55:13
test.monitoringTest 899ms 1ms 898ms
test.system.users 0ms 0ms 0ms
test.system.namespaces 0ms 0ms 0ms
test.system.js 0ms 0ms 0ms
test.system.indexes 0ms 0ms 0ms

ns total read write 2014-01-15T17:55:14
test.monitoringTest 959ms 0ms 959ms
test.system.users 0ms 0ms 0ms
test.system.namespaces 0ms 0ms 0ms
test.system.js 0ms 0ms 0ms
test.system.indexes 0ms 0ms 0ms

ns total read write 2014-01-15T17:55:15
test.monitoringTest 954ms 1ms 953ms
test.system.users 0ms 0ms 0ms
test.system.namespaces 0ms 0ms 0ms
test.system.js 0ms 0ms 0ms
test.system.indexes 0ms 0ms 0ms

There is not much to look at in this output. For each namespace, we see the total time for which the database was busy reading or writing in the given slice of 1 second. The value given as the total is the sum of the read and the write times. If we compare the mongotop and mongostat outputs for the same time slice, the percentage of time for which writes were taking place will be very close to the percentage of time the database was locked according to mongostat.

The mongotop command accepts a parameter on the command line as follows:

$ mongotop 5

In this case, the stats will be printed every 5 seconds, as against the default interval of 1 second.
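The comparison between mongotop's write time and mongostat's lock percentage can be made concrete with a small JavaScript sketch. The sample object holds the figures from the first mongotop slice shown above; writePercent and totalMatches are hypothetical helper names, and the interval is assumed to be the default of 1 second.

```javascript
// First mongotop sample for test.monitoringTest (values in milliseconds).
const sample = { ns: 'test.monitoringTest', totalMs: 899, readMs: 1, writeMs: 898 };

// Hypothetical helper: share of the sampling interval spent writing, as a percentage.
function writePercent(s, intervalSeconds) {
  return (s.writeMs * 100) / (intervalSeconds * 1000);
}

// Hypothetical helper: the total column should be the sum of read and write.
function totalMatches(s) {
  return s.totalMs === s.readMs + s.writeMs;
}

console.log(writePercent(sample, 1)); // prints 89.8
console.log(totalMatches(sample));    // prints true
```

With mongotop 5, the same helper would be called with an interval of 5 so that the write time is compared against a 5000 ms slice.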

See also

The Estimating the working set recipe, to learn how to estimate the working set using the working set estimator command introduced in Mongo 2.4

The Viewing and killing the currently executing operations recipe, to learn how to get the currently executing operations from the shell and kill them if needed

The Using profiler to profile operations recipe, to learn how to use the in-built profiling feature of Mongo to log the execution time of operations

Estimating the working set

We start by defining what the working set is. It is the subset of the total data that is frequently accessed by the application. In an application that stores information over a period of time, the working set is mostly the recently accessed data. The word "recently" is subjective; for some, it might be a day or two, for others, it might be a couple of months. This is mostly something that needs to be thought of while designing the application and sizing the database. The working set is something that needs to be in the RAM of the database server to minimize page faults and get optimum performance.

In this recipe, we will see a way to get an estimate of your working set, a feature introduced in Mongo 2.4. The word "estimator" is slightly misleading, as the initial sizing is still a manual activity, and the system designers need to be judicious about the server configuration. The working set estimator we will see now is more of a reactive approach, which kicks in once the application is up and running. It provides metrics that can be used by monitoring tools, and it tells us whether the RAM on the server can accommodate the working set or whether the set outruns the available RAM. The latter case demands resizing the hardware or scaling the database horizontally.

Getting ready

In this recipe, we will simulate some operations on a standalone Mongo instance. We need to start a standalone server listening on any port for client connections; in this case, we will stick to the default 27017. In case you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. Connect to the server from the Mongo shell.

How to do it…

The working set is now a part of the server's status output. There is a field called workingSet, whose value is a document that gives these estimates.

This working set document is not included in the standard serverStatus output and needs to be requested explicitly. It is not a cheap operation in terms of resources, and thus needs to be monitored if it is executed frequently. Frequent invocations can have a detrimental effect on the performance of the server.

We need to run the following command from the Mongo shell to get the working set estimates:

> db.runCommand({serverStatus:1, workingSet:1}).workingSet

{

"note":"thisIsAnEstimate",

"pagesInMemory":6188,

"computationTimeMicros":11524,

"overSeconds":3977

}

How it works…

There are just four fields in the working set estimate document, with the first just stating in text that this is an estimate. The pagesInMemory, computationTimeMicros, and overSeconds fields are the ones we are more interested in.

We will look at the overSeconds field first. This is the time in seconds between the first and the last page loaded by Mongo into memory. When the server has just started, this value will obviously be low, but eventually, with more data being accessed over time, more pages will be loaded by Mongo into memory. If the available RAM is abundant, the first loaded page will stay in memory and new pages will continue to load as and when needed. Hence, this time will also increase, as the difference between the most recently loaded page and the oldest page increases. If this time stays low, or even decreases, it means that the oldest and the newest pages held by Mongo were loaded within just the number of seconds given by this figure. This can be an indication that the number of pages accessed and loaded into memory by the MongoDB server is more than can be held in memory. As Mongo uses the least recently used (LRU) policy to evict a page from memory to make space for a new page, we are possibly evicting pages that might be needed again, causing more page faults.

This is where the pagesInMemory field comes in. It tells us the number of pages Mongo loaded into memory over this period of time. The number of pages multiplied by the page size of around 4 KB gives the size of the data loaded into memory. Thus, if all the data has been accessed since the server started, this size will be around your data size. This number keeps increasing with time; hence this field, in conjunction with the overSeconds field, is an important statistic.

The final field, computationTimeMicros, gives the time in microseconds taken by the server to compute this statistic for the working set. As we can see, it is not a particularly cheap operation to execute, and thus this statistic should be requested with caution, especially on high-throughput systems.
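As a rough illustration of how pagesInMemory translates into a memory figure, the following JavaScript sketch multiplies the page count from the sample output by a page size. The 4096-byte page size is an assumption based on the "around 4 KB" figure mentioned above, and the helper names are hypothetical.

```javascript
// Assumption: one page is 4096 bytes ("around 4 KB"); the actual page size depends on the platform.
const PAGE_SIZE_BYTES = 4096;

// Sample workingSet document from the output shown earlier.
const workingSet = { pagesInMemory: 6188, computationTimeMicros: 11524, overSeconds: 3977 };

// Hypothetical helper: estimated bytes of data held in memory.
function workingSetBytes(ws) {
  return ws.pagesInMemory * PAGE_SIZE_BYTES;
}

// Hypothetical helper: the same estimate expressed in MB.
function workingSetMB(ws) {
  return workingSetBytes(ws) / (1024 * 1024);
}

console.log(workingSetBytes(workingSet));         // prints 25346048
console.log(workingSetMB(workingSet).toFixed(2)); // prints 24.17
```

In the Mongo shell, the workingSet document returned by the serverStatus command could be used in place of the hard-coded sample.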

Viewing and killing the currently executing operations

In this recipe, we will see how to view the currently running operations and kill operations that have been running for a long time.

Getting ready

In this recipe, we will simulate some operations on a standalone Mongo instance. We need to start a standalone server listening on any port for client connections; in this case, we will stick to the default 27017. If you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. We also need to start two shells connected to this server. One shell will be used for background index creation, and the other will be used to monitor the current operation and then kill it.

How to do it…

In our test environment, unlike on a busy production system, we cannot easily simulate an actual long-running operation. We will, however, create an index and hope that it takes a long time to build. Depending on your hardware configuration, the operation may take some time. Let's see the steps in detail:

1. To start this test, execute the following commands in the Mongo shell:

> db.currentOpTest.drop()
> for (i = 1; i < 10000000; i++) { db.currentOpTest.insert({'i': i}) }

Inserting about 10 million documents might take some time.

Once the documents are inserted, we will execute an operation that creates an index in the background. If you would like to know more about index creation, refer to the Background and foreground index creation from the shell recipe in Chapter 2, Command-line Operations and Indexes, but it is not a prerequisite for this recipe.

2. Create a background index on the i field in the document. This index creation is the operation we will view using currentOp and attempt to kill using killOp. Execute the following command in one shell to initiate the background index creation:

> db.currentOpTest.ensureIndex({i:1}, {background:1})

This takes a fairly long time; on my laptop, it took well over 100 seconds.

3. In the second shell, execute the following command to get the currently executing operations:

> db.currentOp().inprog

4. Take note of the in-progress operations and find the one for index creation. In our case, on the test machine, it was the only one in progress. It will be an operation on system.indexes, and the operation will be insert. The keys to look for in the output document are ns and op, respectively. We need to note the first field of this operation, namely opid. In this case, it is 11587458. A sample output of the command is given in the next section.

5. Kill the operation from the shell using the opid that we got earlier:

> db.killOp(11587458)

How it works…

We will split our explanation into two sections, the first about the current operation details and the second about killing the operation.

The index creation process, in our case, is the long-running operation that we intend to kill. We create a big collection with about 10 million documents and initiate a background index creation process.

On executing the db.currentOp() operation, we get a document as the result, with an inprog field whose value is an array of other documents, each representing a currently running operation. It is common to get a big list of documents on a busy system. The following is the document for the index creation operation:

{

"opid":11587458,

"active":true,

"secs_running":31,

"op":"insert",

"ns":"test.system.indexes",

"insert":{

"v":1,

"key":{

"i":1

},

"ns":"test.currentOpTest",

"name":"i_1",

"background":1

},

"client":"127.0.0.1:50895",

"desc":"conn10",

"connectionId":10,

"locks":{

"^":"w",

"^test":"W"

},

"waitingForLock":false,

"msg":"bg index build Background Index Build Progress: 2214738/10586935 20%",

"progress":{

"done":2214740,

"total":10586935

},

"numYields":3070,

"lockStats":{

"timeLockedMicros":{

"r":NumberLong(0),

"w":NumberLong(53831938)

},

"timeAcquiringMicros":{

"r":NumberLong(0),

"w":NumberLong(31387832)

}

}

}

Wewillseewhatthesefieldsmeaninthefollowingtable:

Field Description

opid This is a unique operation ID identifying the operation. This is the ID to be used to kill an operation.

active This Boolean value indicates whether the operation has started or not. It is false only if it is waiting to acquire the lock to execute the operation. The value will be true once it starts, even at a point in time where it has yielded the lock and is not executing.

secs_running This gives the time, in seconds, for which the operation has been executing.

op This indicates the type of the operation. In the case of index creation, it is an insert into a system collection of indexes. The possible values are insert, query, getmore, update, remove, and command.

ns This is the fully qualified namespace for the target. It will be of the <database name>.<collection name> form.

insert This shows the document that will be inserted in the collection.

query This is a field that will be present for operations other than the insert and getmore commands.

client This is the IP address/hostname and the port of the client who initiated the operation.

desc This is the description of the client, mostly the client’s connection name.

connectionId This is the identifier of the client connection from which the request originated.

locks This is a document containing the locks held for this operation. The document shows the locks held for the operation being analyzed for various databases. The ^ indicates the global lock and ^test indicates the lock on the test database. The values here are interesting. The value of ^ is w (lowercase). This means that it is not an exclusive write lock, and multiple databases can write concurrently. It is a lock held at the database level. ^test has a value W, which is a global write lock. This means that the write lock on the test database is exclusive and no other operation on any database can occur when this lock is held. The preceding output is for Version 2.4 of Mongo.

waitingForLock This field indicates whether the operation is waiting for a lock to be acquired. For instance, if the preceding index creation was not a background process, other operations on this database would queue up for the lock to be acquired. This flag for those operations will then be true.

msg This is a human-readable message for the operation. In this case, we do see a percentage of the operation completed, as this is an index creation operation.

progress This is the state of the operation. The total gives the total number of documents in the collection and done gives the number indexed so far. In this case, the collection already had some more documents (over 10 million documents). The percentage of the operation completed is computed from these figures.

numYields This is the number of times the process has yielded the lock to allow other operations to execute. As this is a background index creation process, this number will keep on increasing as the server yields the lock frequently to let other operations execute. Had it been a foreground process, the lock would never be yielded till the operation completes.

lockStats This document has more nested documents giving stats of the total time this operation has held the read or write lock, and also the time it waited to acquire the lock. The following are the possible values:

r: This is the time locked for a specific (database level) read lock

w: This is the time locked for a specific (database level) write lock

R: This is the time locked for the global read lock

W: This is the time locked for the global write lock
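The progress field described in the table above carries the two figures from which the completion percentage is derived. As a minimal sketch in plain JavaScript, the computation could look like the following; the helper name percentComplete and the sample figures are hypothetical and not part of the MongoDB shell:

```javascript
// Hypothetical helper (not a MongoDB shell function): derives a completion
// percentage from a progress sub-document of the form { done: n, total: m }.
function percentComplete(progress) {
  // Operations without a progress document, or with an empty total,
  // have nothing sensible to report.
  if (!progress || !(progress.total > 0)) {
    return null;
  }
  return Math.floor((progress.done / progress.total) * 100);
}

// Made-up figures for illustration only.
const sampleProgress = { done: 2500000, total: 10000000 };
console.log(percentComplete(sampleProgress));
```

In a real shell session, the argument would be the progress field of one of the documents in the inprog array returned by db.currentOp().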

If you have a replica set, there will be many more getmore operations on the oplog on the primary, issued by the secondaries.

To see system operations as well, we need to pass true as the parameter to the currentOp function call as follows:

> db.currentOp(true)

Next, we will see how to kill a user-initiated operation using the killOp function. The function is simply called as follows:

> db.killOp(<operation id>)

In our case, the index creation process had the operation ID 11587458 and thus it will be killed as follows:

> db.killOp(11587458)

On killing any operation, irrespective of whether the given operation ID exists or not, we see the following message on the console:

{"info":"attempting to kill op"}

Thus, seeing this message doesn’t mean that the operation was killed. It just means that an attempt will be made to kill the operation, if it exists.

If an operation cannot be killed immediately and the killOp command is issued for it, the killPending field will start appearing in currentOp for the given operation. For example, execute the following query on the shell:

> db.currentOpTest.find({$where:'sleep(100000)'})

This will not return, and the thread executing the query will sleep for 100 seconds. This is an operation that cannot be killed using killOp. Try executing currentOp from another shell (do not press Tab for autocompletion; your shell may just hang), get the operation ID, and then kill it using the killOp command. If you execute the currentOp command again, you should see that the process is still running, but the document with the process details will now contain a new key, killPending, stating that the kill for this operation has been requested but is pending.
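Picking the operation ID out of the currentOp output by eye gets tedious on a busy server. The following is a minimal sketch, in plain JavaScript, of how the inprog array could be filtered for long-running operations. The function name findLongRunningOps, the threshold, and the sample document are assumptions made for illustration; only the field names (inprog, opid, active, secs_running, op, ns, killPending) come from the output discussed above:

```javascript
// Hypothetical helper: lists operations from a currentOp-style result that
// have been running for at least minSeconds.
function findLongRunningOps(currentOpResult, minSeconds) {
  return (currentOpResult.inprog || [])
    .filter(function (op) {
      // Only operations that have actually started are considered.
      return op.active === true && op.secs_running >= minSeconds;
    })
    .map(function (op) {
      return {
        opid: op.opid,
        op: op.op,
        ns: op.ns,
        // killPending only appears once a kill has been requested.
        killPending: op.killPending === true
      };
    });
}

// Made-up sample shaped like the output of db.currentOp().
const sampleResult = {
  inprog: [
    { opid: 11587458, active: true, secs_running: 120, op: 'insert', ns: 'test.system.indexes' },
    { opid: 11587460, active: true, secs_running: 2, op: 'query', ns: 'test.currentOpTest' },
    { opid: 11587461, active: false, secs_running: 0, op: 'update', ns: 'test.people' }
  ]
};

console.log(findLongRunningOps(sampleResult, 60));
```

In the shell, the same function could be given the result of db.currentOp() directly, and each returned opid could then be passed to db.killOp() after reviewing the list.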

Using profiler to profile operations

In this recipe, we will look at Mongo’s in-built profiler, which is used to profile the operations executed on the MongoDB server. It is a utility that logs either all operations or only the slow ones, and its output can be used to analyze the performance of the server.

Getting ready

In this recipe, we will be performing some operations on a standalone Mongo instance and profiling them. We need to start a standalone server listening to any port for client connections; in this case, we will stick to the default 27017. If you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. We also need to start a shell that will be used to perform querying, enable profiling, and view the profiled operations.

How to do it…

1. Once the server is started and the shell is connected to it, execute the following to get the current profiling level:

> db.getProfilingLevel()

The default level should be 0 (no profiling), if we have not set it earlier.

2. Let us set the profiling level to 1 (log slow operations only) and log all the operations slower than 50 ms. Execute the following command on the shell:

> db.setProfilingLevel(1, 50)

3. Now, let us execute an insert operation into a collection and then execute a couple of queries:

> db.profilingTest.insert({i:1})

> db.profilingTest.find()

> db.profilingTest.find({$where:'sleep(70)'})

4. Now query the system.profile collection as follows:

> db.system.profile.find().pretty()

How it works…

Profiling is not enabled by default. If you are happy with the performance of the database, there is no reason to enable the profiler. It is useful only when one feels that there is some room for improvement and wants to target some expensive operations taking place. An important question is, what classifies an operation as slow? The answer is, it varies from application to application. By default, in Mongo, slow means any operation above 100 ms. However, while setting the profiling level, you may choose the threshold value.

There are three possible values for profiling levels:

0: Disable profiling

1: Enable profiling for slow operations, where the threshold value for an operation to be classified as "slow" is provided with the call while setting the profiling level

2: Profile all operations

While profiling all operations might not be a very good idea and might not commonly be used, as we shall soon see, setting the value to 1 with a threshold provided to it is a good way to monitor slow operations.

If we look at the steps we executed, we see that we can get the current profiling level by executing the db.getProfilingLevel() operation. To get more information, such as what value is set as the threshold for slow operations, we can execute db.getProfilingStatus(), which returns a document with the profiling level and the threshold value for slow operations.

For setting the profiling level, we call the db.setProfilingLevel() method. In our case, we set it for logging all operations taking more than 50 ms as db.setProfilingLevel(1, 50).

To disable profiling, simply execute db.setProfilingLevel(0).
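The three profiling levels amount to a small decision rule. The sketch below models that rule in plain JavaScript; it is an illustration of the description above, not the server's code. The function name shouldProfile is hypothetical, and treating an operation that takes exactly the threshold time as not slow is an assumption:

```javascript
// Hypothetical model of the profiling decision described in the text.
// level: 0, 1 or 2; slowMs: threshold in milliseconds; operationMillis:
// how long the operation took.
function shouldProfile(level, slowMs, operationMillis) {
  if (level === 2) {
    return true; // profile all operations
  }
  if (level === 1) {
    // Assumption: strictly slower than the threshold counts as slow.
    return operationMillis > slowMs;
  }
  return false; // level 0: profiling disabled
}

// With db.setProfilingLevel(1, 50), a find that sleeps for 70 ms qualifies,
// while a 3 ms query does not.
console.log(shouldProfile(1, 50, 70));
console.log(shouldProfile(1, 50, 3));
```

This also makes it easy to see why level 2 fills the profiling collection quickly: every operation passes the check regardless of its duration.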

What we do next is execute three operations: one to insert a document, one to find all documents, and finally, a find that calls sleep with a value of 70 ms to slow it down.

The final step is to see these profiled operations, which are logged in the system.profile collection. We execute a find operation to see the operations logged. For my execution, the insert and the final find operation with the sleep were logged.

Obviously, this profiling has some overhead, but a negligible one. Hence, we will not enable it by default, but only when we want to profile slow operations. Another question is, will this profiling collection grow over a period of time? The answer is no, as this is a capped collection. Capped collections are fixed-size collections that preserve insertion order and act as circular queues, filling in the new documents and discarding the oldest when they get full. A query on system.namespaces should show the stats. The query execution will show the following output for the system.profile collection:

{"name":"test.system.profile","options":{"capped":true,"size":1048576}}

As we see, the size of the collection is 1 MB, which is incredibly small. Setting the profiling level to 2 will thus easily overwrite the data on busy systems. One may also choose to explicitly create a collection, with the name system.profile, as a capped collection of any size you prefer, should you choose to retain more operations in it. If the collection already exists, profiling has to be disabled and the existing system.profile collection dropped first. To create a capped collection explicitly, you may execute the following command from the Mongo shell:

> db.createCollection('system.profile', {capped:1, size:1048576})

Obviously, the size chosen is arbitrary, and you are free to allocate any size to this collection, based on how frequently the data gets filled and how much profiling data you want to keep before it gets overwritten.

As this is a capped collection, and the insertion order is preserved, a query with the sort order {$natural:-1} will be perfectly fine and very efficient at finding operations in the reverse order of execution time.
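The circular-queue behaviour of a capped collection, and why a reverse natural-order read returns the most recent operations first, can be modelled in a few lines of plain JavaScript. This is a deliberately simplified sketch: the class CappedBuffer is hypothetical, and it caps by document count, whereas a real capped collection such as system.profile is bounded by its size in bytes:

```javascript
// Simplified, hypothetical model of a capped collection. Real capped
// collections are limited by bytes; this model limits by document count.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  // New documents go to the end; the oldest one is discarded when full.
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  // Insertion order, comparable to sorting by {$natural: 1}.
  natural() {
    return this.docs.slice();
  }

  // Reverse insertion order, comparable to sorting by {$natural: -1}.
  reverseNatural() {
    return this.docs.slice().reverse();
  }
}

const profileModel = new CappedBuffer(3);
for (let i = 1; i <= 5; i++) {
  profileModel.insert({ op: 'query', i: i });
}

// Only the three most recent documents survive, newest first.
console.log(profileModel.reverseNatural().map(function (d) { return d.i; }));
```

Against a real server, the equivalent read would be db.system.profile.find().sort({$natural:-1}), as mentioned above.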

Finally, we will take a look at the document that got inserted in the system.profile collection and see which operations it has logged:

{

"op":"query",

"ns":"test.profilingTest",

"query":{

"$where":"sleep(70)"

},

"ntoreturn":0,

"ntoskip":0,

"nscanned":1,

"keyUpdates":0,

"numYield":0,

"lockStats":{

"timeLockedMicros":{

"r":NumberLong(188796),

"w":NumberLong(0)

},

"timeAcquiringMicros":{

"r":NumberLong(5),

"w":NumberLong(6)

}

},

"nreturned":0,

"responseLength":20,

"millis":188,

"ts":ISODate("2014-01-27T17:37:02.482Z"),

"client":"127.0.0.1",

"allUsers":[],

"user":""

}

As we see in the preceding document, there are indeed some interesting stats. Let us look at some of them in the following table. Some of these fields are identical to the fields we see when we execute the db.currentOp() operation from the shell:

Field Description

op This is the operation that got executed. In this case, it was a find operation and thus, it is shown as a query.

ns This is the fully qualified name of the collection on which the operation was performed. It will be in the <database>.<collection name> format.

query This shows the query that got executed on the server.

nscanned This has a similar meaning to the explain plan in a relational database. It is the total number of documents and index entries scanned.

numYield This is the number of times the lock was yielded when the operation was executed.

lockStats This has some interesting stats for the time taken to acquire the lock and the time for which the lock was held.

nreturned This is the number of documents returned.

responseLength This is the length of the response in bytes.

millis Most important of all, this is the time taken in milliseconds to execute the operation.

ts This is the time when the operation was executed.

client This is the hostname/IP address of the client who executed the operation.
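When scanning many profile entries, it often helps to reduce each one to the handful of fields from the table above. The following plain JavaScript sketch shows one way to do that. The function name summarizeProfileEntry is hypothetical, and the sample entry uses plain numbers in place of the shell's NumberLong values so that it runs outside the Mongo shell:

```javascript
// Hypothetical helper: condenses a system.profile-style document into the
// fields that matter most when hunting for slow operations.
function summarizeProfileEntry(entry) {
  const locked = (entry.lockStats && entry.lockStats.timeLockedMicros) || {};
  return {
    op: entry.op,
    ns: entry.ns,
    millis: entry.millis,
    // lockStats values are in microseconds; convert to whole milliseconds.
    readLockMillis: Math.round((locked.r || 0) / 1000),
    writeLockMillis: Math.round((locked.w || 0) / 1000)
  };
}

// Sample shaped like the profile document shown earlier in this recipe.
const sampleEntry = {
  op: 'query',
  ns: 'test.profilingTest',
  millis: 188,
  lockStats: { timeLockedMicros: { r: 188796, w: 0 } }
};

console.log(summarizeProfileEntry(sampleEntry));
```

In the shell, the same function could be applied to each document returned by db.system.profile.find(), for example through the cursor's forEach method.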

Setting up users in MongoDB

Security is one of the cornerstones of any enterprise-level system. You will not always find a system in an environment that is safe and secure enough to allow unauthenticated user access to it. Apart from test environments, almost every production environment requires proper access rights and perhaps an audit of the system access too. Mongo security has multiple aspects:

Access rights for the end users accessing the system. There will be multiple roles, such as admin, read-only users, and read/write nonadministrative users.

Authentication of the nodes that are added to the replica set. In a replica set, one should only be allowed to add authenticated systems. The integrity of the system will be compromised if any unauthenticated node is added to the replica set.

Encryption of the data that is transmitted across the wire between the nodes of the replica sets, or even the client and the server (or the mongos process in the case of a sharded setup).

In this recipe and the next one, we will be looking at how to address the first two points mentioned in the preceding bullet list. The last point, about encrypting the data being transmitted on the wire, is not supported by default by the community edition of Mongo, and it will need a rebuild of the Mongo database with the ssl option enabled.

Note

All the steps are executed on the MongoDB server Version 2.4.6, and all the explanations hold true for this version. There are quite a few changes related to the content we discuss here that are present in Version 2.6 of MongoDB. Any 2.6-specific details will be mentioned as and when needed.

Getting ready

In this recipe, we will be setting up users for a standalone Mongo instance. We need to start a standalone server listening to any port for client connections; in this case, we will stick to the default 27017. If you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. We also need to start a shell that will be used for this admin operation. For a replica set, we will only be connected to a primary instance and will perform these operations.

How to do it…

We will add an admin user, a read-only user for the test database, and a read/write user for the test database in this recipe.

The following are assumed at this point:

The server is started, up and running, and we are connected to it from the shell.

The server is started without any special command-line argument other than those mentioned in Chapter 1, Installing and Starting the MongoDB Server, for starting a single node. Thus, we have full access to the server for any user.

Let’s get started:

1. The first step is to create an admin user. Note that, till Version 2.4 of MongoDB, the method name is addUser. However, in Version 2.6 of MongoDB, the method is createUser. We will look at both methods of creating the users. Execute steps 2 to 6 if you are working on a MongoDB server, Version 2.4, and steps 7 and 8 if you are working on a MongoDB server, Version 2.6.

2. Execute the following command in the Mongo shell to switch to the admin database:

> use admin

3. In the admin database, we will add a user called admin with the password admin:

> db.addUser('admin', 'admin')

{

"user":"admin",

"readOnly":false,

"pwd":"7c67ef13bbd4cae106d959320af3f704",

"_id":ObjectId("52ea98ef2d00f6e6fb1fcdba")

}

4. We will now switch to the test database as follows:

> use test

5. In the test database, we will create two users, a read-only user called read_user and a read/write user called write_user. The password for both these users is the same as their usernames.

6. Execute the following commands to create these users:

> db.addUser({user:'read_user', pwd:'read_user', roles:['read']})

{

"user":"read_user",

"pwd":"60477dd7460977860674077dc0039102",

"roles":[

"read"

],

"_id":ObjectId("52ee29012d00f6e6fb1fcdbc")

}

> db.addUser({user:'write_user', pwd:'write_user', roles:['readWrite']})

{

"user":"write_user",

"pwd":"7944cf3480b0eabbf0cff4498ed9652b",

"roles":[

"readWrite"

],

"_id":ObjectId("52ee292c2d00f6e6fb1fcdbd")

}

7. We will now look at how to create users in the admin and test databases in Version 2.6 of MongoDB. These steps are identical to those for Version 2.4 of MongoDB, except for the name of the method. There are some additional features for this method that we will see in detail in the next section. First, we start by creating the admin user in the admin database, as follows:

> use admin

> db.createUser({

user:'admin', pwd:'admin',

customData:{desc:'The admin user for admin db'},

roles:['readWrite', 'dbAdmin', 'clusterAdmin']

}

)

8. We will add read_user and write_user to the test database. To add the users, execute the following commands from the Mongo shell:

> use test

> db.createUser({

user:'read_user', pwd:'read_user',

customData:{desc:'The read only user for test database'},

roles:['read']

}

)

> db.createUser({

user:'write_user', pwd:'write_user',

customData:{desc:'The read/write user for test database'},

roles:['readWrite']

}

)

9. Now shut down the MongoDB server and then close the shell too. Restart the MongoDB server, but with the --auth option on the command line, as follows:

$ mongod .. <other options as provided earlier> --auth

10. Now connect to the server from a newly opened Mongo shell and execute the following command:

> db.testAuth.find()

The testAuth collection need not exist, but you should see an error stating that we are not authorized to query the collection.

11. We will now log in from the shell using read_user as follows:

> db.auth('read_user', 'read_user')

12. We will now execute the same find operation as follows (note that the find operation should not give an error and may not return any results, depending on whether the collection exists or not):

> db.testAuth.find()

13. Now we will try to insert a document as follows (note that we should get an error stating that we are not authorized to insert data in this collection):

> db.testAuth.insert({i:1})

14. We will now log out and log in again, but with the write user as follows. Note the difference in the way we log in this time around, as against the previous instance. We are providing a document as the parameter to the auth function, whereas, in the previous case, we passed two parameters for the username and password.

> db.logout()

> db.auth({user:'write_user', pwd:'write_user'})

15. Now execute the insert operation again as follows (this time, it should work):

> db.testAuth.insert({i:1})

16. Now execute the following command on the shell. You should get the unauthorized error:

> db.serverStatus()

17. We will now switch to the admin database. We are currently connected to the server using write_user, which has read/write permissions on the test database. From the Mongo shell, try to execute the following commands:

> use admin

> show collections

18. Close the Mongo shell or open a new shell, as follows, from the operating system’s console. This should take us directly to the admin database:

$ mongo -u admin -p admin admin

19. Now execute the following on the shell. It should show us the collections in the admin database:

> show collections

20. Try and execute the following operation:

> db.serverStatus()

21. Execute this step only if you are on Version 2.4 of MongoDB and created the admin user using db.addUser('<username>', '<password>'). Switch to the test database and execute the insert and find operations as follows:

> use test

> db.testAuth.insert({i:1})

> db.testAuth.find()

How it works…

We executed a lot of steps and now we will take a closer look at them.

Initially, the server is started without the --auth option; hence, no security is enforced by default.

In Version 2.4 of MongoDB, we create a user in the admin database using the addUser(<userName>, <password>) form of the method. This creates a user in the admin database; this special user has read/write access to all the databases and can run admin commands, such as db.serverStatus(), and other replication and sharding-related commands. All users created in databases other than admin, whether read or write, will only be able to access the collections in their respective databases.

In version 2.6, however, we create the admin user using the db.createUser method. Let us take a closer look at this method first. The signature of the method to create the user is createUser(user, writeConcern). The first parameter is the user, which actually is a JSON document, and the second parameter is the write concern to use for user creation. The JSON document for the user has the following format:

{

'user':<user name>,

'pwd':<password>,

'customData':{<JSON document providing any user specific data>},

'roles':[<roles of the user>]

}

The roles provided here can be provided as follows, assuming that the current database when the user is created is test on the shell:

[{'role':'read', 'db':'reports'}, 'readWrite']

This gives the user that is being created read access to the reports database and readWrite access to the test database. Let us see the complete user creation call for the test user:

> use test

> db.createUser({

user:'test', pwd:'test',

customData:{desc:'read access on reports and readWrite access on test'},

roles:[

{role:'read',db:'reports'},

'readWrite'

]

}

)

The write concern, which is an optional parameter, can be provided as a JSON document. Some sample values are {w:1} and {w:'majority'}.
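The roles array accepted by createUser mixes two notations: a plain string, which applies to the database in which the user is being created, and a document naming both the role and the database. The following plain JavaScript sketch expands such an array into explicit role/database pairs to make that rule visible. The function name normalizeRoles is hypothetical and is not part of the MongoDB shell:

```javascript
// Hypothetical helper: expands a createUser-style roles array so that every
// entry names its database explicitly. String entries are taken to apply to
// the current database, as described in the text.
function normalizeRoles(roles, currentDb) {
  return roles.map(function (entry) {
    if (typeof entry === 'string') {
      return { role: entry, db: currentDb };
    }
    return { role: entry.role, db: entry.db };
  });
}

// The roles from the test user above, created while the shell is on test.
const expanded = normalizeRoles(
  [{ role: 'read', db: 'reports' }, 'readWrite'],
  'test'
);

console.log(expanded);
```

Reading the expanded form makes it clear that this user can read the reports database and read and write the test database, and nothing else.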

Coming back to the admin user creation, we created the user in step 7 using the createUser method and gave three roles to this user in the admin database.

In steps 6 and 8, we created the read and read/write users in the test database using the addUser method for version 2.4 and the createUser method for version 2.6. The JSON document for the creation of a user in version 2.4 is identical to the user JSON document in version 2.6, except for a couple of differences. First, there is no customData field and second, the roles array contains only string values for the user roles.

The JSON document for the user in version 2.4 has the following format:

{

'user':<username>,

'pwd':<password>,

'roles':[<stringvaluesforrolesoftheuser>]

}

We shut down the MongoDB server after the admin, read, and read/write users are created, and restart it with the --auth option.

On starting the server again, we connect to it from the shell in step 10, but unauthenticated. Here, we try to execute a find query on a collection in the test database; this fails, as we are unauthenticated. This shows that the server now requires appropriate credentials to execute operations on it. In steps 11 to 13, we log in using read_user and try to execute a find operation first, which succeeds, and then an insert operation, which doesn’t, as the user has read privileges only. The way to authenticate a user is by invoking db.auth(<user name>, <password>) from the shell, and db.logout() will log out the currently logged-in user.

In steps 14 to 17, we demonstrate that we can perform insert operations using write_user, but admin operations such as db.serverStatus() cannot be executed, as these operations execute adminCommand on the server. This means that a nonadmin user is not permitted to invoke these operations. Similarly, when we change the database to admin, write_user, which is from the test database, is not permitted to perform any operations such as getting a list of collections or querying a collection in the admin database.

In steps 18 to 20, we log in to the shell using the admin user on the admin database. Previously, we logged in to the database using the auth method; in this case, we used the -u and -p options to provide the username and the password. We also provided the name of the database to connect to, which is admin in this case. Here, we are able to view the collections in the admin database and also execute admin operations such as getting the server status. In version 2.6, executing the db.serverStatus call is possible, as the user is given the clusterAdmin role.

In step 21, we are able to switch to any other database and execute read/write operations. This is a special privilege for the users of the admin database, which no other user has. This is possible because we created the user in version 2.4 using the version 2.2 style of user creation, db.addUser(<user name>, <password>). The admin user created in version 2.6 is not able to query the test database, as it would need appropriate read and read/write privileges on the respective databases to perform these operations.

One final thing to note: apart from writing to a collection, a user with write privileges can also create indexes on the collections to which they have write access.

There’s more…

In this recipe, we saw how we can create different users and what permissions they have, restricting some sets of operations. In the next recipe, we will see how we can have authentication done at the process level, that is, how one Mongo instance can authenticate itself for being added to a replica set.

See also

http://docs.mongodb.org/manual/reference/built-in-roles/ to get details of various in-built roles

http://docs.mongodb.org/manual/core/authorization/#user-defined-roles to learn more about defining custom user roles

Understanding interprocess security in MongoDB

In the previous recipe, we saw how authentication can be enforced so that a user has to be logged in before any operations are allowed on Mongo. In this recipe, we will look at interprocess security. By the term interprocess security, we don’t mean encrypting the communication but only ensuring that a node is authenticated before being added to a replica set.

Getting ready

In this recipe, we will be starting multiple Mongo instances as part of a replica set. Thus, you might have to refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, if you are not aware of how to start a replica set. Apart from that, in this recipe, all we will be looking at is how to generate a key file to be used, and the behavior when an unauthenticated node is added to the replica set.

How to do it…

To set the ground, we will be starting three instances, listening to ports 27000, 27001, and 27002 respectively. The first two will be started by providing them with a path to the key file, while the third will not receive this. Later, we will try adding these three instances to the same replica set. Let us take a look at the steps in detail:

1. Let us generate the key file first. There is nothing spectacular about generating the key file. This is as simple as having a file with 6 to 1024 characters from the base64 character set. On the Linux filesystem, you may choose to generate pseudorandom bytes using openssl, and encode them to base64. The following command will generate 500 random bytes, and these bytes will then be base64 encoded and written to keyfile:

$ openssl rand -base64 500 > keyfile

2. On a Unix filesystem, the key file should not have permissions for world and group, and thus, after it is created, we should execute the following command:

$ chmod 400 keyfile

3. Not giving write permission to the creator ensures that we don’t overwrite the contents accidentally. On a Windows platform, however, openssl doesn’t come out of the box and thus, you have to download it. The archive is extracted and the bin folder is added to the operating system’s path variable. For Windows, we can download openssl from http://gnuwin32.sourceforge.net/packages/openssl.htm.

4. You may even choose not to generate the key file using the approach mentioned earlier (that is, using openssl) and can take an easy way out by just typing plain text in the key file from any text editor of your choice. However, note that the characters \r, \n, and spaces are stripped off by Mongo and the remaining text is considered as the key. For example, we may create a file with the following content added to the key file. Again, the file will be named keyfile:

somecontentaddedtothekeyfilefromtheeditorwithoutspaces

Using any approach mentioned earlier, we would now have a key file in place that will be used for the next steps of the recipe.

5. We will now secure the Mongo processes by starting the Mongo instances as follows. I will be starting the Mongo instances on Windows; my key file, named keyfile, is placed in c:\MongoDB, and the data paths are c:\MongoDB\data\c1, c:\MongoDB\data\c2, and c:\MongoDB\data\c3 respectively, for the three instances.

6. Start the first instance listening to port 27000 as follows:

C:\> mongod --dbpath c:\MongoDB\data\c1 --port 27000 --auth --keyFile c:\MongoDB\keyfile --replSet secureSet --smallfiles --oplogSize 100

7. Similarly, start the second server listening to port 27001 as follows:

C:\> mongod --dbpath c:\MongoDB\data\c2 --port 27001 --auth --keyFile c:\MongoDB\keyfile --replSet secureSet --smallfiles --oplogSize 100

8. The third instance will be started, but without the --auth and --keyFile options, listening to port 27002 as follows:

C:\> mongod --dbpath c:\MongoDB\data\c3 --port 27002 --replSet secureSet --smallfiles --oplogSize 100

9. We then start a Mongo shell and connect it to port 27000, which is the first instance started. From the Mongo shell, we type the following command:

> rs.initiate()

10. In a few seconds, the replica set will be initiated with just one instance in it. We will now try to add two new instances to this replica set. First, the one listening on port 27001, as follows (you will need to add the appropriate hostname; Amol-PC is the hostname in my case):

> rs.add({_id:1, host:'Amol-PC:27001'})

11. We will then confirm the status of the newly added instance by executing rs.status(). It should soon come up as a secondary.

12. We will now finally try and add the instance that was started without the --auth and --keyFile options, as follows:

> rs.add({_id:2, host:'Amol-PC:27002'})

13. This should add the instance to the replica set, but executing rs.status() will show the status of the instance as UNKNOWN. The server logs for the instance running on 27002 should show some authentication errors as well.

14. We will finally have to restart this instance. However, this time we provide the --auth and --keyFile options as follows:

C:\> mongod --dbpath c:\MongoDB\data\c3 --port 27002 --replSet secureSet --smallfiles --oplogSize 100 --auth --keyFile c:\MongoDB\keyfile

15. Once the server is started, connect to it from the shell again and type in rs.status(). In a few moments, it should come up as a secondary instance.
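The key file rules from step 1 and step 4 (carriage returns, newlines, and spaces are stripped, and the remainder should be 6 to 1024 base64 characters) can be checked before a file is handed to mongod. The following plain JavaScript function is a minimal sketch of such a check. The name normalizeKey is hypothetical, and accepting the = padding character as part of the base64 set is an assumption:

```javascript
// Hypothetical pre-flight check for key file content, modelled on the rules
// described in the recipe. It does not read files; pass it the file content.
function normalizeKey(rawContent) {
  // Strip \r, \n and spaces; what remains is treated as the key.
  const key = rawContent.replace(/[\r\n ]/g, '');
  // Assumption: the base64 set is A-Z, a-z, 0-9, +, / and the = padding.
  const valid = /^[A-Za-z0-9+\/=]{6,1024}$/.test(key);
  return { key: key, valid: valid };
}

// Text typed in an editor, with a space and a Windows line ending.
console.log(normalizeKey('some content\r\naddedfromtheeditor'));

// Too short once whitespace is removed.
console.log(normalizeKey('ab c'));
```

A key that passes this check is not necessarily a strong one; random bytes generated with openssl, as in step 1, remain the better choice.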

There’s more…

In this recipe, we explored interprocess security to prevent unauthenticated nodes from being added to the Mongo replica set. We still haven’t encrypted the data that is being sent over the wire to ensure it’s delivered securely. In Appendix, Concepts for Reference, we will see how to build the MongoDB server from the source and how to enable encryption of the contents over the wire.

Modifying collection behavior using the collMod command

This is a command that is executed to change the behavior of a collection in Mongo. It can be thought of as a collection-modifying operation (it is not mentioned anywhere officially though).

For a part of this recipe, knowing about TTL indexes is required.

Getting ready

In this recipe, we will be executing the collMod operation on a collection. We need to start a standalone server listening to any port for client connections; in this case, we will stick to the default 27017. If you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. We also need to start a shell that will be used for this administration. It is highly recommended that you take a look at the Expiring documents after a fixed interval using the TTL index and Expiring documents at a given time using the TTL index recipes in Chapter 2, Command-line Operations and Indexes, if you are not aware of them.

How to do it…

The collMod operation can be used to do a few things:

1. Let us change the space allocation on the disk for new documents being added. A collection needs to exist to execute the collMod command. You can try to execute this command against any existing collection. In our case, I am assuming we have a collection in place called powerOfTwoCol, which was created using the following command from the Mongo shell:

> db.createCollection('powerOfTwoCol')

2. Once the collection is in place/created, execute the following command:

> db.runCommand({collMod:'powerOfTwoCol', usePowerOf2Sizes:1})

3. Let us now change the settings of the TTL index. Assuming we have a collection with a TTL index, as we saw in Chapter 2, Command-line Operations and Indexes, we can view the existing index by executing the following command:

> db.ttlTest.getIndexes()

4. To change the expiry time to 800 seconds from 300 seconds, execute the following command:

> db.runCommand({collMod:'ttlTest', index:{keyPattern:{createDate:1}, expireAfterSeconds:800}})

How it works…

The collMod command always has the {collMod:<name of the collection>, <collMod operation>} format. There are two possible operations currently supported that we will see. We will break our explanation into two parts.

First, we will see what happens by setting usePowerOf2Sizes. If a collection is heavy on updates and the documents grow in size, a document will be moved on the disk when it can no longer grow where it is placed. This causes a hole to be left in the disk space for the collection at the place where the document originally was. Mongo uses these holes to accommodate new documents wherever possible. However, by using the usePowerOf2Sizes setting, Mongo allocates disk space in sizes that are powers of two (32, 64, 128, 256, …), with the minimum value being 32. This setting does use some extra space compared to a document stored without it, as the disk space used is always rounded up to a power of two. However, in the long term, when the documents get updated frequently and grow in size, the disk usage is better, and so is the performance, as document movement is reduced. Thus, if you foresee this pattern of documents growing in size with time, setting this option might be a good idea. However, for patterns where documents are just inserted and rarely updated, we are better off with the default settings (till version 2.4). Also, if the collection already has data when this option is set, the subsequent allocations for new documents will be by powers of two, without affecting the existing documents.

From Version 2.6 of MongoDB, the usePowerOf2Sizes strategy is the default option for all collections and thus, usePowerOf2Sizes:false is the only sensible option to use in the collMod operation. When starting the server, a new server startup parameter, newCollectionUsePowerOf2Sizes, is available and defaults to the value true. This option can be used to disable the usePowerOf2Sizes setting by providing the value false to it. Setting this value to false will ensure that the size allocated to a new document uses the strategy that is followed by default till version 2.4, which provides the space needed by the document times the padding factor.
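The difference between the two allocation strategies is easiest to see with numbers. The sketch below, in plain JavaScript, models both as described above: rounding up to a power of two with a minimum of 32, and multiplying the document size by a padding factor. Both function names are hypothetical, the padding factor of 1.5 is an arbitrary illustrative value, and the server's actual allocator has details this model ignores:

```javascript
// Simplified model of the usePowerOf2Sizes strategy: round the document size
// up to the next power of two, never going below 32 bytes.
function powerOf2Allocation(docBytes) {
  let size = 32;
  while (size < docBytes) {
    size *= 2;
  }
  return size;
}

// Simplified model of the earlier default: document size times the
// collection's padding factor.
function paddedAllocation(docBytes, paddingFactor) {
  return Math.ceil(docBytes * paddingFactor);
}

// A 100-byte document under both strategies.
console.log(powerOf2Allocation(100));
console.log(paddedAllocation(100, 1.5));
```

Under the power-of-two model, a document can grow up to the allocated size without being moved, and a hole left behind by a moved document always has a size that another allocation can reuse exactly.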

The second operation available through collMod is to change the TTL index. If a TTL index has already been created and the time to live needs to be changed after creation, we use the collMod command. This operation-specific field is as follows:

{index:{keyPattern:<the field on which the index was originally created>, expireAfterSeconds:<new time to be used for TTL of the index>}}

The keyPattern is the field on which the TTL index is created, and expireAfterSeconds will contain the new time to be changed to. On successful execution, we should see the following output in the shell:

{"expireAfterSeconds_old":300, "expireAfterSeconds_new":800, "ok":1}
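Since the command document for a TTL change always has the same shape, it can be assembled by a small function. The following plain JavaScript sketch builds the document that would be passed to db.runCommand. The function name buildTtlCollMod is hypothetical, and the sketch assumes a TTL index on a single ascending field, as in the ttlTest example:

```javascript
// Hypothetical helper: builds the collMod command document for changing the
// expiry of an existing TTL index on one ascending field.
function buildTtlCollMod(collectionName, indexedField, expireAfterSeconds) {
  const keyPattern = {};
  keyPattern[indexedField] = 1; // must match the key of the existing index
  return {
    collMod: collectionName,
    index: {
      keyPattern: keyPattern,
      expireAfterSeconds: expireAfterSeconds
    }
  };
}

// Equivalent to the command document used in this recipe.
console.log(JSON.stringify(buildTtlCollMod('ttlTest', 'createDate', 800)));
```

In the Mongo shell, the returned document could be passed straight to db.runCommand.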

Setting up MongoDB as a Windows Service

Windows Services are long-running applications that run in the background, just like daemon threads. Databases are good candidates for such services, whereby they start and stop when the host machines start and stop (you may, however, choose to manually start/stop a service). Many database vendors do provide a feature to start the database as a service, when installed on the server. MongoDB also lets you do that, and that is what we will see in this recipe.

Getting ready

Refer to the Single node installation of MongoDB with options from the config file recipe in Chapter 1, Installing and Starting the MongoDB Server, to get information on how to start the MongoDB server using an external configuration file. As Mongo is run as a service in this case, it cannot be provided with command-line arguments, and configuring it from a configuration file is the only alternative. Refer to the prerequisites of the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. This is all we will need for this recipe.

How to do it…

Let’s take a look at the steps in detail:

1. We will first create a config file with three configuration values, namely, the port, dbpath, and logfile path. We name the file mongo.conf and keep it in c:\conf\mongo.conf with the following three entries in it (you may choose any path for the config file location, database, and logs):

port=27000

dbpath=c:\data\mongo\db

logpath=c:\logs\mongo.log

2. Execute the following steps from the Windows terminal, which you may need to execute as an administrator. In Windows 7, execute the following steps:

1. Press the Windows key on your keyboard.

2. In the Search programs and files space, type cmd.

3. In the programs, the command prompt program will be seen. Right-click on it and select Run as administrator.

3. In the shell, execute the following command:

C:\> mongod --config c:\conf\mongo.conf --install

The log printed out on the console should confirm that the service is installed properly.

4. The service can be started from the console as follows:

C:\> net start MongoDB

5. The service can be stopped as follows:

C:\> net stop MongoDB

6. Type services.msc in the Run window (Windows button + R). In the opened management console, search for the MongoDB service. We should see it listed there.

7. The service is automatic, that is, it will be started when the operating system starts. It can be changed to manual by right-clicking on the service and clicking on Properties.

8. To remove a service, we need to execute the following command from the command prompt:

C:\> mongod --remove

9. There are more options available that can be used to configure the name of the service, display name, description, and the user account that is used to run the service. These can be provided as command-line arguments. Execute the following command to see the possible options, and take a look at the Windows Service Control Manager options:

C:\> mongod --help

Configuring a replica set

We have had a good discussion on what a replica set is and how to start a simple replica set, in the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server. In the Understanding interprocess security in MongoDB recipe, we saw how to start a replica set with interprocess authentication. To be honest, that is pretty much what we do while setting up a standard replica set. However, there are a few configurations that one must know; one must also be aware of how they affect the replica set’s behavior. Note that we are still not discussing tag-aware replication in this recipe; it will be taken up later in this chapter as a separate recipe, Building tagged replica sets.

Getting ready

Refer to the recipe Starting multiple instances as part of a replica set in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about the replica set basics. Go ahead and set up a simple three-node replica set on your computer, as mentioned in the recipe.

Before we go ahead with the configurations, we will see what elections are in a replica set and how they work from a high level. It is good to know about elections because some of the configuration options affect the voting process in the elections.

Elections in a replica set

A Mongo replica set has one primary instance and multiple secondary instances. All writes happen only through the primary instance and are replicated to the secondary instances. Read operations can happen from secondary instances, depending on the read preference. Refer to the Read preference for querying section in Appendix, Concepts for Reference, to know what read preference is. However, if the primary goes down or is not reachable for some reason, the replica set becomes unavailable for writes. A Mongo replica set has a feature to automatically fail over to a secondary, by promoting it to a primary and making the set available to clients for both read and write operations. The replica set remains unavailable for that brief moment till a new primary comes up.

All this sounds good, but the question is, who decides what the new primary instance will be? The process of choosing a new primary happens through an election. Whenever any secondary detects that it cannot reach out to a primary, it asks all the other nodes in the replica set to elect it as the new primary.

All other nodes in the replica set that receive this request for the election of the primary will perform certain checks before they vote a yes to the secondary requesting an election. Let’s take a look at the steps:

1. They will first check whether the existing primary is reachable. This is necessary because the secondary requesting the re-election may be unable to reach the primary only because of a network partition, in which case it should not be allowed to become a primary. In such a case, the instance receiving the request will vote a no.

2. Secondly, the instance will compare its own replication state with that of the secondary requesting the election. If it finds that the requesting secondary is behind itself in the replicated data, it will vote a no.

3. Finally, the instance will check whether, even though the primary is not reachable, some instance with a higher priority than the secondary requesting the re-election is reachable from it. This again is possible if the secondary requesting the re-election can’t reach out to the secondary with higher priority, possibly due to a network partition. In this scenario, the instance receiving the request for election will vote a no.

The preceding checks are pretty much what will be happening (not necessarily in the order mentioned here) during the re-election. If these checks pass, the instance votes a yes.

The election is void if even a single instance votes no. However, if none of the instances have voted no, then the secondary that requests the election will become a new primary if it receives a yes from the majority of instances. If the election becomes void, there will be a re-election with the same secondary or any other instance requesting an election with the preceding mentioned process, till a new primary is elected.
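The voting rules described above can be sketched in a few lines of Python. This is only an illustrative model and not the server's actual implementation; the names VoterView, cast_vote, and election_succeeds are hypothetical, and we assume that the list of votes includes the candidate's own yes vote.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoterView:
    """What a voting member observes when asked to vote (hypothetical model)."""
    primary_reachable: bool          # check 1: can this voter still reach the primary?
    candidate_is_behind: bool        # check 2: is the candidate behind this voter's data?
    higher_priority_reachable: bool  # check 3: does this voter see a higher-priority member?


def cast_vote(view: VoterView) -> bool:
    """Return True for a yes vote; any failed check yields a no."""
    return not (view.primary_reachable
                or view.candidate_is_behind
                or view.higher_priority_reachable)


def election_succeeds(votes: List[bool], total_members: int) -> bool:
    """A single no voids the election; otherwise a majority of yes votes is needed.

    Assumption: `votes` includes the candidate's own yes vote.
    """
    if not all(votes):
        return False
    return len(votes) * 2 > total_members


# Three-member set with the primary down: the candidate votes yes for itself,
# and the one remaining member passes all three checks.
other_member = VoterView(False, False, False)
print(election_succeeds([True, cast_vote(other_member)], total_members=3))  # True
```

If the remaining member could still reach the old primary, cast_vote would return False and the election would be void, which matches the network partition case described in the first check.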

Now that we have an idea about elections in a replica set and the terminologies, let us look at some replica set configurations. A few of these options are related to votes, and we start by looking at these options first.

Basic configuration for a replica set

From Chapter 1, Installing and Starting the MongoDB Server, when we set up a replica set, we have a configuration similar to the following one. The basic replica set configuration for a three-member set is as follows:

{

"_id":"replSet",

"members":[

{

"_id":0,

"host":"Amol-PC:27000"

},

{

"_id":1,

"host":"Amol-PC:27001"

},

{

"_id":2,

"host":"Amol-PC:27002"

}

]

}

We will not be repeating the entire configuration in the steps in the following sections. All the flags we mention will be added to the document of a particular member in the members array. For example, in the preceding example, if a node with _id as 2 is to be made an arbiter, we will have the following configuration for it in the configuration document shown earlier:

{

"_id":2,

"host":"Amol-PC:27002",

"arbiterOnly":true

}

Generally, the steps to reconfigure a replica set that has already been set up are as follows:

1. Assign the configuration document to a variable. If the replica set is already configured, it can be obtained using the rs.conf() call from the shell as follows:

> var conf = rs.conf()

2. The members field in the document is an array of documents for each individual member of a replica set. To add a new property to a particular member, we need to execute the following command. For instance, if we want to add the votes key and set its value to 2 for the third member of the replica set (index 2 in the array), we execute the following command:

> conf.members[2].votes = 2

3. Just changing the JSON document won’t change the replica set. We need to reconfigure it as follows if the replica set is already in place:

> rs.reconfig(conf)

4. If the configuration is done for the first time, we will call the following command:

> rs.initiate(conf)

For all the steps given in the next section, you need to follow the preceding steps to reconfigure or initiate the replica set, unless some other steps are mentioned explicitly.

How to do it…

In this recipe, we will look at some of the possible configurations that can be used in a replica set. The explanation here will be minimal, with all the explanations done as usual in the next section.

1. The first configuration is an arbiter option that is used to configure a replica set member as a member that holds no data but only has rights to vote. The following key needs to be added to the configuration of the member who will be made an arbiter:

{_id:...,'arbiterOnly':true}

2. One thing to remember regarding this configuration is that once a replica set is initiated, no existing member can be changed to an arbiter from a nonarbiter node and vice versa. However, we can add an arbiter to an existing replica set using the helper function rs.addArb(<hostname>:<port>). For example, to add an arbiter listening to port 27004 to an existing replica set, the following command was executed on my machine:

> rs.addArb('Amol-PC:27004')

When the server starts to listen to port 27004, and rs.status() is executed from the Mongo shell, we see that state and stateStr for this member are 7 and ARBITER respectively.

3. The next option, votes, affects the number of votes a member gets in the election. By default, all members get one vote each. This option can be used to change the number of votes a particular member gets. It can be set as follows:

{_id:...,'votes':<number of votes>}

The votes of existing members of a replica set can be changed and the replica set can be reconfigured using rs.reconfig().

Though the option votes is available, which can potentially change the number of votes to form a majority, it usually doesn’t add much value and is not a recommended option to use in production.

4. The next replica set configuration option is called priority. It determines the eligibility of a replica set member to become a primary (or not to become a primary). The option is set as follows:

{_id:...,'priority':<priority number>}

5. A higher number indicates more likelihood of becoming a primary. The primary will always be the one with the highest priority among the members alive in a replica set. Setting this option in an already configured replica set will trigger an election.

6. Setting the priority option to 0 will ensure that a member will never become a primary.

7. The next option we look at is hidden. Setting the value of this option to true ensures that the replica set member is hidden. The option is set as follows:

{_id:...,'hidden':<true/false>}

One thing to keep in mind is that, when a replica set member is hidden, its priority too should be made 0 to ensure it doesn’t become primary. Though this seems redundant, as of the current version, the value of priority needs to be set explicitly.

8. When a programming language client connects to a replica set, it will not be able to discover hidden members. However, after executing rs.status() from the shell, the member’s status would be visible.

9. The next option we will look at is the slaveDelay option. This option is used to set the lag in time for the slave from the primary of the replica set. The option is set as follows:

{_id:...,'slaveDelay':<number of seconds to lag>}

10. Like the hidden member, slave delayed members too should have the priority option set to 0 to ensure they don’t ever become primary. This needs to be set explicitly.

11. The final configuration option we will be looking at is buildIndexes. If this value is not specified, it defaults to true, which indicates that if an index is created on the primary, it needs to be replicated on the secondary too. The option is set as follows:

{_id:...,'buildIndexes':<true/false>}

12. If the value of buildIndexes is set to false, the priority should be set to 0 to ensure that the member doesn’t ever become primary. This needs to be set explicitly. Also, this option cannot be set after the replica set is initiated. Just like an arbiter node, this needs to be set when the replica set is being created or when a new member node is being added to the replica set.
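Since several of the preceding options require the priority to be set to 0 explicitly, it can help to check a member document before passing the configuration to rs.reconfig() or rs.initiate(). The following is a minimal sketch in Python that works on a plain dictionary; the function name check_member and the wording of the messages are our own and are not part of MongoDB or any driver.

```python
def check_member(member: dict) -> list:
    """Return a list of problems found in one member document.

    Encodes only the rules stated in this recipe: hidden, slave delayed,
    and buildIndexes=false members must have priority 0 set explicitly.
    """
    problems = []
    priority = member.get("priority", 1)  # the default priority is 1

    if member.get("hidden", False) and priority != 0:
        problems.append("hidden member must have priority 0")
    if member.get("slaveDelay", 0) > 0 and priority != 0:
        problems.append("slave delayed member must have priority 0")
    if member.get("buildIndexes", True) is False and priority != 0:
        problems.append("member with buildIndexes false must have priority 0")
    return problems


# A hidden member without an explicit priority is reported.
print(check_member({"_id": 2, "host": "Amol-PC:27002", "hidden": True}))
# The same member with priority 0 passes the check.
print(check_member({"_id": 2, "host": "Amol-PC:27002", "hidden": True, "priority": 0}))
```

The check is deliberately limited to the three rules mentioned in the steps; the server performs its own, more complete validation when the configuration is applied.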

How it works…

In this section, we will explain and understand the significance of different types of members and the configuration options we saw in the previous section.

A replica set member as an arbiter

The English meaning of the word "arbiter" is a judge who resolves a dispute. In the case of replica sets, the arbiter node is present just to vote in the case of elections and not to replicate any data. This is, in fact, a pretty common scenario due to the fact that a Mongo replica set needs to have at least three instances (and preferably an odd number of instances, three or more). A lot of applications do not need to maintain three copies of data and are happy with just two instances, one primary and a secondary with the data.

Consider the scenario where only two instances are present in the replica set. When the primary goes down, the secondary instance cannot form a proper majority because it only has 50 percent of the votes (its own vote) and thus, it cannot become a primary. Conversely, if the secondary instance goes down, the primary can no longer see a majority of the set, so it steps down and becomes a secondary, thus making the replica set unavailable for writes. Thus, a two-node replica set is useless, as it doesn’t stay available for writes when either of the instances goes down. It defeats the purpose of setting up a replica set and thus, a minimum of three instances are needed in a replica set.

Arbiters come in handy in such scenarios. We set up a replica set with three instances, with only two having data and one acting as an arbiter. We need not maintain three copies of data at the same time, and we eliminate the problem we faced when setting up a two-instance replica set.

Priority of replica set members

This is an option whose use is enforced by other options as well, though it can be used on its own in some cases. The options that enforce its usage are hidden, slaveDelay, and buildIndexes, where we don’t want the member with one of these three options to ever be made primary. We will look at these options soon.

Some more possible use cases, where we never want a replica set member to become a primary, are as follows:

When the hardware configuration of a member would not be able to deal with the write and read requests should it become a primary, and the only reason it is being put in there is for replicating the data.

When we have a multi-data center setup, where one replica set instance is present in another data center for the sake of geographically distributing the data for disaster recovery purposes. Ideally, the network latency between the application server hosting the application and the database should be minimal for optimum performance. This can be achieved if both the servers (the application server and database server) are in the same data center. Not changing the priority of the replica set instance in another data center makes it equally eligible for being chosen as a primary, thus compromising the application’s performance if the server from another data center gets chosen as the primary. In such scenarios, we can set the priority to be 0 for the server in the second data center, and a manual cutover will be needed by the administrator to fail over to another data center, should an emergency arise.

In both these scenarios, we can also have the respective members hidden so that the application client doesn’t have a view of these members in the first place.

Just as we set the priority to 0 to not allow a member to be the primary, we can also be biased towards one member being the primary, whenever it is available, by setting its priority to a value greater than 1, because the default value of the priority field is 1.

Suppose we have a scenario where, for budget reasons, we have one of the members storing data on SSDs and the remaining members storing data on spinning disks. We will ideally want the member with SSDs to be the primary, whenever it is up and running. It is only when it is not available that we will want another member to become a primary. In such scenarios, we can set the priority of the member running on SSD to a value greater than 1. The value doesn’t really matter as long as it is greater than the rest; that is, setting it to 1.5 or 2 makes no difference as long as the priority of the other members is less.
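The effect of priority on which member we expect to be primary can be illustrated with a small Python sketch. It is a simplified model under our own assumptions: a majority of the set is assumed to be reachable, ties are resolved by simply taking the first candidate (the real server decides through an election), and the names MEMBERS and preferred_primary are hypothetical.

```python
# Member 0 runs on SSDs and is preferred; member 2 must never become primary.
MEMBERS = [
    {"_id": 0, "host": "Amol-PC:27000", "priority": 2},
    {"_id": 1, "host": "Amol-PC:27001"},              # default priority of 1
    {"_id": 2, "host": "Amol-PC:27002", "priority": 0},
]


def preferred_primary(members, alive_ids):
    """Return the _id of the alive member with the highest nonzero priority.

    Returns None when no alive member is eligible to become primary.
    """
    candidates = [m for m in members
                  if m["_id"] in alive_ids and m.get("priority", 1) > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.get("priority", 1))["_id"]


print(preferred_primary(MEMBERS, {0, 1, 2}))  # 0, the SSD-backed member
print(preferred_primary(MEMBERS, {1, 2}))     # 1, as member 0 is down
```

Replacing the priority of 2 with 1.5 gives the same result, which is the point made above: only the relative order of the priorities matters.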

Hidden, votes, slave delayed, and build index configurations

The term hidden for a replica set node is for an application client that is connected to the replica set and not for an administrator. For an administrator, it is equally important for the hidden members to be monitored and thus, their state is seen in the rs.status() response. Hidden members participate in elections too, just like all other members.

Though votes is an option that is not a recommended solution to a problem, there is an interesting behavior that needs to be mentioned. Suppose you have a three-member replica set. With each instance of the replica set having one vote by default, we have a total of three votes in the replica set. For a replica set to allow writes, a majority of voting members should be up. However, the calculation of a majority doesn’t happen using the number of members up but by the total number of votes. Let us see how.

By default, with one vote each, if one of the members is down, we have two out of a total of three votes available, and thus, the replica set continues to operate. However, if we have one member with the number of votes set to 2, we now have a total of four votes (1 + 1 + 2) in the replica set. If this member goes down, even though it is a secondary, the primary will automatically step down, and the replica set will be left with no primary, thus not allowing writes. This happens because two out of four possible votes are now gone and we no longer have a majority of the votes available. If this member with two votes is the primary and it goes down, then again no majority can be formed as there are just a maximum of two votes out of four available, and a new primary won’t be elected. Thus in general, as a rule of thumb, if you are tempted to use this votes configuration option for your use case, think again, as you may very well use other options such as priority and arbiterOnly to address these use cases.
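The arithmetic above is easy to verify with a short Python sketch. The function name has_majority and the member labels A, B, and C are our own; the only rule encoded is that the votes of the members that are up must be strictly more than half of all the votes.

```python
def has_majority(votes_by_member: dict, up_members: set) -> bool:
    """True when the members that are up hold more than half of all votes."""
    total = sum(votes_by_member.values())
    available = sum(votes for member, votes in votes_by_member.items()
                    if member in up_members)
    return available * 2 > total


default_votes = {"A": 1, "B": 1, "C": 1}
weighted_votes = {"A": 1, "B": 1, "C": 2}

# One member down with default votes: 2 of 3 votes remain.
print(has_majority(default_votes, {"A", "B"}))   # True
# The member with two votes is down: only 2 of 4 votes remain.
print(has_majority(weighted_votes, {"A", "B"}))  # False
```

With the weighted configuration, losing the single member C takes away half of the votes, so the two remaining members cannot keep or elect a primary even though two out of three members are still up.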

From Version 2.6 of MongoDB, giving a member more than one vote is deprecated, and the following message gets printed in the logs:

[rsMgr] WARNING: Having more than 1 vote on a single replicaset member is
[rsMgr] deprecated, as it causes issues with majority write concern. For
[rsMgr] more information, see http://dochub.mongodb.org/core/replica-set-votes-deprecated

Thus, it is recommended not to use this option and prefer an alternative configuration option; in some future version of MongoDB, it might not even be supported.

For the slaveDelay option, the most common use case is to ensure that the data in a member at a particular point of time lags behind the primary by the provided number of seconds. The data can then be restored from this member if some unforeseen error happens, say, a human erroneously updating some data. Remember, the longer the time delay, the longer the time we get to recover, but at the cost of possibly stale data.

Finally, we’ll see the buildIndexes option. This is useful in cases where we have a replica set member with nonproduction standard hardware and the cost of maintaining the indexes is not worth it. You may choose to set this option for members on which no queries are executed. Obviously, if you set this option, they can never become primary members and thus, the priority option must be set to 0.

There’s more…

You can achieve some interesting things using tags in replica sets. This will be discussed in a later recipe after we learn about tags in the Building tagged replica sets recipe.

Stepping down as a primary instance from the replica set

There are times when, for maintenance activity during business hours, we need to take a server out from the replica set, perform the maintenance, and put it back in the replica set. If the server to be worked upon is the primary, we somehow need to step down from the primary member position, conduct a re-election, and ensure that it doesn’t get re-elected for a minimum given time frame. After the server becomes a secondary once the step down operation is successful, we can take it out of the replica set, perform the maintenance activity, and put it back in the replica set.

Getting ready

Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about replica set basics. Set up a simple three-node replica set on your computer as mentioned in the recipe.

How to do it…

Assuming that at this point of time we have a replica set up and running, perform the following steps:

1. Execute the following command from the shell connected to one of the replica set members and see which instance is currently the primary:

> rs.status()

2. Connect to that primary instance from the Mongo shell and execute the following command on the shell:

> rs.stepDown()

3. The shell should reconnect, and you should see that the instance connected to, which was initially a primary instance, has now become a secondary. Execute the following command from the shell to confirm that a new primary has been elected:

> rs.status()

4. You may now connect to the primary, modify the replica set configuration, and go ahead with the administration on the servers.

How it works…

The steps we saw in the previous section are pretty simple, but there are a couple of interesting things that we will see.

The rs.stepDown() method did not have any parameter. The function can in fact take a numeric value, the number of seconds for which the instance stepped down won’t participate in the elections and won’t become a primary; the default value for this is 60 seconds.

Another interesting thing to try out is, what if the instance that was asked to step down has a higher priority than other instances? Well, it turns out that the priority doesn’t matter when you step down. The instance stepped down will not become primary, no matter what, for the provided number of seconds. However, if the priority is set for the instance stepped down, and it is higher than others, then after the time given to step down elapses, an election will happen, and the instance with the higher priority will become primary again.
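The step-down window described above can be modeled with a tiny Python sketch. This is purely illustrative; the function name can_become_primary is hypothetical, times are plain seconds, and treating the exact end of the window as eligible is our own simplification.

```python
def can_become_primary(now: float, stepped_down_at: float,
                       step_down_secs: float = 60) -> bool:
    """True once the step-down window has elapsed.

    Priority plays no part here: during the window the member is
    ineligible regardless of how high its priority is.
    """
    return now >= stepped_down_at + step_down_secs


# Stepped down at t=0 with the default window of 60 seconds.
print(can_become_primary(now=30, stepped_down_at=0))   # False
print(can_become_primary(now=60, stepped_down_at=0))   # True
# With a 120-second window, the member is still ineligible at t=100.
print(can_become_primary(now=100, stepped_down_at=0, step_down_secs=120))  # False
```

Once the function returns True for a member with the highest priority, we would expect an election to bring it back as the primary, as explained in the preceding paragraph.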

Exploring the local database of a replica set

In this recipe, we will explore the local database from a replica set’s perspective. The local database may contain collections that are not specific to replica sets, but we will focus only on the replica-set-specific collections and try to take a look at what’s in them and what they mean.

Getting ready

Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about replica set basics. Go ahead and set up a simple three-node replica set on your computer, as mentioned in the recipe.

How to do it…

1. With the replica set up and running, we need to open a shell connected to the primary. You may randomly connect to any one member, execute rs.status(), and then determine the primary.

2. With the shell opened, first switch to the local database and then view the collections in the local database as follows:

> use local

switched to db local

> show collections

3. You should find a collection called me. Querying this collection should show us a document that contains the hostname of the server to which we are currently connected:

> db.me.findOne()

There will be two fields, the hostname and _id. Take note of the _id field; it is important.

4. We will now query the slaves collection as follows:

> db.slaves.find().pretty()

5. Take a note of the fields present in these documents.

6. The next collection to look at is replset.minvalid. You will have to connect to a secondary member from the shell to execute the following query. Switch to the local database first as follows:

> use local

switched to db local

> db.replset.minvalid.find()

This collection just contains a single document with a key ts and a value, which is the timestamp up to which the secondary we are connected to is synchronized. Note down this time.

7. From the shell in the primary, insert a document in any collection. We will use the database test. Execute the following commands from the shell of the primary member:

> use test

switched to db test

> db.replTest.insert({i:1})

8. Query the secondary again as follows:

> db.replset.minvalid.find()

We see that the time against the ts field has now incremented corresponding to the time at which this replication happened from primary to secondary. With a slave delayed node, you will see this time getting updated only after the delay period has elapsed.

9. Finally, we will see the system.replset collection. This collection is where the replica set configuration is stored. Execute the following command:

> db.system.replset.find().pretty()

Actually, when we execute rs.conf(), the following query gets executed:

> db.getSisterDB("local").system.replset.findOne()

How it works…

The local database is a special database that is used to hold the replication and instance-specific details in it. This is a nonreplicated database. Try creating a collection of your own in the local database and insert some data in it; it will not be replicated to the secondary nodes.

This database gives us a view of the data stored by Mongo for internal use. However, as an administrator, it is good to know about these collections and the type of data in them.

Most of the collections are pretty straightforward. We will take a closer look at the slaves collection. Let’s take a look at the following example:

{

"_id":ObjectId("52f138169da4944dff694e26"),

"config":{

"_id":1,

"host":"Amol-PC:27001"

},

"ns":"local.oplog.rs",

"syncedTo":Timestamp(1391928970,1)

}

This collection contains a document for each of the secondary members that have synced from this instance. The _id field here is not a randomly chosen ID, but has the same value as the _id field of the document in the me collection of the respective secondary member node. From the shell of the secondary, execute the db.me.findOne() query in the local database and we should see that the _id field there matches the _id field of the document present in the slaves collection.

The config document we see gives the hostname of the secondary instance that we are referring to. Note that the port and other configuration options of the replica set member are not present in this document. Finally, the syncedTo time tells us up to what time the secondary instance is synced with the primary. We saw the replset.minvalid collection on the secondary, which tells the time to which it is synced with the primary. This value in syncedTo in the primary will be the same as in replset.minvalid in the respective secondary.

See also

The Understanding and analyzing oplogs recipe

Understanding and analyzing oplogs

Oplog is a special collection and forms the backbone of the MongoDB replication. When any write operations or configuration changes are done on the replica set’s primary, they are written to the oplog on the primary. All the secondary members then tail this collection to get the changes to be replicated. Tailing is synonymous with the tail command in Unix and can only be done on a special type of collection called capped collections. Capped collections are fixed-size collections that maintain the insertion order just like a queue. When the collection’s allocated space becomes full, the oldest data is overwritten. If you are not aware of capped collections and what tailable cursors are, refer to the Creating and tailing capped collection cursors in MongoDB recipe in Chapter 5, Advanced Operations, for more details.
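The overwrite behavior of a capped collection can be pictured with a small Python model built on the standard library’s deque. This is only an analogy under our own assumptions: the class name CappedCollection is hypothetical, and the cap here is a number of documents, whereas a real capped collection is bounded by its allocated size in bytes.

```python
from collections import deque


class CappedCollection:
    """Toy model of a capped collection: fixed capacity, insertion order kept."""

    def __init__(self, max_docs: int):
        # A deque with maxlen silently discards the oldest item when full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc: dict) -> None:
        self._docs.append(doc)

    def find(self) -> list:
        """Return the documents in insertion order, oldest first."""
        return list(self._docs)


capped = CappedCollection(max_docs=3)
for i in range(5):
    capped.insert({"i": i})

# Only the three most recent documents survive, in insertion order.
print(capped.find())  # [{'i': 2}, {'i': 3}, {'i': 4}]
```

A secondary that falls so far behind that the entries it still needs have been overwritten can no longer catch up from the oplog alone, which is why the size of the oplog matters.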

Oplog is a capped collection present in the nonreplicated database called local. In the previous recipe, we saw what a local database is and what collections are present in it. Oplog is something we didn’t discuss in the previous recipe, as it demands a lot more explanation and a dedicated recipe is needed to do it justice.

Getting ready

Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about the replica set basics. Go ahead and set up a simple three-node replica set on your computer as mentioned in the recipe. Start the Mongo shell and connect to the primary member of the replica set.

How to do it…

1. Execute the following commands after connecting to a primary from the shell to get the timestamp of the last operation present in oplog. We are interested in looking at the operations after this time.

> use test

> local = db.getSisterDB('local')

> var cutoff = local.oplog.rs.find().sort({ts:-1}).limit(1).next().ts

2. Execute the following command from the shell. Keep the output in the shell or copy it somewhere. We will analyze it later.

> local.system.namespaces.findOne({name:'local.oplog.rs'})

3. Insert 10 documents as follows:

> for(i = 0; i < 10; i++) db.oplogTest.insert({'i':i})

4. Execute the following update operation to set a string value for all documents with the value of i greater than 5, which are 6, 7, 8, and 9 in our case. It is a multiupdate operation:

> db.oplogTest.update({i:{$gt:5}}, {$set:{val:'str'}}, false, true)

5. Now create the index as follows:

> db.oplogTest.ensureIndex({i:1}, {background:1})

6. Execute the following query on oplog:

> local.oplog.rs.find({ts:{$gt:cutoff}}).pretty()

How it works…

For those aware of messaging and its terminologies, oplog can be looked at as a topic in the messaging world with one producer, which is the primary instance, and multiple consumers, which are the secondary instances. The primary instance writes to an oplog all the contents that need to be replicated. Thus, any create, update, and delete operations, as well as any reconfigurations on the replica sets will be written to the oplog; and the secondary instances will tail (continuously read the contents of the oplog being added to it, which is similar to a tail command with the -f option in Unix) the collection to get documents written by the primary. If the secondary has a slaveDelay configured, it will not apply documents from the oplog that are newer than the current time minus the slaveDelay time.
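The way a secondary picks entries from the oplog, with or without a delay, can be sketched in Python as follows. This is a simplified model and not the server’s replication code: the function name entries_to_apply is hypothetical, and ts is a plain number of seconds here rather than a MongoDB Timestamp.

```python
def entries_to_apply(oplog, last_applied_ts, now, slave_delay=0):
    """Return oplog entries a secondary may apply at time `now`.

    Entries must be newer than what was already applied, and, for a
    delayed member, at least `slave_delay` seconds old.
    """
    horizon = now - slave_delay
    return [entry for entry in oplog
            if last_applied_ts < entry["ts"] <= horizon]


oplog = [
    {"ts": 100, "op": "i"},
    {"ts": 110, "op": "u"},
    {"ts": 120, "op": "i"},
]

# A normal secondary that has applied up to ts=100 picks up both newer entries.
print([e["ts"] for e in entries_to_apply(oplog, 100, now=125)])                   # [110, 120]
# A member delayed by 10 seconds only sees entries up to ts=115.
print([e["ts"] for e in entries_to_apply(oplog, 100, now=125, slave_delay=10)])   # [110]
```

The delayed member applies the entry with ts 120 only once the clock reaches 130, which is the lag we configure using slaveDelay.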

We started by saving an instance of the local database in the variable called local and identified a cutoff time that we will use to query all the operations we will perform in this recipe from the oplog.

Executing a query on the system.namespaces collection in the local database shows us that the collection is a capped collection with a fixed size. For performance reasons, capped collections are allocated contiguous space on the filesystem and this space is preallocated. The size allocated by the server is dependent on the OS and CPU architecture. While starting the server, the oplogSize option can be provided to mention the size of the oplog. The defaults are generally good enough for most cases; however, for development purposes, one may choose to override this value with a smaller value. The preallocation not only takes time when the replica set is first initialized, but also takes up a fixed amount of disk space. For development purposes, we generally start multiple MongoDB processes as part of the same replica set on the same machine and want them to be up and running as quickly as possible with minimal resource usage. Also, having the entire oplog in memory becomes possible if the oplog size is small. For all these reasons, it is advisable to start local instances for development purposes with a small oplog size.

We performed some operations, such as inserting 10 documents and updating four documents using a multiupdate operation, and created an index. If we query the oplog for entries after the cutoff we computed earlier, we see 10 documents in it, one for each insert. The document looks as follows:

{

"ts":Timestamp(1392402144,1),

"h":NumberLong("-4661965417977826137"),

"v":2,

"op":"i",

"ns":"test.oplogTest",

"o":{

"_id":ObjectId("52fe5edfd473d2f623718f51"),

"i":0

}

}

As seen in the previous example, we first look at the three fields, namely op, ns, and o. These fields stand for the operation, the fully qualified name of the collection into which the data is being inserted, and the actual object to be inserted. The op value i stands for the insert operation. Note that the value of o, which is the document to be inserted, contains the _id field that got generated on the primary. We should see 10 such documents, one for each insert.

What is interesting is to see what happens on a multiupdate operation. The primary puts four documents in the oplog, one for each of the documents affected by the update. In this case, the op value is u, for the update, and the query used to match the document (held in the o2 field) is not the same as the one we gave in the update function; rather, it is a query that uniquely finds a document based on the _id field. As there is an index already in place for the _id field (created automatically for each collection), this operation to find the document to be updated is not expensive. The value of the o field is the same as the document we passed to the update function from the shell. The sample document in the oplog for the update is as follows:

{

"ts":Timestamp(1392402620,1),

"h":NumberLong("-7543933489976433166"),

"v":2,

"op":"u",

"ns":"test.oplogTest",

"o2":{

"_id":ObjectId("52fe5edfd473d2f623718f57")

},

"o":{

"$set":{

"val":"str"

}

}

}
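The fan-out of one multiupdate into one oplog entry per affected document can be sketched in Python as follows. This is a simplified illustration with our own names (oplog_entries_for_multi_update and the matches callback are hypothetical), and it leaves out the ts, h, and v fields that a real oplog entry carries.

```python
def oplog_entries_for_multi_update(docs, matches, update, ns):
    """Build one update entry per matching document, keyed by its _id."""
    return [
        {"op": "u", "ns": ns, "o2": {"_id": doc["_id"]}, "o": update}
        for doc in docs
        if matches(doc)
    ]


# Ten documents with i from 0 to 9; integer _id values are used for brevity.
docs = [{"_id": n, "i": n} for n in range(10)]

entries = oplog_entries_for_multi_update(
    docs,
    matches=lambda doc: doc["i"] > 5,   # stands in for {i: {$gt: 5}}
    update={"$set": {"val": "str"}},
    ns="test.oplogTest",
)

print(len(entries))                          # 4
print([e["o2"]["_id"] for e in entries])     # [6, 7, 8, 9]
```

Each entry identifies exactly one document through o2, so a secondary applying it never has to evaluate the original query again.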

The update in the oplog is the same as the one we provided, because the $set operation is idempotent, which means you may apply an operation safely any number of times.

However, an update using the $inc operator is not idempotent. Let us execute the following update query:

> db.oplogTest.update({i:9}, {$inc:{i:1}})

In this case, the oplog will have the following output as the value of o:

"o":{

"$set":{

"i":10

}

}

This nonidempotent operation is smartly put into the oplog by Mongo as an idempotent operation, with the value of i set to the value that is expected after the increment operation is applied once. Thus, it is safe to replay an oplog any number of times without corrupting the data.
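The rewrite of an $inc into a $set can be illustrated with a short Python sketch. It only models the idea on plain dictionaries; the function names to_idempotent and apply_set are hypothetical, and only top-level fields and the $inc and $set operators are handled.

```python
def to_idempotent(doc: dict, update: dict) -> dict:
    """Rewrite {'$inc': ...} as an equivalent {'$set': ...} for this document."""
    if "$inc" not in update:
        return update
    rewritten = {op: arg for op, arg in update.items() if op != "$inc"}
    new_set = dict(rewritten.get("$set", {}))
    for field, amount in update["$inc"].items():
        # Record the resulting value instead of the increment itself.
        new_set[field] = doc.get(field, 0) + amount
    rewritten["$set"] = new_set
    return rewritten


def apply_set(doc: dict, update: dict) -> dict:
    """Apply the $set part of an update and return the new document."""
    result = dict(doc)
    result.update(update.get("$set", {}))
    return result


doc = {"_id": 1, "i": 9}
entry = to_idempotent(doc, {"$inc": {"i": 1}})
print(entry)                      # {'$set': {'i': 10}}

# Replaying the entry twice leaves the document unchanged after the first time.
once = apply_set(doc, entry)
twice = apply_set(once, entry)
print(once == twice)              # True
```

Had the oplog stored the $inc as it is, replaying it twice would have left i at 11 on the secondary.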

Finally, we can see that the index creation process is put in the oplog as an insert operation in the system.indexes collection. However, there is something to remember during index creation till Version 2.4 of MongoDB. An index creation, whether foreground or background on the primary, is always created in the foreground on a secondary and thus, for that period, replication will not happen on that secondary instance. For large collections, index creation can take hours and thus, the size of the oplog is very important to let the secondary catch up from where it hasn’t replicated since the index creation started. However, since Version 2.6, index creation initiated in the background on the primary will also be built in the background on secondary instances.

For more details on the index creation on replica sets, visit http://docs.mongodb.org/master/tutorial/build-indexes-on-replica-sets/.

Building tagged replica sets

In the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, we saw how to set up a simple replica set and what the purpose of a replica set is. We also have a good deal of explanation in Appendix, Concepts for Reference, on what write concern is and why it is used. What we saw about write concerns is that they offer a minimum level guarantee for a certain write operation. However, with the concept of tags and write concerns, we can define a variety of rules and conditions that must be satisfied before a write operation is deemed successful and a response is sent to the user.

Consider some common use cases:

An application wants a write operation to be propagated to at least one server in each of its data centers. This ensures that, in the event of a data center shutdown, other data centers will have the data that was written by the application.

If there aren’t multiple data centers, at least one member of a replica set is kept on a different rack. For instance, if the rack’s power supply goes down, the replica set will still be available (not necessarily for writes) as at least one member is running on a different rack. In such scenarios, we would want the write to be propagated to at least two racks before responding to the client with a successful write.

It is possible that a reporting application queries a group of secondary instances of a replica set to generate some reports regularly (such a secondary might be configured to never become primary). After each write, we want to ensure that the write operation is replicated to at least one reporting replica member, before acknowledging the write as successful.

The preceding use cases are a few of the common use cases that arise and are not addressed using the simple write concerns that we have seen earlier. We need a different mechanism to cater to these requirements; replica sets with tags are what we need.

Obviously the next question is, what exactly are tags? Let us take an example of a blog. Various posts in the blog have different tags attached to them. These tags allow us to easily search, group, and relate posts together. Tags are user-defined texts with some meaning attached to them. If we draw an analogy between a blog post and the replica set members, just as we attach tags to a post, we can attach tags to each replica set member. For example, in a multi-data center scenario with two replica set members in data center 1 (dc1) and one member in data center 2 (dc2), we can have the following tags assigned to the members. The name of the key and the value assigned to the tag are arbitrary, and they are chosen during the designing of the application.

You may even choose to assign any other tag, for example, the name of the administrator who set up the server, if you really find it useful to address your use case.

Replica set member      Tag
Replica set member 1    {'datacentre':'dc1','rack':'rack-dc1-1'}
Replica set member 2    {'datacentre':'dc1','rack':'rack-dc1-2'}
Replica set member 3    {'datacentre':'dc2','rack':'rack-dc2-2'}

This is good enough to lay the foundation of what replica set tags are. In this recipe, we will see how to assign tags to replica set members and, more importantly, how to make use of them to address some of the sample use cases we saw earlier.
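To get a feel for how tags help with the use cases listed earlier, consider a rule such as "the write must reach members carrying at least two distinct datacentre values". The following Python sketch evaluates such a rule against the tags from the preceding table. It is only a model of the idea; the names TAGS and rule_satisfied are hypothetical, and the member labels are our own.

```python
# Tags of the three members, taken from the preceding table.
TAGS = {
    "member1": {"datacentre": "dc1", "rack": "rack-dc1-1"},
    "member2": {"datacentre": "dc1", "rack": "rack-dc1-2"},
    "member3": {"datacentre": "dc2", "rack": "rack-dc2-2"},
}


def rule_satisfied(rule: dict, acked_members, tags=TAGS) -> bool:
    """Check a rule of the form {tag name: number of distinct values needed}.

    `acked_members` are the members that have already received the write.
    """
    for tag_name, needed in rule.items():
        distinct_values = {tags[member][tag_name]
                           for member in acked_members
                           if tag_name in tags[member]}
        if len(distinct_values) < needed:
            return False
    return True


# Both members that have the write are in dc1: one data center only.
print(rule_satisfied({"datacentre": 2}, ["member1", "member2"]))  # False
# The write has reached dc1 and dc2.
print(rule_satisfied({"datacentre": 2}, ["member1", "member3"]))  # True
# Two members of the same data center still sit on two different racks.
print(rule_satisfied({"rack": 2}, ["member1", "member2"]))        # True
```

Note that the rule counts distinct tag values and not members, which is why two members in dc1 are not enough to satisfy a requirement of two data centers.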

Getting ready

Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about replica set basics. Go ahead and set up a simple three-node replica set on your computer, as mentioned in the recipe. Open a shell and connect to the primary member of the replica set.

If you need to know about write concerns, refer to the overview of write concerns in Appendix, Concepts for Reference.

For the purpose of inserting into the database, we will use Python, as it gives us an interactive interface like the Mongo shell. Refer to the Installing PyMongo recipe in Chapter 3, Programming Language Drivers, for steps on how to install PyMongo. The Mongo shell would have been the most ideal candidate for the demonstration of the insert operations, but there are certain limitations around the usage of the shell with our custom write concern. Technically, any programming language with the write concerns mentioned in the recipe for insert operations will work fine.

How to do it…

1. With the replica set started, we will add tags to it and reconfigure using the following commands that are executed from the Mongo shell:

> var conf = rs.conf()

> conf.members[0].tags = {'datacentre':'dc1','rack':'rack-dc1-1'}

> conf.members[1].tags = {'datacentre':'dc1','rack':'rack-dc1-2'}

> conf.members[2].priority = 0

> conf.members[2].tags = {'datacentre':'dc2','rack':'rack-dc2-1'}

2. With the replica set tags set (note that we have not yet reconfigured the replica set), we need to define some custom write concerns. First, we define one that will ensure that the data gets replicated at least to one server in each data center. Execute the following commands in the Mongo shell again:

> conf.settings = {'getLastErrorModes':{'MultiDC':{datacentre:2}}}

> rs.reconfig(conf)

3. Start the Python shell and execute the following commands:

>>> import pymongo
>>> client = pymongo.MongoReplicaSetClient('localhost:27000,localhost:27001', replicaSet='replSetTest')
>>> db = client.test

4. We will now execute the following insert query:

>>> db.multiDCTest.insert({'i': 1}, w='MultiDC', wtimeout=5000)

5. The preceding insert query goes through successfully, and ObjectId will be printed out. You may query the collection to confirm from either the Mongo shell or the Python shell.

6. As our primary is one of the servers in data center 1, we will now stop the server listening to port 27002, which is the one with priority 0 and tagged to be in a different data center.

7. Once the server is stopped (you may confirm using the rs.status() helper function from the Mongo shell), execute the following insert query again; this insert should throw an error for timeout:

>>> db.multiDCTest.insert({'i': 2}, w='MultiDC', wtimeout=5000)

8. Restart the stopped MongoDB server.

9. Similarly, we can achieve rack awareness by ensuring that the write propagates to at least two racks (in any data center) by defining a new configuration from the Mongo shell as follows:

{'MultiRack': {rack: 2}}

10. The settings value of the conf object will then be as follows. Once it is set, reconfigure the replica set again using rs.reconfig(conf) from the Mongo shell:

{
  'getLastErrorModes': {
    'MultiDC': {datacentre: 2},
    'MultiRack': {rack: 2}
  }
}

We saw WriteConcern used with replica set tags to achieve functionality such as data center and rack awareness. Let us see how we can use replica set tags with read operations.

11. We will see how to make use of replica set tags with read preference. Let us reconfigure the set by adding one more tag to mark a secondary member that will be used to execute some hourly stats reporting.

12. Execute the following steps to reconfigure the set from the Mongo shell:

> var conf = rs.conf()
> conf.members[2].tags.type = 'reports'
> rs.reconfig(conf)

13. This will configure the same member, the one with priority 0 in a different data center, with an additional tag called type with the value reports.

14. We now go back to the Python shell and execute the following commands:

>>> curs = db.multiDCTest.find(read_preference=pymongo.ReadPreference.SECONDARY, tag_sets=[{'type': 'reports'}])
>>> curs.next()

15. The preceding execution should show us one document from the collection (as we had inserted data in this test collection in the previous steps).

16. Stop the instance that we tagged for reporting, that is, the server listening to connections on port 27002, and execute the following commands on the Python shell again:

>>> curs = db.multiDCTest.find(read_preference=pymongo.ReadPreference.SECONDARY, tag_sets=[{'type': 'reports'}])
>>> curs.next()

This time around, the execution should fail and state that no secondary was found with the required tag sets.

How it works…

In this recipe, we did a lot of operations on tagged replica sets and saw how they can affect write operations using WriteConcern and read operations using ReadPreference. Let us look at them in some detail now.

WriteConcern in tagged replica sets

We set up a replica set that was up and running, which we reconfigured to add tags. We tagged the first two servers in data center 1 and in different racks (with the servers running and listening to ports 27000 and 27001 for client connections), and the third one in data center 2 (with the server listening to port 27002 for client connections). We also ensured that the member in data center 2 doesn’t become a primary by setting its priority to 0.

Our first objective is to ensure that write operations to the replica set get replicated to at least one member in each of the two data centers. To ensure this, we define a write concern as follows:

{'MultiDC': {datacentre: 2}}

Here, we first define the name of the write concern as MultiDC. The value, which is a JSON object, has one key with the name datacentre, which is the same as the key used for the tag we attached to the replica set members, and the value is the number 2, which will be looked at as the number of distinct values of the given tag that should acknowledge the write before it is deemed successful.

For instance, in our case, when the write comes to server 1 in data center 1, the number of distinct values of the datacentre tag is 1. If the write operation gets replicated to the second server, the number still stays 1, as the value of the datacentre tag is the same as that of the first member. It is only when the third server acknowledges the write operation that the write satisfies the defined condition of replicating the write to two distinct values of the datacentre tag in the replica set. Note that the value can only be a number and cannot be something such as {datacentre: 'dc1'}. Such a definition is invalid and an error will be thrown while reconfiguring the replica set.
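The counting rule described above can be sketched in plain JavaScript. This is only a hypothetical in-memory model for illustration, not the server's implementation: the members array and the satisfiesMode function are names assumed here, and the tag values are the ones set in step 1 of this recipe.

```javascript
// Hypothetical in-memory picture of the three tagged members (not real server state).
const members = [
  { host: 'localhost:27000', tags: { datacentre: 'dc1', rack: 'rack-dc1-1' } },
  { host: 'localhost:27001', tags: { datacentre: 'dc1', rack: 'rack-dc1-2' } },
  { host: 'localhost:27002', tags: { datacentre: 'dc2', rack: 'rack-dc2-1' } }
];

// Returns true when the acknowledging members cover at least the required
// number of distinct values for every tag key named in the mode.
function satisfiesMode(mode, ackedHosts) {
  const acked = members.filter(m => ackedHosts.includes(m.host));
  return Object.keys(mode).every(tagKey => {
    const distinctValues = new Set(
      acked.filter(m => tagKey in m.tags).map(m => m.tags[tagKey])
    );
    return distinctValues.size >= mode[tagKey];
  });
}

const multiDC = { datacentre: 2 };
const multiRack = { rack: 2 };

// Both acknowledging members are in dc1: only one distinct datacentre value.
console.log(satisfiesMode(multiDC, ['localhost:27000', 'localhost:27001'])); // false
// One member in dc1 and one in dc2: two distinct datacentre values.
console.log(satisfiesMode(multiDC, ['localhost:27000', 'localhost:27002'])); // true
// Two members in the same data center but on different racks.
console.log(satisfiesMode(multiRack, ['localhost:27000', 'localhost:27001'])); // true
```

The first call mirrors the situation in step 7, where the member in the second data center is down and the write cannot be acknowledged by two distinct datacentre values.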

However, we need to register this write concern somewhere with the server. This is done in the final step of the configuration by setting the settings value in the configuration JSON. The value to set is getLastErrorModes. The value of getLastErrorModes is a JSON document with all possible write concerns defined in it. We later define one more write concern for writes propagated to at least two racks. This is conceptually in line with the MultiDC write concern and thus, we will not be discussing it in detail here. After setting all the required tags and settings, we reconfigure the replica set for the changes to take effect.

Once reconfigured, we perform some write operations using the MultiDC write concern. When two members in two distinct data centers are available, the write goes through successfully. However, when the server in the second data center goes down, the write operation times out and throws an exception to the client initiating the write. This demonstrates that the write operation will succeed or fail as per how we intended.

We just saw how these custom tags can be used to address some interesting use cases that are not supported by the product implicitly, as far as write operations are concerned. Similar to write operations, read operations can take full advantage of these tags to address some use cases, such as reading from a fixed set of secondary members that are tagged with a particular value.

ReadPreference in tagged replica sets

We added another custom tag annotating a member to be used for reporting purposes. We then fired a query operation with the read preference to query a secondary and provided the tag sets that should be looked for before considering the member as a candidate for a read operation. Remember that when using a primary as the read preference, we cannot use tags, and that is the reason we explicitly specified the value of read_preference as SECONDARY.
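To make the selection rule concrete, the following is a minimal sketch in plain JavaScript of how a secondary can be chosen by tag sets. It is a simplified model under stated assumptions, not the driver's code: replicaSet, matchesTagSet, and selectSecondaries are hypothetical names, and details such as latency-based selection are left out.

```javascript
// Hypothetical in-memory picture of the replica set after the 'type' tag was added.
const replicaSet = [
  { host: 'localhost:27000', state: 'PRIMARY', up: true,
    tags: { datacentre: 'dc1', rack: 'rack-dc1-1' } },
  { host: 'localhost:27001', state: 'SECONDARY', up: true,
    tags: { datacentre: 'dc1', rack: 'rack-dc1-2' } },
  { host: 'localhost:27002', state: 'SECONDARY', up: true,
    tags: { datacentre: 'dc2', rack: 'rack-dc2-1', type: 'reports' } }
];

// A member matches a tag set when it carries every key/value pair in the set.
function matchesTagSet(member, tagSet) {
  return Object.keys(tagSet).every(key => member.tags[key] === tagSet[key]);
}

// Tag sets are tried in order; the first one matching any available
// secondary decides the candidates. An empty result means the read fails.
function selectSecondaries(nodes, tagSets) {
  const secondaries = nodes.filter(n => n.up && n.state === 'SECONDARY');
  for (const tagSet of tagSets) {
    const matching = secondaries.filter(n => matchesTagSet(n, tagSet));
    if (matching.length > 0) return matching.map(n => n.host);
  }
  return [];
}

const reportsTagSets = [{ type: 'reports' }];
console.log(selectSecondaries(replicaSet, reportsTagSets)); // [ 'localhost:27002' ]

// Model step 16: the reporting member is stopped, so no candidate remains.
const afterStop = replicaSet.map(n =>
  n.host === 'localhost:27002' ? Object.assign({}, n, { up: false }) : n);
console.log(selectSecondaries(afterStop, reportsTagSets)); // []
```

The empty result in the second call corresponds to the error seen in the recipe when no secondary with the required tag sets is available.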

Configuring the default shard for nonsharded collections

In the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, we set up a simple two-shard server. In the Connecting to a shard from the Mongo shell and performing operations recipe in Chapter 1, Installing and Starting the MongoDB Server, we added data to a person collection that was sharded. However, for any collection that is not sharded, all the documents end up on one shard called the primary shard. This situation is acceptable for small databases with a relatively small number of collections. However, if the database size increases and, at the same time, the number of unsharded collections increases, we end up overloading a particular shard (the primary shard for a database) with a lot of data from these unsharded collections. All query operations for such unsharded collections, as well as those on the sharded collections whose relevant ranges reside on this server instance, will be directed to this shard. In such a scenario, we can have the primary shard of a database changed to some other instance so that these unsharded collections get balanced out across different instances. In this recipe, we will see how to view this primary shard and change it to some other server whenever needed.

Getting ready

Refer to the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, to set up and start a sharded environment. From the shell, connect to the started mongos process. Also, assuming that the two shard servers are listening to the 27000 and 27001 ports, connect from the shell to these two processes. So we have a total of three shells opened, one connected to the mongos process and two to the individual shards.

We are using the test database for this recipe, and sharding has to be enabled on this database. If it’s not, then you need to execute the following commands on the shell connected to the mongos process:

mongos> use test
mongos> sh.enableSharding('test')

How to do it…

1. From the shell connected to the mongos process, execute the following two commands:

mongos> db.testCol.insert({i: 1})
mongos> sh.status()

2. In the databases, look out for the test database and take note of the primary. Suppose that the following is a part (showing the part under databases only) of the output of sh.status():

databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }

3. The second document under the databases shows us that the test database is enabled for sharding (because partitioned is true) and the primary shard is shard0000.

4. The primary shard, which is shard0000 in our case, is the mongod process listening to port 27000. Open the shell connected to this process and execute the following query:

> db.testCol.find()

5. Now connect to the other mongod process listening to port 27001 and execute the following query again:

> db.testCol.find()

6. Note that the data will be found only on the primary shard and not on any other shard.

7. Execute the following commands from the shell connected to the mongos process:

mongos> use admin
mongos> db.runCommand({movePrimary: 'test', to: 'shard0001'})

8. Execute the following command again from the Mongo shell connected to the mongos process:

mongos> sh.status()

9. From the shells connected to the mongod processes running on ports 27000 and 27001, execute the following query:

> db.testCol.find()

How it works…

We started a sharded setup and connected to it from the mongos process. We started by inserting a document in the testCol collection of the test database, a collection that is not enabled for sharding. In such cases, the data lies on a shard called the primary shard. Do not mistake this for the primary of a replica set. This is a shard (that itself can be a replica set), and it is the shard chosen by default for all databases and collections for which sharding is not enabled.

When we add the data to a nonsharded collection, it is seen only on the shard that is primary. Executing sh.status() tells us the primary shard. To change the primary, we need to execute a command from the admin database from the shell connected to the mongos process. The command is as follows:

db.runCommand({movePrimary: '<database whose primary shard is to be changed>', to: '<target shard>'})

Once the primary shard is changed, all existing data in nonsharded databases and collections is migrated to the new primary, and all subsequent writes to nonsharded collections will go to this shard.

Use this command with caution, as it will migrate all the unsharded collections to the new primary, which may take time for big collections.

Manually splitting and migrating chunks

Though MongoDB does a good job by default of splitting and migrating chunks across shards to maintain the balance, under some circumstances, such as a small number of documents or a relatively large number of small documents, where the automatic balancer doesn’t split the collection, an administrator might want to split and migrate the chunks manually. In this recipe, we will see how to split and migrate the collection manually across shards. Again, for this recipe, we will set up a simple shard as we saw in Chapter 1, Installing and Starting the MongoDB Server.

Getting ready

Refer to the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, to set up and start a sharded environment. It is preferred to start a clean environment without any data in it. From the shell, connect to the started mongos process.

How to do it…

1. Connect to the mongos process from the Mongo shell and enable sharding on the test database and the splitAndMoveTest collection as follows:

> sh.enableSharding('test')
> sh.shardCollection('test.splitAndMoveTest', {_id: 1}, false)

2. Let us load the data in the collection as follows:

> for (i = 1; i <= 10000; i++) db.splitAndMoveTest.insert({_id: i})

3. Once the data is loaded, execute the following command:

> db.splitAndMoveTest.find().explain()

Note the number of documents in the two shards in the plan. The values to look out for are in the two documents under the shards key in the result of the explain plan. Within these two documents, the field to look out for is n.

4. Execute the following commands to see the splits of the collection:

> config = db.getSisterDB('config')
> config.chunks.find({ns: 'test.splitAndMoveTest'}).pretty()

5. Split the chunk into two at 5000, as follows:

> sh.splitAt('test.splitAndMoveTest', {_id: 5000})

6. Splitting it doesn’t migrate it to the second server. See exactly what happens with the chunks by executing the following query again:

> config.chunks.find({ns: 'test.splitAndMoveTest'}).pretty()

7. We will now move the second chunk to the second shard:

> sh.moveChunk('test.splitAndMoveTest', {_id: 5001}, 'shard0001')

8. Execute the following query again and confirm the migration:

> config.chunks.find({ns: 'test.splitAndMoveTest'}).pretty()

9. Alternatively, the following explain plan will show a split of about 50-50 percent:

> db.splitAndMoveTest.find().explain()

How it works…

We simulate a small data load by adding monotonically increasing numbers and discover, by viewing the query plan, that the numbers are not split evenly across the two shards. This is not a problem, as the chunk size needs to reach a particular threshold, 64 MB by default, before the balancer decides to migrate the chunks across the shards to maintain balance. This is perfectly fine because, in the real world, when the data size gets huge, we will see that eventually, over a period of time, the shards are well balanced.

However, under some circumstances, when the administrator decides to split and migrate the chunks, it is possible to do it manually. The two helper functions sh.splitAt and sh.moveChunk are there to do this work. Let us look at their signatures and see what they do.

The sh.splitAt function takes two parameters. The first parameter is the namespace, which has the format <database>.<collection name>, and the second parameter is the query that acts as the split point to split the chunk into two, possibly uneven, portions, depending on where the given document is in the chunk. There is another method named sh.splitFind that will try to split the chunk into two equal portions.

However, splitting doesn’t mean the chunk moves to another shard. It just breaks one big chunk into two, but the data stays on the same shard. It is an inexpensive operation that involves updating the config DB.

The next operation we execute is to migrate the chunk to a different shard after we split it into two. The sh.moveChunk operation is used to do just that. This function takes three parameters. The first one is again the namespace of the collection, which has the format <database>.<collection name>; the second parameter is a query document that identifies the chunk to be migrated; and the third parameter is the destination shard.

Once the migration is done, the query’s plan shows us that the data is split across the two shards.
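The difference between splitting and moving can be sketched with a small in-memory model in plain JavaScript. Everything here is an assumption made for illustration: the chunks array stands in for the chunk metadata, -Infinity and Infinity stand in for the minimum and maximum key markers, and splitAt, moveChunk, and countByShard are hypothetical helpers rather than the sh.* functions themselves.

```javascript
// Hypothetical stand-in for the chunk metadata of test.splitAndMoveTest.
// A chunk owns the keys with min <= key < max.
const chunks = [
  { min: -Infinity, max: Infinity, shard: 'shard0000' }
];

function findChunk(key) {
  return chunks.find(c => c.min <= key && key < c.max);
}

// Splitting only rewrites metadata: one range becomes two on the same shard.
function splitAt(key) {
  const chunk = findChunk(key);
  if (chunk.min === key) throw new Error('key is already a chunk boundary');
  const upper = { min: key, max: chunk.max, shard: chunk.shard };
  chunk.max = key;
  chunks.splice(chunks.indexOf(chunk) + 1, 0, upper);
}

// Moving reassigns the chunk that contains the given key to another shard.
function moveChunk(key, toShard) {
  findChunk(key).shard = toShard;
}

// Counts how many of the given _id values fall on each shard.
function countByShard(ids) {
  const counts = {};
  for (const id of ids) {
    const shard = findChunk(id).shard;
    counts[shard] = (counts[shard] || 0) + 1;
  }
  return counts;
}

const ids = Array.from({ length: 10000 }, (_, n) => n + 1); // _id values 1..10000

splitAt(5000);
console.log(countByShard(ids)); // { shard0000: 10000 }, the split alone moves nothing

moveChunk(5001, 'shard0001');
console.log(countByShard(ids)); // { shard0000: 4999, shard0001: 5001 }
```

In this model, the document with _id 5000 belongs to the upper chunk because the split point becomes that chunk's inclusive lower bound, which is why the final counts are 4999 and 5001 rather than an exact half each.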

Performing domain-driven sharding using tags

The Starting a simple sharded environment of two shards and Connecting to a shard from the Mongo shell and performing operations recipes in Chapter 1, Installing and Starting the MongoDB Server, explained how to start a simple two-server shard and then insert data in a collection after choosing a shard key. The sharding we saw there is more technical in nature: Mongo keeps each data chunk to a manageable size by splitting it into multiple chunks and migrating the chunks across shards to keep the chunk distribution even. However, what if we want the sharding to be more domain-oriented? Suppose we have a database for storing postal addresses and we shard based on postal codes, where we know the postal code range of a city. What we can do is tag the shard servers using the city name as the tag, add the shard range (postal codes), and associate this range with the tag.

This way, we can state which servers can contain the postal addresses of which cities. For instance, we know that for Mumbai, being the most populous city, the number of addresses would be huge and thus we add two shards for Mumbai. On the other hand, one shard should be enough to cope with the volumes of Pune; so for now we tag just one shard. In this recipe, we will see how to achieve this use case using tag-aware sharding. If the description is confusing, don’t worry; we will see how to implement what we just discussed.

Getting ready

Refer to the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start a simple shard. However, for this recipe, we will add an additional shard. So, we will now start three MongoDB servers listening to ports 27000, 27001, and 27002. Again, it is recommended to start off with a clean database. For the purpose of this recipe, we will be using the userAddress collection to store the data.

How to do it…

1. Assuming that we have three shards up and running, let us execute the following commands:

mongos> sh.addShardTag('shard0000', 'Mumbai')
mongos> sh.addShardTag('shard0001', 'Mumbai')
mongos> sh.addShardTag('shard0002', 'Pune')

2. With the tags defined, let us define the range of the pin codes that will map to a tag, as follows:

mongos> sh.addTagRange('test.userAddress', {pincode: 400001}, {pincode: 400999}, 'Mumbai')
mongos> sh.addTagRange('test.userAddress', {pincode: 411001}, {pincode: 411999}, 'Pune')

3. Enable sharding for the test database and userAddress collection as follows:

mongos> sh.enableSharding('test')
mongos> sh.shardCollection('test.userAddress', {pincode: 1})

4. Insert the following documents in the userAddress collection:

mongos> db.userAddress.insert({_id: 1, name: 'Varad', city: 'Pune', pincode: 411001})
mongos> db.userAddress.insert({_id: 2, name: 'Rajesh', city: 'Mumbai', pincode: 400067})
mongos> db.userAddress.insert({_id: 3, name: 'Ashish', city: 'Mumbai', pincode: 400101})

5. Execute the following explain plans:

mongos> db.userAddress.find({city: 'Pune'}).explain()
mongos> db.userAddress.find({city: 'Mumbai'}).explain()

How it works…

When we want to partition data across shards along domain lines, we can use tag-aware sharding. It is an excellent mechanism that lets us tag the shards and then split the data range across shards identified by the tags. We don’t really have to bother about the actual machines hosting the shards and their addresses. Tags act as a good abstraction: we can tag a shard with multiple tags, and one tag can be applied to multiple shards.

In our case, we have three shards and we apply tags to each of them using the sh.addShardTag method. This method takes the shard ID, which we can see in the sh.status call under the “shards” key, and the tag to add. The sh.addShardTag method can be called repeatedly to keep adding tags to a shard. Similarly, there is a helper method, sh.removeShardTag, to remove an assignment of a tag from the shard. Both these methods take two parameters: the first one is the shard ID and the second one is the tag to add or remove.

Once the tagging is done, we assign the range of the values of the shard key to the tag. The sh.addTagRange method is used to do that. It accepts four parameters: the first one is the namespace (the fully qualified name of the collection), the second and third parameters are the start and end values of the range for this shard key, and the fourth parameter is the tag name of the shards hosting the range being added. For example, the call sh.addTagRange('test.userAddress', {pincode: 400001}, {pincode: 400999}, 'Mumbai') says we are adding the shard range from 400001 to 400999 for the test.userAddress collection and this range will be stored in the shards tagged as Mumbai.

Once the tagging and adding tag ranges are done, we enable sharding on the database and collection, and add data to it from Mumbai and Pune with the respective pin codes. We then query and explain the plan to see that the data did indeed reside on the shards we have tagged for Pune and Mumbai. We can also add new shards to this sharded setup and tag the new shard accordingly. The balancer will then balance the data based on the tags assigned. For instance, if the addresses in Pune increase, thus overloading a shard, we can add a new shard with the tag Pune. The postal addresses for Pune will then be sharded across these two server instances tagged for Pune.
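The mapping from a pin code to the shards allowed to hold it can be sketched as follows in plain JavaScript. This is a hypothetical model for illustration only: shardTags, tagRanges, tagForPincode, and eligibleShards are assumed names, the range's upper bound is treated as exclusive, and a key outside every tagged range is assumed to be free to live on any shard.

```javascript
// Hypothetical model of the tags applied with sh.addShardTag in step 1.
const shardTags = {
  shard0000: ['Mumbai'],
  shard0001: ['Mumbai'],
  shard0002: ['Pune']
};

// Hypothetical model of the ranges registered with sh.addTagRange in step 2.
// Assumption: the lower bound is inclusive and the upper bound is exclusive.
const tagRanges = [
  { min: 400001, max: 400999, tag: 'Mumbai' },
  { min: 411001, max: 411999, tag: 'Pune' }
];

// Returns the tag whose range covers the pin code, or null if none does.
function tagForPincode(pincode) {
  const range = tagRanges.find(r => r.min <= pincode && pincode < r.max);
  return range ? range.tag : null;
}

// Returns the shards that may hold a document with the given pin code.
function eligibleShards(pincode) {
  const tag = tagForPincode(pincode);
  const allShards = Object.keys(shardTags);
  if (tag === null) return allShards; // assumption: untagged keys can go anywhere
  return allShards.filter(shard => shardTags[shard].includes(tag));
}

console.log(eligibleShards(411001)); // [ 'shard0002' ]
console.log(eligibleShards(400067)); // [ 'shard0000', 'shard0001' ]
console.log(eligibleShards(400101)); // [ 'shard0000', 'shard0001' ]
```

Adding a second shard for Pune in this model amounts to adding one more entry to shardTags with the Pune tag, after which eligibleShards(411001) would list both shards.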

Exploring the config database in a sharded setup

The config database is the backbone of a sharded setup in Mongo. It stores all the metadata of the shard setup and has a dedicated mongod process running for it. When a mongos process is started, we provide it with the config server’s URL. In this recipe, we will take a look at some collections in the config database and dive deep into their content and significance.

Getting ready

We will have a sharded setup for this recipe. Refer to the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start a simple shard. Additionally, connect to the mongos process from a shell.

How to do it…

1. From the console connected to the mongos process, switch to the config database and view all collections as follows:

mongos> use config
mongos> show collections

2. From the list of all collections, we will visit a few. We start with the databases collection. This keeps track of all the databases in this sharded setup. Execute the following command from the shell:

mongos> db.databases.find()

The content of the result is pretty straightforward. The value of the _id field is the name of the database. The value of the field partitioned tells us whether sharding is enabled for the database or not; true indicates it is enabled. The field primary gives the primary shard where the data of nonsharded collections resides.

3. The next collection we will visit is collections. Execute the following command from the shell:

mongos> db.collections.find().pretty()

This collection, unlike the databases collection we saw earlier, contains only those collections for which we have enabled sharding. The _id field gives the namespace of the collection in the <database>.<collection name> format, the key field gives the shard key, and the unique field indicates whether the shard key is unique or not. These three fields come as the three parameters of the sh.shardCollection function in that very order.

4. Next, we look at the chunks collection. Execute the following command on the shell. If the database was clean when we started this recipe, we won’t have a lot of data in this:

mongos> db.chunks.find().pretty()

5. We then look at the tags collection. Execute the following query:

mongos> db.tags.find().pretty()

6. Let us query the mongos collection as follows. This is a simple collection giving the list of all mongos instances connected to the shard, with details such as the hostname and port on which the mongos instance is running, which form the _id field, and the version and figures such as how long the process has been up and running, in seconds.

mongos> db.mongos.find().pretty()

7. Finally, we look at the version collection. Execute the following query (note that it is not similar to other queries we execute):

mongos> db.getCollection('version').findOne()

How it works…

We saw the collections and databases collections when we queried them; they are pretty simple. Let us look at the collection called chunks. The following is a sample document from this collection:

{
  "_id" : "test.userAddress-pincode_400001.0",
  "lastmod" : Timestamp(1, 3),
  "lastmodEpoch" : ObjectId("53026514c902396300fd4812"),
  "ns" : "test.userAddress",
  "min" : {
    "pincode" : 400001
  },
  "max" : {
    "pincode" : 411001
  },
  "shard" : "shard0000"
}

The fields of interest are ns (the namespace of the collection), min (the lower bound of the shard key range held by the chunk), max (the upper bound of that range), and shard (the shard on which this chunk lies). The value of the chunk size is 64 MB by default. This can be seen in the settings collection. Execute db.settings.find() from the shell and look at the value of the field value, which is the size of the chunk in MB. Chunks are restricted to this small size to ease the migration process across shards if needed. When the size of the chunk exceeds this threshold, the MongoDB server finds a suitable point in the existing chunk to break it into two, and adds a new entry in the chunks collection. This operation is called splitting and is inexpensive, as the data stays where it is. It is just logically split into multiple chunks. The balancer in Mongo tries to keep the chunks across shards balanced, and the moment it sees some imbalance, it migrates these chunks to a different shard, which is expensive and also depends largely on the network bandwidth. If we execute sh.status(), the implementation actually queries the collections we saw earlier and prints the formatted result.

Chapter 5. Advanced Operations

In this chapter, we will cover the following recipes:

Atomic find and modify operations
Implementing atomic counters in MongoDB
Implementing server-side scripts
Creating and tailing capped collection cursors in MongoDB
Converting a normal collection to a capped collection
Storing binary data in MongoDB
Storing large data in Mongo using GridFS
Storing data to GridFS from a Java client
Storing data to GridFS from a Python client
Implementing triggers in MongoDB using oplog
Executing flat plane (2D) geospatial queries in Mongo using geospatial indexes
Spherical indexes and GeoJSON-compliant data in MongoDB
Implementing a full-text search in MongoDB
Integrating MongoDB with Elasticsearch for a full-text search

Introduction

In Chapter 2, Command-line Operations and Indexes, we saw how to perform basic operations from the shell to query, update, and insert documents. We also explored different types of indexes and index creation. In this chapter, we go ahead and see some of the advanced features of Mongo, such as GridFS, geospatial indexes, and full-text search. Other recipes we will see include an introduction to capped collections and their uses and implementing server-side scripts in MongoDB.

Atomic find and modify operations

In Chapter 2, Command-line Operations and Indexes, we had some recipes that explained various CRUD operations that we perform in MongoDB. There was one concept that we didn’t cover, that is, atomically finding and modifying documents. Modify consists of both update and delete operations. In this recipe, we will see find and modify operations in some detail and, in the next recipe, Implementing atomic counters in MongoDB, we will put them to use in implementing counters.

Getting ready

Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of MongoDB. That is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server.

How to do it…

We will test a document in the atomicOperationsTest collection as follows:

1. Execute the following commands from the Mongo shell:

> db.atomicOperationsTest.drop()
> db.atomicOperationsTest.insert({i: 1})

2. Execute the following command from the Mongo shell and observe the output:

> db.atomicOperationsTest.findAndModify({
    query: {i: 1},
    update: {$set: {text: 'TestString'}},
    new: false
  })

3. We will execute another one this time, but with slightly different parameters; observe the output this time around:

> db.atomicOperationsTest.findAndModify({
    query: {i: 1},
    update: {$set: {text: 'UpdatedString'}},
    fields: {i: 1, text: 1, _id: 0},
    new: true
  })

4. We will execute another update this time that will upsert the document as follows:

> db.atomicOperationsTest.findAndModify({
    query: {i: 2},
    update: {$set: {text: 'TestString'}},
    fields: {i: 1, text: 1, _id: 0},
    upsert: true,
    new: true
  })

5. Now query the collection once, as follows, and see the documents present:

> db.atomicOperationsTest.find().pretty()

6. We will finally execute the delete operation as follows:

> db.atomicOperationsTest.findAndModify({
    query: {i: 2},
    remove: true,
    fields: {i: 1, text: 1, _id: 0},
    new: false
  })

How it works…

If we perform the find and update operations independently by first finding the document and then updating it in MongoDB, the results might not be as expected, as there might be an intervening update between the find and the update operations that changes the document state. In some specific use cases, such as implementing atomic counters, this is not acceptable and thus, we need a way to atomically find, update, and return a document. The returned value is either the one before the update is applied or the one after the update is applied, and this is decided by the invoking client.

Now that we have executed the steps in the preceding section, let us see what we actually did and what all the fields in the JSON document passed as a parameter to the findAndModify operation mean. Starting with step 2, we gave a document as a parameter to the findAndModify function that contains the fields query, update, and new. These are used to specify the query that will be used to find the document, the update that will be applied to it, and a Boolean value that specifies whether the document returned by the operation is the one after the update is applied or the one before it was applied. In this case, the value of the new flag is false. The resulting document returned is the one before the update is applied.

In step 3, we added a new field called fields to the document passed as a parameter. It is used to select a limited set of fields from the resulting document returned. Also, the value of the new field is true, which indicates that we want the updated document; that is, the one after the update operation is executed and not the one before the update.

In step 4, the parameter contained a new field called upsert, which upserted (update + insert) the document. That is, if the document with the given query is found, it is updated; otherwise, a new one is created and updated. If the document didn’t exist and an upsert happens, having the value of the new parameter as false will return null. This is because there was nothing present before the update operation was executed.

Finally, in step 6, the parameter didn’t have the update field but had the remove field with the value true, indicating that the document is to be removed. Also, the value of the new field was false, which means that we expect the document that got deleted.
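The effect of the new, upsert, and remove flags can be summarized with a small in-memory sketch in plain JavaScript. This is a simplified, hypothetical model and not MongoDB's implementation: the collection is an ordinary array, only equality queries and $set updates on top-level fields are handled, and the fields projection and _id generation are left out.

```javascript
// Equality-only matcher, an assumption made to keep the sketch short.
function matches(doc, query) {
  return Object.keys(query).every(key => doc[key] === query[key]);
}

// Simplified model of findAndModify over an array of documents.
function findAndModify(collection, options) {
  const { query, update, remove = false, upsert = false, new: returnNew = false } = options;
  const index = collection.findIndex(doc => matches(doc, query));

  if (index === -1) {
    if (!upsert || remove) return null;      // nothing found, nothing created
    const inserted = Object.assign({}, query, update.$set);
    collection.push(inserted);
    // Before the upsert nothing existed, so the "old" document is null.
    return returnNew ? Object.assign({}, inserted) : null;
  }

  const before = Object.assign({}, collection[index]);
  if (remove) {
    collection.splice(index, 1);
    return before;                           // the document that got deleted
  }
  Object.assign(collection[index], update.$set);
  return returnNew ? Object.assign({}, collection[index]) : before;
}

const coll = [{ i: 1 }];

// Step 2: new is false, so the document before the update comes back.
console.log(findAndModify(coll, { query: { i: 1 }, update: { $set: { text: 'TestString' } }, new: false }));
// Step 3: new is true, so the updated document comes back.
console.log(findAndModify(coll, { query: { i: 1 }, update: { $set: { text: 'UpdatedString' } }, new: true }));
// Step 4: no document with i equal to 2 exists, so one is created.
console.log(findAndModify(coll, { query: { i: 2 }, update: { $set: { text: 'TestString' } }, upsert: true, new: true }));
// Step 6: the matching document is removed and returned.
console.log(findAndModify(coll, { query: { i: 2 }, remove: true, new: false }));
```

Because JavaScript runs this function to completion without interleaving, the sketch is trivially atomic; on a real server, the value of findAndModify is that the same find, modify, and return sequence happens as one operation even with concurrent clients.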

See also

The Implementing atomic counters in MongoDB recipe, to see how to implement the use case that is used to develop an atomic counter in Mongo

Implementing atomic counters in MongoDB

Atomic counters are a necessity for a large number of use cases. Mongo doesn’t have a built-in feature for atomic counters; nevertheless, they can be easily implemented using some of its cool offerings. In fact, implementing one is merely a couple of lines of code. Refer to the previous recipe to know what atomic find and modify operations are in Mongo.

Getting ready

Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of Mongo. This is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server.

How to do it…

1. Execute the following piece of code from the Mongo shell:

> function getNextSequence(counterId) {
    return db.counters.findAndModify({
      query: {_id: counterId},
      update: {$inc: {count: 1}},
      upsert: true,
      fields: {count: 1, _id: 0},
      new: true
    }).count
  }

2. Now, from the shell, invoke the following commands:

> getNextSequence('PostsCounter')
> getNextSequence('PostsCounter')
> getNextSequence('ProfileCounter')

How it works…

The function is as simple as a findAndModify operation on a collection used to store all the counters. The counter ID is the _id field of the document stored, and the value of the counter is stored in the count field. The document passed to findAndModify contains the query that uniquely identifies the document storing the current count, which is a query using the _id field. The update operation is an $inc operation that will increment the value of the count field by one, but what if the document doesn’t exist? This will happen during the first invocation of the counter. To take care of this scenario, we set the upsert flag to true, which atomically either updates the document field or creates one. The value, thus, will always start with one, and there is no way in this function to have a user-defined start number for the sequence or a custom increment step. To address such requirements, we will have to specifically add a document with the initialized values to the counters collection. Finally, we are interested in the state of the counter after the value is incremented. Hence, we set the value of the new field to true.

On invoking this method three times, as we did, we should see the following in the counters collection. Simply execute the following query:

> db.counters.find()
{ "_id" : "PostsCounter", "count" : 2 }
{ "_id" : "ProfileCounter", "count" : 1 }

Using this small function, we have now implemented atomic counters in Mongo.

We can store such common code on the Mongo server, which will be available for execution in other functions.
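The idea of seeding a counter to get a custom start value and step can be illustrated with a small in-memory sketch in plain JavaScript. This is only a hypothetical model: the counters Map stands in for the counters collection, seedCounter stands in for inserting an initialized document, and the optional step parameter of getNextSequence is an assumption that would correspond to incrementing by a value other than one.

```javascript
// Hypothetical stand-in for the counters collection: counter ID -> current count.
const counters = new Map();

// Models inserting an initialized document so that the first value
// handed out is firstValue when the counter advances by step.
function seedCounter(counterId, firstValue, step = 1) {
  counters.set(counterId, firstValue - step);
}

// Models the upsert plus increment: a missing counter starts from 0,
// then the count is advanced by step and the new value is returned.
function getNextSequence(counterId, step = 1) {
  const current = counters.has(counterId) ? counters.get(counterId) : 0;
  const next = current + step;
  counters.set(counterId, next);
  return next;
}

console.log(getNextSequence('PostsCounter'));       // 1
console.log(getNextSequence('PostsCounter'));       // 2

seedCounter('InvoiceCounter', 1000, 10);
console.log(getNextSequence('InvoiceCounter', 10)); // 1000
console.log(getNextSequence('InvoiceCounter', 10)); // 1010
```

An unseeded counter behaves like the one in the recipe and starts at one, while a seeded counter starts wherever the initialized document says it should.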

See also

The Implementing server-side scripts recipe, to see how we can store JavaScript functions on the Mongo server

Implementing server-side scripts

In this recipe, we will see how to write server-stored JavaScript that is similar to stored procedures in relational databases. This is a common use case, where other pieces of code require access to these common functions and we have them in one central place. The function for demo purposes is simple; we will add two numbers. There are two parts to this recipe. First, we’ll see how to load the scripts from the collections on the client-side JavaScript shell and then, we will see how to execute these functions on the server.

Note

The documentation specifically mentions that it is not recommended to use server-side scripts. Security is one concern, as the data may not be properly audited; hence, we need to be careful about what functions are defined. Since the launch of Mongo 2.4, the server-side JavaScript engine is V8, which can execute multiple threads in parallel, as opposed to the engine prior to Version 2.4 of Mongo, which executes only one thread at a time.

Getting ready

Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of Mongo. This is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server.

How to do it…

1. Create a new function called add and save it to the db.system.js collection as follows. The current database should be test:

> use test
> db.system.js.save({_id: 'add', value: function(num1, num2) {return num1 + num2}})

2. Now that this function is defined, load all the functions as follows:

> db.loadServerScripts()

3. Invoke add and see if it works:

> add(1, 2)

4. We will use the add function and execute this on the server side instead. Execute the following commands from the shell:

> use test
> db.eval('return add(1, 2)')

5. Execute the following commands:

> use test1
> db.eval('return add(1, 2)')

How it works…

The system.js collection is a plain old collection just like any other collection. We add a new server-side JavaScript using the save function in this collection. The save function is just a convenience function that inserts the document if it is not present or updates an existing one. The objective is to add a new document to this collection, which you may also do using insert or upsert.

The secret lies in the loadServerScripts method. The method executes the following line of code:

this.system.js.find().forEach(function(u){eval(u._id+"="+u.value);});

This evaluates JavaScript using the eval function: for each document present in the system.js collection, it assigns the function defined in the value attribute of the document to a variable whose name is given in the _id field of the document.

For example, if the {_id: 'add', value: function(num1, num2) {return num1 + num2}} document is present in the system.js collection, the function given in the value field of the document will be assigned to the variable named add in the current shell. The add value is given in the _id field of the document.

These scripts do not really execute on the server, but their definition is stored on the server in a collection. The loadServerScripts method just instantiates some variables in the current shell and makes those functions available for invocation. It is the JavaScript interpreter of the shell that executes these functions and not the server. The system.js collection is defined in the scope of the database, but once loaded, these are JavaScript functions defined in the shell and hence, the functions are available throughout the scope of the shell, irrespective of the database currently active.
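The loading mechanism described above can be mimicked outside the shell. The following is only an analogy in Python, not Mongo code: the stored_scripts list stands in for the documents of system.js, the function bodies are written as Python lambdas, and the double entry is a hypothetical second script added for illustration.

```python
# Stand-in for the documents in system.js: _id is the name, value is the source text.
stored_scripts = [
    {"_id": "add", "value": "lambda num1, num2: num1 + num2"},
    {"_id": "double", "value": "lambda n: add(n, n)"},  # hypothetical extra script
]

def load_server_scripts(documents, namespace):
    """Bind each stored definition to a variable named after its _id."""
    for doc in documents:
        # Same idea as eval(u._id + "=" + u.value) in the shell.
        namespace[doc["_id"]] = eval(doc["value"], namespace)
    return namespace

# The namespace plays the role of the shell's scope, independent of any database.
shell_scope = load_server_scripts(stored_scripts, {})
result = shell_scope["add"](1, 2)
print(result)
```

Once loaded, the functions live in the local scope only, which mirrors why the loaded functions stay available in the shell regardless of the database currently in use.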

As far as security is concerned, if the shell is connected to the server with security enabled, the user invoking loadServerScripts must have privileges to read the collections in the database. For more details on enabling security and the various roles a user can have, refer to the Setting up users in MongoDB recipe in Chapter 4, Administration. As we saw earlier, the loadServerScripts function reads data from the system.js collection and thus, if the user doesn’t have privileges to read from the collection, the function invocation will fail. Apart from that, the user needs appropriate privileges for whatever the loaded functions do when executed from the shell. For instance, if a function inserts/updates in any collection, the user should have read and write privileges on that particular collection accessed from the function.

Executing scripts on the server, as against executing them in the connected shell, is perhaps what one would expect of a server-side script. In this case, the functions are evaluated on the server’s JavaScript engine and the security checks are more stringent, as long-running functions can hold locks, with detrimental effects on performance. The wrapper to invoke the execution of the JavaScript code on the server side is the db.eval function, which accepts the code to evaluate on the server side along with the parameters, if any.

Before evaluating the function, db.eval takes a global write lock. This can be skipped if the nolock parameter is used. For instance, the preceding add function can be invoked as follows, using runCommand instead of calling db.eval, and will achieve the same results. Additionally, we provide the nolock field to instruct the server not to acquire the global lock before evaluating the function. If the function performs any read/write operations on the collection, it will acquire locks as usual, and this field doesn’t affect this behavior.

> db.runCommand({eval: function(num1, num2) {return num1 + num2}, args: [1, 2], nolock: true})

If security is enabled on the server, the invoking user needs to have four roles, namely, userAdminAnyDatabase, dbAdminAnyDatabase, readWriteAnyDatabase, and clusterAdmin (on the admin database), to successfully invoke the db.eval function.

Programming languages also provide a way to invoke such server-side scripts using an eval function. For instance, in the Java API, the com.mongodb.DB class has the eval method to invoke server-side JavaScript code. Such server-side executions are highly useful when we want to avoid unnecessary network traffic for the data and return only the result to the clients. However, too much logic on the database server can quickly make things difficult to maintain and can badly affect the performance of the server.

Creating and tailing capped collection cursors in MongoDB

Capped collections are fixed-size collections and they act like queues. Documents are added towards the end of the collection, and the oldest entry in the collection is removed if the space allocated to the collection becomes full. They provide fast access to limited-size collections even without the use of an index. They are naturally sorted by the order of insertion, and any retrieval ordered by time can be done using the $natural sort order. The following diagram gives a pictorial representation of a capped collection whose size is enough to hold up to three documents of equal size (which is too small for any practical use, but good for illustration purposes). As we see in the diagram, the collection is similar to a circular queue, where the oldest document is replaced by the newly added document, should the collection become full:

Tailable cursors are a special type of cursor that tails the collection just as the tail command in Unix does. These cursors iterate through the collection like normal cursors do but additionally, they wait for data to be available in the collection if it is not available. We will see capped collections and tailable cursors in detail in this recipe.

Getting ready

Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of Mongo. This is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server.

How to do it…

There are two parts to this recipe. In the first part, we will be creating a capped collection called testCapped and will try performing some basic operations on it. In the second part, we will be creating a tailable cursor on the capped collection we created.

1. First, we will drop the collection with the testCapped name if it exists, as follows:

> db.testCapped.drop()

2. Now create a capped collection as follows (note that the size given here is in bytes allocated for the collection and not the number of documents it contains):

> db.createCollection('testCapped', {capped: true, size: 100})

3. We will now insert 100 documents in the capped collection, as follows:

> for(i = 1; i <= 100; i++) {
    db.testCapped.insert({'i': i, val: 'Testcapped'})
  }

4. Now query the collection as follows:

> db.testCapped.find()

5. Try to remove the data from the collection, as follows:

> db.testCapped.remove()

You should get an error after executing the previous command.

6. We will now create and demonstrate a tailable cursor. It is recommended that you type/copy the following pieces of code into a text editor and keep it handy for execution.

7. To insert data in a collection, we will be executing the following fragment of code. Execute this piece of code in the shell as follows (note that this execution will take quite some time):

> for(i = 101; i < 500; i++) {
    sleep(1000)
    db.testCapped.insert({'i': i, val: 'TestCapped'})
  }

8. To tail a capped collection, we execute the following piece of code:

> var cursor = db.testCapped.find().addOption(DBQuery.Option.tailable).addOption(DBQuery.Option.awaitData)
  while(cursor.hasNext()) {
    var next = cursor.next()
    print('i:' + next.i + ',value:' + next.val)
  }

9. Open a shell and connect to the running mongod process. This will be the second shell opened and connected to the server. Copy and paste the code given in step 8 in this shell and execute it.

10. Observe how the records inserted are shown, as they are inserted into the capped collection.

How it works…

We created a capped collection explicitly using the createCollection function. Unlike a normal collection, a capped collection is not created implicitly on the first insert; it has to be created explicitly like this (or converted from an existing collection, as the Converting a normal collection to a capped collection recipe shows). There are two parameters to the createCollection function. The first one is the name of the collection. The second parameter is a JSON document that contains two fields, capped and size, which state whether the collection is capped or not, and the size of the collection in bytes, respectively. An additional field, max, can be provided to specify the maximum number of documents in the collection. The size field is required even if the max field is specified. We then insert and query the documents. When we try to remove the documents from the collection, we see an error to the effect that removal is not permitted from a capped collection. Documents are removed from a capped collection only when new documents are added and there isn’t space available to accommodate them.
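The eviction behavior can be pictured with a small model. The sketch below is a toy in plain Python, not the server's implementation: it bounds the collection by a document count only (similar in spirit to the max field), whereas a real capped collection is primarily bounded by its size in bytes. The class name and the error message are made up for illustration.

```python
from collections import deque

class CappedCollection:
    """Toy model of a capped collection bounded by a maximum document count."""

    def __init__(self, max_docs):
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self):
        # Documents come back in insertion (natural) order.
        return list(self._docs)

    def remove(self, query):
        # Hypothetical message; explicit removal is simply not allowed.
        raise RuntimeError("cannot remove from a capped collection")

capped = CappedCollection(max_docs=3)
for i in range(1, 6):
    capped.insert({"i": i, "val": "Testcapped"})

remaining = [doc["i"] for doc in capped.find()]
print(remaining)  # [3, 4, 5]
```

With room for three documents and five inserts, only the three newest documents survive, which is the circular-queue behavior described for the collection.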

What we see next is the tailable cursor we created. We started two shells; in one of them, we ran a normal insertion of documents with an interval of 1 second between subsequent insertions. In the second shell, we created a cursor, iterated through it, and printed the documents that we got from the cursor onto the shell. The additional options we added to the cursor make the difference though. There are two options added, DBQuery.Option.tailable and DBQuery.Option.awaitData. The former option makes the cursor tailable rather than normal: the last position is marked, and we can resume from where we left off. The latter option waits for more data for some time rather than returning immediately when no data is available and we reach the end of the cursor. The awaitData option can be used with tailable cursors only. The combination of these two options gives us a feel similar to the tail command in Unix. For a list of the different available options, visit http://docs.mongodb.org/manual/reference/method/cursor.addOption/.
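To make the two options more concrete, here is a rough simulation in Python using threads. It is a sketch under stated assumptions rather than driver code: TailableLog is a hypothetical in-memory stand-in for the capped collection, the writer thread plays the role of the first shell, and await_seconds is an arbitrary wait chosen for the demo.

```python
import threading
import time

class TailableLog:
    """Toy append-only log standing in for a capped collection."""

    def __init__(self):
        self._docs = []
        self._cond = threading.Condition()

    def insert(self, doc):
        with self._cond:
            self._docs.append(doc)
            self._cond.notify_all()

    def tail(self, await_seconds):
        """Yield documents, then wait for new ones (like tailable plus awaitData)."""
        position = 0  # last position reached; iteration resumes from here
        while True:
            with self._cond:
                if position >= len(self._docs):
                    # Nothing new yet: block for a while instead of returning at once.
                    self._cond.wait(timeout=await_seconds)
                if position >= len(self._docs):
                    return  # still no data after waiting, so stop
                batch = self._docs[position:]
                position = len(self._docs)
            for doc in batch:
                yield doc

def writer(log):
    # Plays the role of the first shell, inserting documents at intervals.
    for i in range(101, 106):
        time.sleep(0.05)
        log.insert({"i": i, "val": "TestCapped"})

log = TailableLog()
thread = threading.Thread(target=writer, args=(log,))
thread.start()
received = [doc["i"] for doc in log.tail(await_seconds=0.5)]
thread.join()
print(received)
```

The reader keeps its position and blocks briefly when it catches up, so it sees every document as it arrives and stops only after a quiet period.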

See also

The Converting a normal collection to a capped collection recipe

Converting a normal collection to a capped collection

In this recipe, we will demonstrate the process to convert a normal collection to a capped collection.

Getting ready

Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of Mongo. This is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server.

How to do it…

1. Execute the following command to ensure that you are in the test database:

> use test

2. Create a normal collection as follows. We will be adding 100 documents to it. Type/copy the following code in the Mongo shell and execute it:

> for(i = 1; i <= 100; i++) {
    db.normalCollection.insert({'i': i, val: 'SomeTextContent'})
  }

3. Query the collection, as follows, to confirm that it contains the data:

> db.normalCollection.find()

4. Now query the system.namespaces collection as follows, and note the result document:

> db.system.namespaces.find({name: 'test.normalCollection'})

5. Execute the following command to convert the collection into a capped collection:

> db.runCommand({convertToCapped: 'normalCollection', size: 100})

6. Query the collection to take a look at the data:

> db.normalCollection.find()

7. Query the system.namespaces collection, as follows, and note the result document:

> db.system.namespaces.find({name: 'test.normalCollection'})

How it works…

We created a normal collection with 100 documents and then tried to convert it to a capped collection with a size of 100 bytes. The command has the following JSON document passed to the runCommand function:

{convertToCapped: <name of normal collection>, size: <size in bytes of the capped collection>}

This command creates a capped collection with the mentioned size, and loads the documents in the natural order from the normal collection to the target capped collection. If the size of the capped collection reaches the limit mentioned, the old documents are removed in the FIFO order, making space for new documents. Once this is done, the created capped collection is renamed to take the place of the original collection. Executing a find query on the capped collection confirms that not all 100 documents that were originally present in the normal collection are present in the capped collection. A query on the system.namespaces collection, before and after the execution of the convertToCapped command, shows the change in the collection attributes. Note that this operation acquires a global write lock, blocking all read and write operations in this database. Also, any indexes present on the original collection are not recreated on the capped collection upon conversion.
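The copy-and-evict behavior of the conversion can be sketched as follows. This is a simplified model in Python, not what the server does internally: the length of the JSON text is used as an assumed stand-in for the stored size of a document, and the function name convert_to_capped is invented for this sketch.

```python
import json
from collections import deque

def approx_size(doc):
    # Assumption: JSON text length stands in for the stored size of a document.
    return len(json.dumps(doc))

def convert_to_capped(docs, size_bytes):
    """Copy docs in natural order, evicting the oldest once the byte budget is exceeded."""
    capped = deque()
    used = 0
    for doc in docs:
        capped.append(doc)
        used += approx_size(doc)
        # FIFO eviction: drop from the front until the newest document fits.
        while used > size_bytes and len(capped) > 1:
            used -= approx_size(capped.popleft())
    return list(capped)

normal_collection = [{"i": i, "val": "SomeTextContent"} for i in range(1, 101)]
converted = convert_to_capped(normal_collection, size_bytes=100)
print(len(converted), converted[-1]["i"])
```

Only the tail end of the original data survives the conversion, in its original order, which matches the observation that a find on the converted collection no longer returns all 100 documents.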

There’s more…

Oplog is an important collection used for replication in MongoDB and is a capped collection. For more information on replication and oplogs, refer to the Understanding and analyzing oplogs recipe in Chapter 4, Administration. In the Implementing triggers in MongoDB using oplog recipe, we will use this oplog to implement a feature similar to the after insert/update/delete trigger of a relational database.

Storing binary data in MongoDB

So far we have seen how to store text values, dates, and numeric fields in a document. Binary content also needs to be stored at times in the database. Consider cases where users upload their photographs or scanned copies of documents that need to be stored in the database. In relational databases, the BLOB data type is the most commonly used type to address these requirements. Mongo too supports storing binary content in a document in a collection. The catch is that the total size of the document shouldn’t exceed 16 MB, which is the upper limit of the document size at the time of writing this book. In this recipe, we will store a small image file in a Mongo document and also retrieve it later. If the content you wish to store in MongoDB collections is greater than 16 MB, MongoDB offers an out-of-the-box solution called GridFS. We will see how to use GridFS in the Storing large data in MongoDB using GridFS recipe later in this chapter.

Getting ready

Look at the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of MongoDB. Also, the program to write binary content to the document is written in Java. For more details on Java drivers, refer to the following recipes in Chapter 3, Programming Language Drivers:

Executing query and insert operations using a Java client
Executing update and delete operations using a Java client
Aggregation in Mongo using a Java client
MapReduce in Mongo using a Java client

Open a Mongo shell and connect to the local MongoDB instance listening to port 27017. For this recipe, we will be using the project mongo-cookbook-bindata. This project is available in the source code bundle downloadable from the book’s website. The folder needs to be extracted on the local filesystem. Open a command-line shell and go to the root of the project extracted. It should be the directory where the pom.xml file is found.

How to do it…

1. On the operating system shell with the pom.xml file present in the current directory of the mongo-cookbook-bindata project, execute the following command:

$ mvn clean compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.BinaryDataTest

2. Observe the output; the execution should be successful.

3. Switch to the Mongo shell, connected to the local instance, and execute the following query:

> db.binaryDataTest.findOne()

4. Scroll through the document and take a note of the fields in the document.

How it works…

If we scroll through the large document printed out, we see that the fields are fileName, size, and data. The first two fields are of type string and number respectively, which we populated on document creation, and hold the name of the file we provide and the size in bytes. The data field is of the BSON type BinData, and the shell shows its content encoded in the base64 format.

What we did to insert this document is not much from an application’s perspective. The following lines of code show how we populated the DBObject that we added to the collection:

DBObject doc = new BasicDBObject("_id", 1);
doc.put("fileName", resourceName);
doc.put("size", imageBytes.length);
doc.put("data", imageBytes);

As we see, two fields, namely, fileName and size, are used to store the name of the file and the size of the file, and are of type string and number respectively. The data field is added to DBObject as a byte array. It gets stored automatically as the BSON type BinData in the document.
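The base64 text seen in the shell is just another representation of the same bytes. The short Python sketch below illustrates that relationship without touching a database; the byte values and the file name sample.jpg are hypothetical placeholders, and the size check simply restates the 16 MB document limit mentioned in this recipe.

```python
import base64

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16 MB limit discussed in this recipe

# Hypothetical stand-in for the bytes of a small image file.
image_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10])

doc = {
    "_id": 1,
    "fileName": "sample.jpg",  # hypothetical file name
    "size": len(image_bytes),
    "data": image_bytes,
}

# Binary fields are displayed as base64 text; this is the same encoding.
encoded = base64.b64encode(doc["data"]).decode("ascii")
decoded = base64.b64decode(encoded)
fits_in_one_document = doc["size"] <= MAX_DOCUMENT_BYTES
print(encoded)
```

Decoding the text gives back exactly the original bytes, so nothing is lost by the encoded display.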

What we saw in this recipe is straightforward, as the document size is less than 16 MB, which is the maximum document size in Mongo as of writing this book. If the size of the files stored exceeds this value, we have to resort to solutions such as GridFS, explained in the next recipe.

Storing large data in MongoDB using GridFS

A document’s size in MongoDB can be a maximum of 16 MB, but does that mean we cannot store data that is more than 16 MB in size? There are cases where you prefer to store videos and audio files in a database rather than in the filesystem for a number of advantages, such as storing metadata along with them, accessing the file from an intermediate location, and replicating the contents for high availability if replication is enabled on the MongoDB server instances. GridFS is the way to address such use cases in MongoDB. We will also see how GridFS manages large content that exceeds 16 MB and will analyze the collections it uses for storing the content behind the scenes. For test purposes, we will not be using data exceeding 16 MB but something smaller to see GridFS in action.

Getting ready

Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of Mongo. This is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server. Additionally, we will use the mongofiles utility to store data in GridFS from the command line.

How to do it…

1. Download the code bundle of the book from the book’s website and save the image file named glimpse_of_universe-wide.jpg from it to your local drive (you may choose any other large file, as a matter of fact, and provide an appropriate name to the file with the commands we execute). For the sake of the example, the image is saved in the Home directory. We will split our steps into three parts:

2. With the server up and running, execute the following command from the operating system’s shell, with the current directory being the Home directory. There are two arguments here. The first one is the name of the file on the local filesystem, and the second one is the name that will be attached to the uploaded content in MongoDB.

$ mongofiles put -l glimpse_of_universe-wide.jpg universe.jpg

3. Let us now query the collections to see how this content is actually stored in the collections behind the scenes. With the shell open, execute the following two queries. Make sure that in the second query, you exclude the data field from the selection:

> db.fs.files.findOne({filename: 'universe.jpg'})
> db.fs.chunks.find({}, {data: 0})

4. Now that we have put a file to GridFS from the operating system’s local filesystem, we will see how we can get the file to the local filesystem. Execute the following command from the operating system shell:

$ mongofiles get -l UploadedImage.jpg universe.jpg

5. Finally, we will delete the file we uploaded as follows. From the operating system shell, execute the following command:

$ mongofiles delete universe.jpg

6. Confirm the deletion using the following queries again:

> db.fs.files.findOne({filename: 'universe.jpg'})
> db.fs.chunks.find({}, {data: 0})

How it works…

The Mongo distribution comes with an out-of-the-box tool called mongofiles that lets us upload large content to the Mongo server; this gets stored using the GridFS specification. GridFS is not a different product but a standard specification followed by the different drivers for MongoDB to store data greater than 16 MB, which is the maximum document size. It can even be used for files of size less than 16 MB, as we did in our recipe, but there isn’t really a good reason to do that. There is nothing stopping us from implementing our own way of storing these large files, but it is preferred to follow the standard because all drivers support it; they do the heavy lifting of splitting the big file into small chunks and reassembling the chunks when needed.

We kept the image downloaded from the book’s website and uploaded it using mongofiles to MongoDB. The command to do that is put, and the -l option gives the name of the file on the local drive that we want to upload. Finally, universe.jpg is the name by which we want the file to be stored on GridFS.

On successful execution, we should see the following output:

connected to: 127.0.0.1
added file: { _id: ObjectId('5310d531d1e91f93635588fe'), filename: "universe.jpg", chunkSize: 262144, uploadDate: new Date(1393612082137), md5: "d894ec31b8c5addd0c02060971ea05ca", length: 2711259 }
done!

This gives us some details of the upload, namely, the unique _id for the uploaded file, the name of the file, the chunk size (the size of each chunk this big file is broken into, which by default is 256 KB), the date of upload, the checksum of the uploaded content, and the total length of the upload. The checksum can be computed beforehand and then compared after the upload, to check whether the uploaded content was corrupted or not.
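The numbers in this output can be related to each other with a little arithmetic. The following Python sketch derives the number of chunks from the reported length and chunk size, and shows how an MD5 checksum could be computed on the client side for comparison; the sample bytes are a hypothetical placeholder, not the actual image.

```python
import hashlib
import math

length = 2711259     # length reported for the uploaded file
chunk_size = 262144  # 256 KB, the chunk size reported by mongofiles

num_chunks = math.ceil(length / chunk_size)
last_chunk_size = length - (num_chunks - 1) * chunk_size
print(num_chunks, last_chunk_size)

def md5_hex(data):
    """Checksum to compare against the md5 field stored for the upload."""
    return hashlib.md5(data).hexdigest()

sample = b"hypothetical file content"
checksum_before = md5_hex(sample)         # computed before the upload
checksum_after = md5_hex(bytes(sample))   # same bytes after a round trip
```

Ten full chunks plus one partial chunk cover the file, and identical content always yields the same 32-character hexadecimal digest, which is what makes the before-and-after comparison meaningful.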

We executed the following query from the Mongo shell in the test database:

> db.fs.files.findOne({filename: 'universe.jpg'})

We see that the output we saw for the put command of mongofiles is the same as the document returned by this query on the fs.files collection. This is the collection where all the uploaded file details are put when some data is added to GridFS. There will be one document per upload. Applications can later also modify this document to add their own custom metadata along with the standard details added by Mongo when adding the data. For example, if the document is for an image upload, we can add details such as the name of the photographer, the location where the image was taken, when it was taken, and tags for the individuals in the image.

The content of the file itself is stored in the fs.chunks collection. Let us execute the following query:

> db.fs.chunks.find({}, {data: 0})

We have deliberately left out the data field from the result selected. Let us look at the structure of the result document:

{
  _id: <Unique identifier of type ObjectId representing this chunk>,
  files_id: <ObjectId of the document in fs.files for the file whose chunk this document represents>,
  n: <The chunk identifier, starting with 0; this is useful for knowing the order of the chunks>,
  data: <BSON binary content for the data uploaded for the file>
}

For the file we uploaded, we have 11 chunks of a maximum of 256 KB each. When a file is requested, the fs.chunks collection is searched by files_id, which comes from the _id field of the fs.files collection, and by the n field, which is the chunk’s sequence number. A unique index is created on these two fields when this collection is created for the first time, that is, when a file is first uploaded using GridFS. It allows fast retrieval of the chunks by file ID, sorted by the chunk’s sequence number.
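The splitting and reassembly that the drivers perform can be illustrated with plain Python data structures. This is a simplified sketch, not driver code: the chunk documents are ordinary dictionaries using the files_id, n, and data fields, while the identifier "file-1", the tiny chunk size, and the sample content are assumptions made for the example.

```python
def split_into_chunks(files_id, data, chunk_size):
    """Build chunk documents shaped like those in fs.chunks (files_id, n, data)."""
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]

def reassemble(chunks, files_id):
    """Select the chunks of one file and join them in order of n."""
    own = sorted(
        (c for c in chunks if c["files_id"] == files_id),
        key=lambda c: c["n"],
    )
    return b"".join(c["data"] for c in own)

content = bytes(range(256)) * 5    # 1280 bytes of hypothetical file content
chunks = split_into_chunks("file-1", content, chunk_size=256)
shuffled = list(reversed(chunks))  # storage order need not match n
restored = reassemble(shuffled, "file-1")
print(len(chunks), restored == content)
```

Because every chunk carries its file identifier and sequence number, the original content can be rebuilt regardless of the order in which the chunks are stored, which is why lookups by file ID sorted by n are the access pattern worth indexing.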

Similar to put, the get option is used to retrieve files from GridFS and put them on the local filesystem. The -l option is still used to provide a name, this time the name of the file that will be saved on the local filesystem. The final parameter to the get command is the name of the file as stored on GridFS. This is the value of the filename field in the fs.files collection. Finally, the delete command of mongofiles simply removes the entry of the file from the fs.files and fs.chunks collections. The name of the file given for deletion is again the value present in the filename field of the fs.files collection.

There’s more…

An important use case for GridFS is user-generated content such as large reports on static data that doesn’t change too often and is expensive to generate frequently. Instead of running such reports all the time, they can be run once and stored until a change in the static data is detected, in which case the stored report is deleted and generated again on the next request for the data. The filesystem may not always be available to the application to write the files to, in which case GridFS is a good alternative. There are cases where one might be interested in some intermediate chunk of the data stored, in which case the chunk containing the required data can be accessed. You get some nice features; for instance, the MD5 hash of the data is stored automatically and is available for use by the application.

Now that we have seen what GridFS is, let us see some scenarios where using GridFS might not be a very good idea. The performance of accessing the content from MongoDB using GridFS and directly from the filesystem will not be the same. Direct filesystem access will be faster than GridFS, so a proof of concept (POC) for the system to be developed is recommended to measure the performance and see whether it is within acceptable limits; the trade-off in performance might be worth the benefits we get. Also, if your application server is fronted with a CDN, you might not actually need a lot of I/O for static data stored in GridFS. As GridFS stores the data in multiple documents in collections, atomically updating them is not possible. If we know the content is less than 16 MB, which is the case with a lot of user-generated content or small uploaded files, we may skip GridFS altogether and store the content in one document, as BSON supports the storage of binary content in the document. For more details, refer to the Storing binary data in MongoDB recipe.

We will rarely be using the mongofiles utility to store, retrieve, and delete data from GridFS. Though it may occasionally be used, the majority of the time we will be performing these operations from an application. Thus, in the next couple of recipes, we will see how to connect to GridFS to store, retrieve, and delete files using Java and Python clients.

Though this has nothing much to do with Mongo, OpenStack is an Infrastructure as a Service (IaaS) platform and offers a variety of services for computing, storage, networking, and so on. One of its services, the image storage service called Glance, supports a lot of persistent stores to store the images. One of the stores supported by Glance is MongoDB’s GridFS. You can find more information on how to configure Glance to use GridFS at http://docs.openstack.org/trunk/config-reference/content/ch_configuring-openstack-image-service.html.

See also

The Storing data to GridFS from a Java client recipe
The Storing data to GridFS from a Python client recipe

Storing data to GridFS from a Java client

In the previous recipe, we saw how to store data to GridFS using the command-line utility called mongofiles, which comes with MongoDB to manage large data files. To get an idea of what GridFS is and what collections are used behind the scenes to store the data, refer to the previous recipe. In this recipe, we will look at storing data to GridFS using a Java client. The program will be a highly scaled down version of the mongofiles utility and focuses only on how to store, retrieve, and delete data rather than trying to provide a lot of options as mongofiles does.

Getting ready

Refer to the Connecting to a single node from a Java client recipe from Chapter 1, Installing and Starting the MongoDB Server, for all the necessary setup for this recipe. If you are interested in more details on Java drivers, refer to the following recipes in Chapter 3, Programming Language Drivers:

Executing query and insert operations using a Java client
Executing update and delete operations using a Java client
Aggregation in Mongo using a Java client
MapReduce in Mongo using a Java client

Open a Mongo shell and connect to the local mongod instance listening to port 27017. For this recipe, we will be using the project mongo-cookbook-gridfs. This project is available in the source code bundle downloadable from the book’s website. The folder needs to be extracted on the local filesystem. Open a terminal on your operating system and go to the root of the project extracted. It should be the directory where the pom.xml file is found. Also, save the glimpse_of_universe-wide.jpg file on the local filesystem, just as in the previous recipe. This file can be found in the downloadable code bundle from the book’s website.

How to do it…

1. We are assuming that the collections of GridFS are clean and no prior data is uploaded. If there is nothing crucial in the database, you may execute the following queries to clear the collections. Do exercise caution before dropping the collections though:

> use test
> db.fs.chunks.drop()
> db.fs.files.drop()

2. Open an operating system shell and execute the following command:

$ mvn clean compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.GridFSTests -Dexec.args="put ~/glimpse_of_universe-wide.jpg universe.jpg"

Note

The file I need to upload was placed in the Home directory. You may give the path of your own image file after the put command. Note that if the path contains spaces, the whole path needs to be within single quotes.

3. If the preceding command runs successfully, you should see the following output:

Successfully written to universe.jpg, details are:
Upload Identifier: 5314c05e1c52e2f520201698
Length: 2711259
MD5 hash: d894ec31b8c5addd0c02060971ea05ca
Chunk Side in bytes: 262144
Total Number Of Chunks: 11

4. Once the preceding execution is successful, which we can confirm from the console output, execute the following queries from the Mongo shell:

> db.fs.files.findOne({filename: 'universe.jpg'})
> db.fs.chunks.find({}, {data: 0})

5. Now we will get the file from GridFS to the local filesystem. Execute the following command to perform this operation. Ensure that the directory we are writing to (the local path is the parameter given right after the get operation) is writable by the user.

$ mvn clean compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.GridFSTests -Dexec.args="get '~/universe.jpg' universe.jpg"

6. Confirm that the file is present on the local filesystem at the mentioned location. We should see the following output on the console to indicate a successful write operation:

Connected successfully..
Successfully written 2711259 bytes to ~/universe.jpg

7. Finally, we will delete the file from GridFS as follows:

$ mvn clean compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.GridFSTests -Dexec.args="delete universe.jpg"

8. On successful deletion, we should see the following output on the console:

Connected successfully..
Removed file with name 'universe.jpg' from GridFS

How it works…

The com.packtpub.mongo.cookbook.GridFSTests class accepts three types of operations, put, get, and delete, to upload a file to GridFS, get contents from GridFS to a local filesystem, and delete files from GridFS respectively.

The class accepts up to three parameters. The first one is the operation, with valid values being get, put, and delete. The second parameter is relevant for the get and put operations and is the name of the file on the local filesystem to write the downloaded content to, or from which the content is sourced for upload. The third parameter is the name of the file in GridFS, which is not necessarily the same as the name on the local filesystem. For delete, however, only the filename on GridFS is needed.

Let us see some important snippets of code from the class that are specific to GridFS.

Open the com.packtpub.mongo.cookbook.GridFSTests class in your favorite IDE and look for the handlePut, handleGet, and handleDelete methods. These are the methods where all the logic is. First, we will start with the handlePut method, which is meant to upload the contents of the file from the local filesystem to GridFS.

Irrespective of which operation we do, we will create an instance of the com.mongodb.gridfs.GridFS class. In our case, we instantiated it as follows:

GridFS gfs = new GridFS(client.getDB("test"));

The constructor of this class takes the database instance of the com.mongodb.DB class in which we wish to create the GridFS collections fs.chunks and fs.files, which will store the uploaded content. Once the instance of GridFS is created, we will invoke the createFile method on it. This method accepts two parameters; the first one is an InputStream, which sources the bytes of the content to be uploaded, and the second parameter is the name of the file that will be saved on GridFS. However, this method doesn’t create the file on GridFS but returns an instance of com.mongodb.gridfs.GridFSInputFile. The upload will happen only when we call the save method on this returned object. There are a few overloaded variants of this createFile method. For more details, refer to the Javadoc of the com.mongodb.gridfs.GridFS class.

Our next method is handleGet, which gets the contents of the file saved on GridFS to a local filesystem. Similar to the com.mongodb.DBCollection class, the com.mongodb.gridfs.GridFS class has the find and findOne methods to search. However, instead of accepting any DBObject query, find and findOne in GridFS accept the filename or the ObjectId value of the document to search for in the fs.files collection. Similarly, the return value is not a DBCursor but an instance of com.mongodb.gridfs.GridFSDBFile. This class has various methods that let us get the InputStream of the bytes of the file present on GridFS. Other methods are writeTo, which writes to the given file or OutputStream, and getLength, which gives the number of bytes in the file. For details, refer to the Javadoc of the com.mongodb.gridfs.GridFSDBFile class.

Finally, we look at the handleDelete method, which is used to delete the files on GridFS and is the simplest of the lot. The method on the GridFS object is remove, which accepts a string argument that is the name of the file to delete on the server. The return type of this method is void. So, irrespective of whether the content is present on GridFS or not, the method returns no value, and it does not throw an exception if the name provided is for a file that doesn’t exist.

See also

The Storing binary data in MongoDB recipe
The Storing data to GridFS from a Python client recipe

Storing data to GridFS from a Python client

In the Storing large data in MongoDB using GridFS recipe, we saw what GridFS is and how it can be used to store large files in MongoDB. In the previous recipe, we saw how to use the GridFS API from a Java client. In this recipe, we will see how to store image data into MongoDB using GridFS from a Python program.

Getting ready

Refer to the Connecting to a single node from a Java client recipe from Chapter 1, Installing and Starting the MongoDB Server, for all the necessary setup for this recipe. If you are interested in more details on Python drivers, refer to the following recipes in Chapter 3, Programming Language Drivers:

Installing PyMongo
Executing query and insert operations using PyMongo
Executing update and delete operations using PyMongo

Download and save the glimpse_of_universe-wide.jpg image file from the downloadable code bundle, available on the book’s website, to the local filesystem, as we did in the previous recipe.

How to do it…

1. Open a Python interpreter by typing in the following command in the operating system shell (note that the current directory is the same as the directory where the image file glimpse_of_universe-wide.jpg is placed):

$ python

2. Import the required packages as follows:

>>> import pymongo
>>> import gridfs

3. Once the Python shell is opened, create a MongoClient and a database object for the test database as follows:

>>> client = pymongo.MongoClient('mongodb://localhost:27017')
>>> db = client.test

4. To clear the GridFS-related collections to start clean, and only if nothing important is present in them, execute the following queries:

>>> db.fs.files.drop()
>>> db.fs.chunks.drop()

5. Create the instance of GridFS as follows:

>>> fs = gridfs.GridFS(db)

6. Now, we will read the file and upload its contents to GridFS. First, create the file object as follows:

>>> file = open('glimpse_of_universe-wide.jpg', 'rb')

7. Now put the file into GridFS as follows:

>>> fs.put(file, filename='universe.jpg')

8. On successfully executing the preceding put command, we should see the ObjectId for the file uploaded. This would be the same as the _id field of the document in the fs.files collection for this file.

9. Execute the following query from the Python shell. It should print out the dict object with the details of the upload; verify the contents:

>>> db.fs.files.find_one()

10. Now, we will get the uploaded content and write it to a file on the local filesystem. Let us get the GridOut instance representing the object, to read the data out of GridFS as follows:

>>> gout = fs.get_last_version('universe.jpg')

11. With this instance available, let us write the data to a file on the local filesystem. First, open a handle to the file on the local filesystem to write to, as follows:

>>> fout = open('universe.jpg', 'wb')

12. We will then write the contents to it as follows:

>>> fout.write(gout.read())
>>> fout.close()
>>> gout.close()

13. Now verify the file in the current directory on the local filesystem. A new file called universe.jpg will be created with the same number of bytes as the source. Verify it by opening it in an image viewer.

How it works…

Let us look in detail at the steps we executed. In the Python shell, we import two packages, pymongo and gridfs, and instantiate the pymongo.MongoClient and gridfs.GridFS instances. The constructor of the gridfs.GridFS class takes an argument, which is the instance of pymongo.Database.

We open a file in binary mode using the open function and pass the file object to the GridFS’s put method. There is an additional argument passed, called filename, which is the name of the file put into GridFS. The first parameter, in fact, need not be a file object, but any object with a read method defined.

Once the put operation succeeds, the return value is an ObjectId for the uploaded document in the fs.files collection. A query on fs.files can confirm that the file is uploaded. Verify that the size of the data uploaded matches the size of the file.

Our next objective is to get the file from GridFS onto the local filesystem. Intuitively, one would imagine that if the method to put a file in GridFS is put, then the method to get the file would be get. True, the method is indeed get. However, it will get only based on the ObjectId, which was returned by the put method. So if you are OK to get by ObjectId, the method for you is get. However, if you want to get by the filename, the method to use is get_last_version, which accepts the name of the file that we uploaded, and the return type of this method is gridfs.grid_file.GridOut. This class contains the method read, which will read out all the bytes from the file uploaded to GridFS. We open a file called universe.jpg to write in binary mode and write all the bytes read from the GridOut object.
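For large files, reading everything with a single read call holds the whole file in memory. The following is a minimal sketch, not part of the recipe, that copies the file in chunks instead. The function name copy_from_gridfs, the chunk size, and the InMemoryFS stand-in class are assumptions made for illustration; with a real server, you would pass the gridfs.GridFS instance created in step 5 as fs.

```python
import io
import os
import tempfile


def copy_from_gridfs(fs, filename, target_path, chunk_size=255 * 1024):
    """Copy the latest version of `filename` out of GridFS to `target_path`.

    `fs` only needs a get_last_version(name) method returning a file-like
    object with read(size) and close(), which gridfs.GridFS provides.
    Returns the number of bytes written.
    """
    gout = fs.get_last_version(filename)
    total = 0
    try:
        with open(target_path, 'wb') as fout:
            while True:
                chunk = gout.read(chunk_size)
                if not chunk:  # empty bytes object means end of file
                    break
                fout.write(chunk)
                total += len(chunk)
    finally:
        gout.close()
    return total


class InMemoryFS:
    """Hypothetical stand-in for gridfs.GridFS so the sketch runs without a server."""

    def __init__(self, files):
        self._files = files

    def get_last_version(self, filename):
        return io.BytesIO(self._files[filename])


# Try the function against the stand-in, writing to a temporary directory.
target = os.path.join(tempfile.mkdtemp(), 'universe.jpg')
copied = copy_from_gridfs(InMemoryFS({'universe.jpg': b'\x00' * 1000}),
                          'universe.jpg', target, chunk_size=256)
```

The function relies only on get_last_version, read, and close, so the same code should work unchanged when fs is a real gridfs.GridFS instance.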

See also

The Storing binary data in MongoDB recipe
The Storing data to GridFS from a Java client recipe

Implementing triggers in MongoDB using oplog

A trigger in a relational database is code that gets invoked when an insert, update, or delete operation is executed on a table in the database. A trigger can be invoked either before or after the operation. Triggers are not implemented in MongoDB out of the box, and in case you need some sort of notification for your application whenever any insert, update, or delete operation is executed, you are left to manage it yourself in the application. One approach is to have some sort of data access layer in the application that is the only place to query, insert, update, or delete documents from the collections. However, there are a few challenges to this. First, you need to explicitly code the logic to accommodate this requirement in the application, which may or may not be feasible. If the database is shared and multiple applications access it, things become even more difficult. Second, the access needs to be strictly regulated and no other source of insert, update, and delete should be permitted.

Alternatively, we need to look at running some sort of logic in a layer close to the database. One way to track all write operations is using an oplog. Note that read operations cannot be tracked using oplogs. In this recipe, we will write a small Java application to tail an oplog and get all the insert, update, and delete operations happening on a Mongo instance. Note that although this program is implemented in Java, the technique works equally well in any other programming language. The crux lies in the logic for the implementation; the platform for implementation can vary. Also, this works only if the mongod instance is started as a part of a replica set and not as a standalone instance. Finally, this trigger-like functionality can be invoked only after the operation is performed and not before the data gets inserted, updated, or deleted from the collection.

Getting ready

Refer to the Starting multiple instances as part of a replica set recipe from Chapter 1, Installing and Starting the MongoDB Server, for all the necessary setup for this recipe. If you are interested in more details on Java drivers, refer to the Executing query and insert operations using a Java client and Executing update and delete operations using a Java client recipes in Chapter 3, Programming Language Drivers. The prerequisites of these two recipes are all we need for this recipe.

Refer to the Creating and tailing capped collection cursors in MongoDB recipe in this chapter to know more about capped collections and tailable cursors if you need a refresher. Finally, though not mandatory, Chapter 4, Administration, explains oplog in depth in the Understanding and analyzing oplogs recipe. This recipe will not explain oplog in depth as we did in Chapter 4, Administration. Open a shell and connect it to the primary of the replica set.

For this recipe, we will be using the project mongo-cookbook-oplogtrigger. This project is available in the source code bundle downloadable from the book’s website. The folder needs to be extracted on the local filesystem. Open a command-line shell and go to the root of the project extracted. It should be in the directory where the pom.xml file is found. Also, the TriggerOperations.js file will be needed to trigger operations in the database that we intend to capture.

How to do it…

1. Open an operating system shell and execute the following command:

$ mvn clean compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.OplogTrigger -Dexec.args="test.oplogTriggerTest"

2. With the Java program started, we will open the shell as follows, with the TriggerOperations.js file present in the current directory and the mongod instance listening to port 27000 as the primary:

$ mongo --port 27000 TriggerOperations.js --shell

3. Once the shell is connected, execute the following function we loaded from the JavaScript:

test:PRIMARY> triggerOperations()

4. Observe the output printed out on the console where the Java program com.packtpub.mongo.cookbook.OplogTrigger is being executed using Maven.

How it works…

The functionality we implemented is pretty handy for a lot of use cases. Let us see what we did at a higher level first. The Java program com.packtpub.mongo.cookbook.OplogTrigger is something that acts as a trigger when new data is inserted, updated, or deleted from a collection in MongoDB. It uses the oplog collection that is the backbone of replication in Mongo to implement this functionality.

The JavaScript we have just acts as a source that inserts, updates, and deletes data in the collection. There’s nothing really significant to what this JavaScript function does: it inserts six documents in a collection, updates one of them, deletes one of them, inserts four more documents, and finally, deletes all the documents. You may choose to open the TriggerOperations.js file and take a look at how it is implemented. The collection on which it operates is present in the test database and is called oplogTriggerTest.

When we execute the JavaScript function, we should see something like the following output printed on the console:

[INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) @ mongo-cookbook-oplogtriger <<<
[INFO]
[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ mongo-cookbook-oplogtriger ---
Connected successfully..
Starting tailing oplog…
Operation is Insert ObjectId is 5321c4c2357845b165d42a5f
Operation is Insert ObjectId is 5321c4c2357845b165d42a60
Operation is Insert ObjectId is 5321c4c2357845b165d42a61
Operation is Insert ObjectId is 5321c4c2357845b165d42a62
Operation is Insert ObjectId is 5321c4c2357845b165d42a63
Operation is Insert ObjectId is 5321c4c2357845b165d42a64
Operation is Update ObjectId is 5321c4c2357845b165d42a60
Operation is Delete ObjectId is 5321c4c2357845b165d42a61
Operation is Insert ObjectId is 5321c4c2357845b165d42a65
Operation is Insert ObjectId is 5321c4c2357845b165d42a66
Operation is Insert ObjectId is 5321c4c2357845b165d42a67
Operation is Insert ObjectId is 5321c4c2357845b165d42a68
Operation is Delete ObjectId is 5321c4c2357845b165d42a5f
Operation is Delete ObjectId is 5321c4c2357845b165d42a62
Operation is Delete ObjectId is 5321c4c2357845b165d42a63
Operation is Delete ObjectId is 5321c4c2357845b165d42a64
Operation is Delete ObjectId is 5321c4c2357845b165d42a60
Operation is Delete ObjectId is 5321c4c2357845b165d42a65
Operation is Delete ObjectId is 5321c4c2357845b165d42a66
Operation is Delete ObjectId is 5321c4c2357845b165d42a67
Operation is Delete ObjectId is 5321c4c2357845b165d42a68

The Maven program runs continuously and never terminates, as the Java program doesn’t terminate. You may hit Ctrl + C to stop the execution.

Let us analyze the Java program, which is where the meat of the content is. The first assumption is that for this program to work, the replica set must be set up, as we will use Mongo’s oplog collection. The Java program creates a connection to the primary of the replica set members, connects to the local database, and gets the oplog.rs collection. Then, all it does is find the last, or nearly the last, timestamp in the oplog. This is done not just to prevent the whole oplog from being replayed on startup, but also to mark a point towards the end in the oplog. The following is the code to find this timestamp value:

DBCursor cursor = collection.find().sort(new BasicDBObject("$natural", -1)).limit(1);
int current = (int) (System.currentTimeMillis() / 1000);
return cursor.hasNext() ? (BSONTimestamp) cursor.next().get("ts") : new BSONTimestamp(current, 1);

The oplog is sorted in the reverse natural order to find the time in the last document in it. As oplogs follow the FIFO pattern, sorting the oplog in the natural descending order is equivalent to sorting by the timestamp in descending order.

Once the timestamp is found as described earlier, we query the oplog collection as usual, but with two additional options as follows:

DBCursor cursor = collection.find(QueryBuilder.start("ts")
    .greaterThan(lastreadTimestamp).get())
    .addOption(Bytes.QUERYOPTION_TAILABLE)
    .addOption(Bytes.QUERYOPTION_AWAITDATA);

The query finds all documents with a timestamp greater than a particular value and adds two options, Bytes.QUERYOPTION_TAILABLE and Bytes.QUERYOPTION_AWAITDATA. The latter option can only be added when the former option is added. This not only queries and returns the data, but also waits for some time for more data when the execution reaches the end of the cursor. Eventually, when no data arrives, it terminates.

During every iteration, store the last seen timestamp as well. This is used when the cursor closes because no more data is available, and we query again to get a new tailable cursor instance. The query this time will use the timestamp that we stored on the previous iteration when the last document was seen. This process continues indefinitely and we basically tail the collection just as we tail a file in Unix using the tail command.

The oplog document contains a field called op, for the operation, whose values are i, u, and d for insert, update, and delete, respectively. The field o contains the inserted or deleted object’s ID (_id) in the case of insert and delete. In the case of an update, the field o2 contains the _id. All we do is simply check for these conditions and print out the operation and the ID of the document inserted, deleted, or updated.
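The check described above can be sketched in a few lines of Python. This is not the book’s Java code; the function name describe_oplog_entry and the sample entries are assumptions, and the entries are written by hand to resemble the shape described in the text rather than captured from a real oplog.

```python
OPERATIONS = {'i': 'Insert', 'u': 'Update', 'd': 'Delete'}


def describe_oplog_entry(entry):
    """Return (operation name, _id) for an insert, update, or delete entry.

    Returns None for any other kind of entry, for example a no-op.
    """
    operation = OPERATIONS.get(entry.get('op'))
    if operation is None:
        return None
    # Updates carry the target document's _id in 'o2';
    # inserts and deletes carry it in 'o'.
    source = entry.get('o2') if entry['op'] == 'u' else entry.get('o')
    if not source or '_id' not in source:
        return None
    return operation, source['_id']


# Hand-written sample entries (assumed shapes, not captured from a server).
sample_entries = [
    {'op': 'i', 'ns': 'test.oplogTriggerTest', 'o': {'_id': 1, 'name': 'a'}},
    {'op': 'u', 'ns': 'test.oplogTriggerTest', 'o2': {'_id': 1},
     'o': {'$set': {'name': 'b'}}},
    {'op': 'd', 'ns': 'test.oplogTriggerTest', 'o': {'_id': 1}},
    {'op': 'n', 'ns': '', 'o': {'msg': 'periodic noop'}},
]

described = [describe_oplog_entry(entry) for entry in sample_entries]
for item in described:
    if item is not None:
        print('Operation is %s ObjectId is %s' % item)
```

In a real tailing loop, each document read from the tailable cursor would be passed to such a function, and the ns field could be used to keep only the entries for the collection of interest.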

Let’s look at things we need to be careful about. Obviously, the deleted documents will not be available in the collection, so _id would not really be useful if you intend to query. Also, be careful when selecting a document after the update using the ID we get, as some other operation, later in the oplog, might already have performed more updates on the same document and our application’s tailable cursor is yet to reach that point. This is common in case of high-volume systems. We have a similar problem for inserts as well. The document we might query, using the provided ID, might be updated or deleted already. Applications using this logic to track these operations must be aware of them.

Alternatively, take a look at the oplog entry itself, which contains more details, such as the document inserted or the update statement executed. Updates in the oplog collection are idempotent, which means they can be applied any number of times without unintended side effects. For instance, if the actual update was to increment the value by one, the update in the oplog collection will have the set operator with the final value to be expected. This way, the same update can be applied multiple times. The logic you would use would then have to be more sophisticated, to implement such scenarios.
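To see why the set form is safe to replay while an increment is not, consider the following small sketch. The apply_update helper is a hypothetical, in-memory imitation of just the $inc and $set operators on plain dictionaries; it is not MongoDB code, and the field name visits is made up.

```python
import copy


def apply_update(document, update):
    """Apply a tiny subset of update operators ($inc, $set) to a copy of a dict."""
    result = copy.deepcopy(document)
    for field, amount in update.get('$inc', {}).items():
        result[field] = result.get(field, 0) + amount
    for field, value in update.get('$set', {}).items():
        result[field] = value
    return result


original = {'_id': 1, 'visits': 5}
client_update = {'$inc': {'visits': 1}}  # what the application executed
oplog_update = {'$set': {'visits': 6}}   # the idempotent form described above

# Replaying the increment twice changes the outcome; replaying the set does not.
inc_twice = apply_update(apply_update(original, client_update), client_update)
set_twice = apply_update(apply_update(original, oplog_update), oplog_update)
```

After running this, inc_twice holds a visits value of 7 whereas set_twice holds 6, which is the value a single application of the original update would have produced.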

Also, failovers are not handled here. Handling them is needed for production systems. The infinite loop, on the other hand, opens a new cursor as soon as the first one terminates. A sleep duration could be introduced before the oplog is queried again, to avoid overwhelming the server with queries. Note that the program given here is not production-quality code but just a simple demo of the technique that is being used by a lot of other systems to get notified about new data insertions, deletions, and updates to collections in MongoDB.

MongoDB didn’t have the text search feature till version 2.4, and prior to that, all full-text search was handled using external search engines such as Solr or Elasticsearch. Even now, at the time of writing, though the text search feature in MongoDB is production-ready, many would still prefer a dedicated external search indexer. It won’t be surprising if a decision is taken to use an external full-text index search tool instead of leveraging MongoDB’s inbuilt one. In case of Elasticsearch, the abstraction to flow the data into the indexes is known as a river. The MongoDB river in Elasticsearch, which adds data to the indexes as and when the data gets added to the collections in Mongo, is built on the same logic as we saw in the simple program implemented in Java.

Executing flat plane (2D) geospatial queries in Mongo using geospatial indexes

In this recipe, we will see what geospatial queries are and then see how to apply these queries on flat planes. We will put it to use in a test map application.

Geospatial queries can be executed on data on which geospatial indexes are created. There are two types of geospatial indexes. The first one, called 2D indexes, is the simpler of the two. It assumes that the data is given as x, y coordinates. The second one, called 3D or spherical indexes (the 2dsphere index type in MongoDB), is relatively more complicated. In this recipe, we will explore 2D indexes and execute some queries on 2D data. The data on which we are going to work is a 25 x 25 grid with some coordinates representing bus stops, restaurants, hospitals, and gardens.

Getting ready

Refer to the Connecting to a single node from a Java client recipe from Chapter 1, Installing and Starting the MongoDB Server, for all the necessary setup for this recipe. Download the data file named 2dMapLegacyData.json and keep it ready to import on the local filesystem. Open a Mongo shell connecting to the local MongoDB instance.

How to do it…

1. Execute the following command from the operating system shell to import the data into the collection. The 2dMapLegacyData.json file is present in the current directory.

$ mongoimport -c areaMap -d test --drop 2dMapLegacyData.json

2. If we see something like the following output on the screen, we can confirm that the import has gone through successfully:

connected to: 127.0.0.1
Mon Mar 17 23:58:27.880 dropping: test.areaMap
Mon Mar 17 23:58:27.932 check 9 26
Mon Mar 17 23:58:27.934 imported 26 objects

3. After the successful import, from the opened Mongo shell, verify the collection and its contents by executing the following query:

> db.areaMap.find()

This should give you the feel of the data in the collection.

4. The next step is to create a 2D geospatial index on this data. Execute the following command from the Mongo shell to create a 2D index:

> db.areaMap.ensureIndex({co:'2d'})

5. With the index created, we will now try to find the nearest restaurant from the place where an individual is standing. Assuming the person is not fussy about the type of cuisine, let us execute the following query, assuming that the person is standing at location (12, 8), as shown in the preceding screenshot. Also, we are interested in just the three nearest places:

> db.areaMap.find({co:{$near:[12, 8]}, type:'R'}).limit(3)

6. This should give us three results starting with the nearest restaurant, with the subsequent ones given as per the increasing distance. If we look at the image given earlier, the results agree with it.

7. Let us add more options to the query. The individual has to walk and, thus, wants the distance to be restricted to a particular value in the results. Let us rewrite the query with the following modification:

> db.areaMap.find({co:{$near:[12, 8], $maxDistance:4}, type:'R'})

8. Observe the number of results retrieved this time around.

How it works…

Let us now go through what we did. Before we continue, let us define what exactly we mean by the distance between two points. Suppose on a Cartesian plane we have two points, (x1, y1) and (x2, y2); the distance between them would be computed using the following formula:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

For example, suppose the two points are (2, 10) and (12, 3); the distance would be as follows:

$$d = \sqrt{(2 - 12)^2 + (10 - 3)^2} = \sqrt{100 + 49} = \sqrt{149} \approx 12.21$$

After knowing how calculations for distance are done behind the scenes by MongoDB, let us see what we did right from step 1.

We started by importing the data normally into the areaMap collection in the test database and created an index as db.areaMap.ensureIndex({co:'2d'}). The index is created on the co field in the document and the value is a special value, 2d, which denotes that this is a special type of index called a 2D geospatial index. Usually, we give this value as 1 or -1, denoting the order of the index.

There are two types of indexes: 2D index and spherical index. A 2D index is commonly used for planes whose span is small and does not involve spherical surfaces. It could be something such as a map of a building, a locality, or even a small city where the curvature of the earth covering the portion of the land is not really significant. However, once the span of the map increases and covers the globe, 2D indexes will be inaccurate for predicting the values, as the curvature of the earth needs to be considered in the calculations. In such cases, we go for spherical indexes, which we will discuss soon.

Once the 2D index is created, we can use it to query the collection and find some points near the point queried, as follows:

> db.areaMap.find({co:{$near:[12, 8]}, type:'R'}).limit(3)

We will query for documents that are of type R, that is, those documents for “restaurants”, and that are closest to the point (12, 8). The results returned by this query will be in the increasing order of the distance from the point in question, (12, 8) in this case. The limit just limits the result to the top three documents. We may also provide $maxDistance in the query, which will restrict the results to a distance less than or equal to the provided value. We queried for locations not more than four units away, as follows:

> db.areaMap.find({co:{$near:[12, 8], $maxDistance:4}, type:'R'})
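To make the behavior of $near, limit, and $maxDistance on a flat plane concrete, here is a minimal in-memory sketch in Python. It only imitates the ordering and filtering described above using the planar distance; it says nothing about how MongoDB evaluates the query internally. The function name near and the sample places (names and coordinates) are hypothetical and are not taken from 2dMapLegacyData.json.

```python
import math


def near(places, point, max_distance=None, limit=None):
    """Return (distance, place) pairs sorted by increasing planar distance."""
    px, py = point
    results = []
    for place in places:
        x, y = place['co']
        distance = math.hypot(x - px, y - py)
        # Keep the place only if it is within max_distance, when one is given.
        if max_distance is None or distance <= max_distance:
            results.append((distance, place))
    results.sort(key=lambda pair: pair[0])
    return results if limit is None else results[:limit]


# Hypothetical sample data for illustration only.
places = [
    {'name': 'R1', 'type': 'R', 'co': [2, 6]},
    {'name': 'R2', 'type': 'R', 'co': [10, 5]},
    {'name': 'R3', 'type': 'R', 'co': [10, 1]},
]

nearest = near(places, (12, 8), limit=3)
within_four = near(places, (12, 8), max_distance=4)
nearest_names = [place['name'] for _, place in nearest]
within_four_names = [place['name'] for _, place in within_four]
```

With this sample, nearest_names lists the restaurants from closest to farthest, and within_four_names keeps only those not more than four units away from (12, 8).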

Spherical indexes and GeoJSON-compliant data in MongoDB

Before we continue with this recipe, we need to look at the previous recipe to get an understanding of what geospatial indexes are in MongoDB and how to use the 2D indexes. What we did so far was to import the JSON documents in a nonstandard format in the MongoDB collection, create geospatial indexes, and query them. This approach works perfectly fine and, in fact, was the only option available before MongoDB 2.4. Version 2.4 of MongoDB supports an additional way to store, index, and query the documents in the collections. There is a standard way to represent geospatial data, particularly meant for geodata exchange in JSON, and the specification of GeoJSON mentions it in detail at http://geojson.org/geojson-spec.html. We can now store the data in this format.

There are various geographical figure types supported by this specification. However, for our use case, we will be using the type Point. First let us see how the document we imported before using a nonstandard format looks and how the one using the GeoJSON format looks:

Nonstandard way

{"_id" : 1, "name" : "White Street", "type" : "B", co : [4, 23]}

GeoJSON format

{"_id" : 1, "name" : "White Street", "type" : "B", co : {type : 'Point', coordinates : [4, 23]}}

The GeoJSON format looks more complicated than the nonstandard format, and for our particular case I do agree. However, when representing polygons and other lines, the nonstandard format might have to store multiple documents; with GeoJSON, such a shape can be stored in a single document just by changing the value of the type field. Refer to the specification for more details.

Getting ready

The prerequisites for this recipe are the same as the prerequisites for the previous recipe, except that the files to be imported will be 2dMapGeoJSONData.json and countries.geo.json. Download these files from the book’s website and keep them on the local filesystem to import them later.

Note

Special thanks to Johan Sundström for sharing the world data. The GeoJSON for the world is taken from https://github.com/johan/world.geo.json. The file is massaged to enable importing and index creation in Mongo. Version 2.4 doesn’t support MultiPolygon and thus, all MultiPolygon types of shapes are omitted. This shortcoming seems to be fixed in version 2.6 though.

How to do it…

1. Import the GeoJSON-compatible data in a new collection as follows. This contains 26 documents, similar to what we imported last time around, except that they are formatted using the GeoJSON format.

$ mongoimport -c areaMapGeoJSON -d test --drop 2dMapGeoJSONData.json
$ mongoimport -c worldMap -d test --drop countries.geo.json

2. Create a geospatial index on these collections as follows:

> db.areaMapGeoJSON.ensureIndex({"co" : "2dsphere"})
> db.worldMap.ensureIndex({geometry:'2dsphere'})

3. We will first query the areaMapGeoJSON collection as follows:

> db.areaMapGeoJSON.find(
  {co:{
    $near:{
      $geometry:{
        type:'Point',
        coordinates:[12, 8]
      }
    }
  },
  type:'R'
}).limit(3)

4. Next, we will try and find all the restaurants that fall within the square drawn between the points (0, 0), (0, 11), (11, 11), and (11, 0). Refer to the previous screenshot to get a clear visual of the points and the results to expect.

5. Write the following query and observe the results:

> db.areaMapGeoJSON.find(
  {co:{
    $geoIntersects:{
      $geometry:{
        type:'Polygon',
        coordinates:[[[0, 0], [0, 11], [11, 11], [11, 0], [0, 0]]]
      }
    }
  },
  type:'R'
})

6. Check whether it contains the three restaurants at coordinates (2, 6), (10, 5), and (10, 1) as expected.

7. Next, we will try and perform some operations that would find all the matching objects that lie completely within another enclosing polygon. Suppose we want to find some bus stops that lie within a given square block; such use cases can be addressed using the $geoWithin operator, and the query to achieve it is as follows:

> db.areaMapGeoJSON.find(
  {co:{
    $geoWithin:{
      $geometry:{
        type:'Polygon',
        coordinates:[[[3, 9], [3, 24], [6, 24], [6, 9], [3, 9]]]
      }
    }
  },
  type:'B'
})

8. Verify the results; we should have three bus stops in the result. Refer to the preceding screenshot to get the expected results of the query.

9. When we execute the $near query from step 3, it just prints the documents in the ascending order of the distance. However, we don’t see the actual distance in the result. Let us execute the same query as in step 3 and additionally get the calculated distances as follows:

> db.runCommand({
  geoNear:"areaMapGeoJSON",
  near:[12, 8],
  spherical:true,
  limit:3,
  query:{type:'R'}
})

10. The preceding query returns one document with an array within the field called results, which contains the matching documents and the calculated distances. The result also contains some additional stats giving the maximum distance, the average of the distances in the result, total documents scanned, and the time taken in milliseconds.

11. We will finally query on the worldMap collection to find which country the provided coordinate lies in. Execute the following query from the Mongo shell:

> db.worldMap.find(
  {geometry:{
    $geoIntersects:{
      $geometry:{
        type:'Point',
        coordinates:[7, 52]
      }
    }
  }},
  {properties:1, _id:0}
)

12. The possible operations we can perform with the worldMap collection are numerous, and it is not practically possible to cover all of them in this recipe. I would encourage you to play around with this collection and try out different use cases.

How it works…

Starting from MongoDB Version 2.4, the standard way for storing geospatial data in JSON is also supported. Note that the legacy approach we saw is also supported. However, if you are starting fresh, it is recommended that you go ahead with this approach, for the following reasons:

It is a standard, and anybody aware of the specification will easily be able to understand the structure of the document
It makes storing complex shapes, polygons, and multilines easy
It also lets us query easily for the intersection of the shapes, using $geoIntersects and other new sets of operators

For using GeoJSON-compatible documents, we import JSON documents in the 2dMapGeoJSONData.json file into the areaMapGeoJSON collection and create the index as follows:

> db.areaMapGeoJSON.ensureIndex({"co" : "2dsphere"})

The collection has data similar to what we had imported into the areaMap collection in the previous recipe, but with a different structure that is compatible with the GeoJSON format. The type used here is 2dsphere and not 2d. The 2dsphere type of index also considers the spherical surfaces in calculations. Note that the field co, on which we are creating the geospatial index, is not an array of coordinates but a document itself that is GeoJSON-compatible.

We query where the value of the $near operator is not an array of the coordinates, as we did in the previous recipe, but a document with the $geometry key whose value is a GeoJSON-compatible document for a point with the coordinates. The results, irrespective of the query we use, are identical. Refer to step 3 in this recipe and step 5 in the previous recipe to see the difference in the query. The approach using GeoJSON looks more complicated but it has some advantages, which we will see soon.

It is important to note that we cannot mix the two approaches. Try executing the query in the GeoJSON format we just executed on the areaMap collection and note that though we do not get any error, the results are not correct.

We used the $geoIntersects operator in step 5 of this recipe. This is only possible when the documents are stored in the GeoJSON format in the database. The query simply finds all the points in our case that intersect any shape we create. We create a polygon using the GeoJSON format, as follows:

{
  type:'Polygon',
  coordinates:[[[0, 0], [0, 11], [11, 11], [11, 0], [0, 0]]]
}

The coordinates are for the square giving the four corners in a clockwise direction, with the last coordinate the same as the first, denoting it to be complete. The query executed is the same as $near, apart from the fact that the $near operator is replaced by $geoIntersects, and the value of the $geometry field is the GeoJSON document of the polygon with which we wish to find the intersecting points in the areaMapGeoJSON collection. If we look at the results obtained and look at the preceding screenshot, they indeed are what we expected.
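For intuition, the test of whether a point falls inside such a polygon can be sketched on a flat plane with the classic ray-casting technique. This is only a planar illustration written for this explanation; it is an assumption that it approximates the behavior for our small grid, and it is not how a 2dsphere index computes intersections on a sphere. The function name point_in_polygon is hypothetical.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: is `point` strictly inside the closed `ring`?

    `ring` is a list of [x, y] pairs whose last pair repeats the first,
    as in the outer ring of a GeoJSON Polygon. Points lying exactly on
    an edge are not handled specially in this sketch.
    """
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [[0, 0], [0, 11], [11, 11], [11, 0], [0, 0]]
restaurants = [(2, 6), (10, 5), (10, 1)]  # coordinates mentioned in step 6
query_point = (12, 8)                     # the point used with $near in step 3

inside_flags = [point_in_polygon(p, square) for p in restaurants]
query_point_inside = point_in_polygon(query_point, square)
```

All three restaurant coordinates from step 6 are reported as inside the square, while the point (12, 8) is reported as outside.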

The $geoWithin operator (http://docs.mongodb.org/manual/reference/operator/query/geoWithin/) is pretty handy to use when we want to find the points in the polygon or even within another polygon. Note that only shapes completely inside the given polygon will be returned. Suppose that, just like our worldMap collection, we have a cities collection with their coordinates specified in a similar manner. We can then use the polygon of a country to query all the polygons in the cities collection that lie entirely within it, thus giving us the cities. Obviously, an easier and faster way would be to store the country code in the city document. Alternatively, if we have some data missing in the cities collection and the country is not present, one point anywhere within the city’s polygon (as a city lies entirely in one country) can be used and a query can be executed on the worldMap collection to get its country, which we have demonstrated in step 11.

A combination of what we saw earlier can be put to good use to compute the distances between two points or even execute geometric operations.

Some functionalities, such as getting the centroid or the area of a polygon stored as GeoJSON in the collection, are not supported out of the box, although utility functions to compute them from the given coordinates would have been welcome. These features are commonly required; perhaps future releases will add some support for them, and until then developers can implement these operations themselves. Also, there is no straightforward way to find out, if there is an overlap between two polygons, at which coordinates they overlap, what the area of overlap is, and so on. The $geoIntersects operator we saw does tell us which polygons intersect with the given polygon, point, or line.
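As an example of what such a developer-written utility might look like, the following Python sketch computes the area and centroid of a simple polygon using the shoelace formula. It treats the coordinates as points on a flat plane, which is an assumption that suits our small grid but not real longitude and latitude values spanning large regions. The function name polygon_area_and_centroid is hypothetical.

```python
def polygon_area_and_centroid(ring):
    """Return (area, (cx, cy)) of a simple planar polygon.

    `ring` is a closed list of [x, y] pairs (last pair equal to the first),
    such as the outer ring of a GeoJSON Polygon. Uses the shoelace formula.
    """
    twice_area = 0.0
    cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if twice_area == 0:
        raise ValueError('degenerate polygon with zero area')
    # The signed area cancels out, so ring orientation does not matter here.
    cx /= 3 * twice_area
    cy /= 3 * twice_area
    return abs(twice_area) / 2, (cx, cy)


square = [[0, 0], [0, 11], [11, 11], [11, 0], [0, 0]]
area, centroid = polygon_area_and_centroid(square)
```

For the square used in step 5, this gives an area of 121 square units and a centroid at (5.5, 5.5). The coordinates array of a document fetched from the collection could be passed in the same way, taking the first (outer) ring of the polygon.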

Though unrelated to Mongo, the GeoJSON format doesn’t have support for circles; hence, storing circles in Mongo using the GeoJSON format is not possible. For more details on geospatial operators, visit http://docs.mongodb.org/manual/reference/operator/query-geospatial/.

Implementing a full-text search in MongoDB

Many of us (I won’t be wrong if I say all of us) use Google every day to search content on the Web. To cut a long story short, the text that we provide in the text box on Google’s page is used to search the pages on the Web that it has indexed. The search results are then returned to us in an order determined by Google’s page rank algorithm. We might want to have a similar functionality in our database that lets us search for some text content and gives the corresponding search results. Note that this text search is not the same as finding the text as part of a sentence, which can easily be done using regex. It goes way beyond that and can be used to get results that contain the same word, similar-sounding words, or words with a similar base word; we can even return a synonym in the actual sentence.

Since MongoDB Version 2.4, text indexes let us create an index on a particular field in the document and enable text search on the words in it. In this recipe, we will be importing some documents and creating text indexes on them, which we later query to retrieve the results.

Getting ready

A simple single node is what we need for the test. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server. However, do not start the server yet. There is an additional flag provided during the startup to enable text search. Download the BlogEntries.json file from the book’s website and keep it on your local drive, ready to be imported.

How to do it…

1. Start the MongoDB server listening to port 27017 as follows:

$ mongod --dbpath /data/mongo/db --smallfiles --oplogSize 50 --setParameter textSearchEnabled=true

As version 2.4 is used, we need to explicitly enable text search using textSearchEnabled. For version 2.6 and above, this command-line option can be skipped.

2. Once the server is started, we will be creating the test data in a collection as follows. With the BlogEntries.json file placed in the current directory, we will be creating a userBlog collection using mongoimport:

$ mongoimport -d test -c userBlog BlogEntries.json --drop

3. Now connect to the Mongo process from the Mongo shell, by typing the following command from the operating system shell:

$ mongo

4. Once connected, get a feel of the documents in the userBlog collection as follows:

> db.userBlog.findOne()

5. The blog_text field is of interest, and this is the one on which we will be creating a text search index.

6. Create a text index on the blog_text field of the document as follows:

> db.userBlog.ensureIndex({'blog_text':'text'})

7. Now execute the following search on the collection from the Mongo shell. The following way is the only way to perform a text search in version 2.4. In version 2.6, though it works fine, it is deprecated.

> db.runCommand({'text':'userBlog', search:'plot zoo'})

8. Look at the results obtained.

9. Execute another search as follows:

> db.runCommand({'text':'userBlog', search:"Zoo -plot"})

How it works…

Let us now see how it all works. Text search is done using a mechanism called reverse indexes (also known as inverted indexes). In simple terms, this is a mechanism where the sentences are broken into words and then, these individual words point back to the document that they belong to. The process is not straightforward, though; let us see what happens in this process step by step at a high level:

1. Consider the sentence, “I played cricket yesterday”. The first step is to break this sentence into tokens as [“I”, “played”, “cricket”, “yesterday”].

2. Next, the stop words from the broken-down sentence are removed, and we are left with a subset of these. Stop words are a list of very common words that are eliminated, as it makes no sense to index them since they can potentially affect the accuracy of the search when used in the search query. In this case, we will be left with the words [“played”, “cricket”, “yesterday”]. Stop words are language-specific and will be different for different languages.

3. Finally, these words are stemmed to their base words. In this case, it will be [“play”, “cricket”, “yesterday”]. Stemming is the process of reducing a word to its root. For instance, the words play, playing, played, and plays have the same root word play. There are a lot of algorithms and frameworks present for stemming a word to its root form. For more information on stemming and the algorithms used for this purpose, visit http://en.wikipedia.org/wiki/Stemming. Similar to eliminating the stop words, the stemming algorithm is language-dependent. The examples given here were for the English language.
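The three steps above can be sketched in Python as follows. This is a toy illustration only: the stop word list is a tiny made-up sample, the stem function strips a few English suffixes and is nowhere near a real stemming algorithm, and none of the names used here correspond to anything inside MongoDB.

```python
import re
from collections import defaultdict

# Tiny, assumed stop word list and suffix list, for illustration only.
STOP_WORDS = {'i', 'he', 'a', 'an', 'the', 'is', 'in', 'of', 'and', 'to'}
SUFFIXES = ('ing', 'ed', 's')


def tokenize(text):
    """Step 1: break the sentence into lowercase word tokens."""
    return re.findall(r'[a-z]+', text.lower())


def stem(word):
    """Step 3: naive suffix stripping, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def analyze(text):
    """Tokenize, drop stop words (step 2), and stem what is left."""
    return [stem(token) for token in tokenize(text) if token not in STOP_WORDS]


def build_reverse_index(documents):
    """Map each stemmed term back to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


documents = {
    1: 'I played cricket yesterday',
    2: 'He plays cricket',
}
terms = analyze(documents[1])
index = build_reverse_index(documents)
```

Here, terms holds the stemmed words of the example sentence, and index['play'] points to both documents because played and plays share the same root.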

If we look at the index creation process, it is as follows:

> db.userBlog.ensureIndex({'blog_text':'text'})

The key given in the JSON argument is the name of the field on which the text index is to be created, and the value will always be text, denoting that the index to be created is a text index. Once the index is created, at a high level, the preceding three steps get executed on the content of the field on which the index is created in each document, and a reverse index is created. You may also choose to create a text index on more than one field. Suppose we had two fields, blog_text1 and blog_text2; we can create the index as {'blog_text1':'text', 'blog_text2':'text'}. The value {'$**':'text'} creates an index on all fields of the document.

Finally, we executed the search operation by invoking the following command:

> db.runCommand({'text':'userBlog', search:'plot zoo'})

The preceding command runs the text search on the userBlog collection and the search string used is plot zoo. This searches for the value plot or zoo in the text, in any order. If we look at the results, we see that we have two matched documents and these documents are ordered by the score. This score tells us how relevant the document searched is; the higher the score, the more its relevance. In our case, one of the documents had both the words plot and zoo in it and thus got a higher score than another document, as we see in the following example. The result also contains a short summary of the total number of documents scanned to get the results, the number of results, and the total time taken to search.

{
  "queryDebugString" : "bought|zoo||||||",
  "language" : "english",
  "results" : [
    {
      "score" : 2.6353665865384617,
      "obj" : {
        "_id" : 5,
        ...
      }
    },
    {
      "score" : 0.5263157894736842,
      "obj" : {
        "_id" : 6,
        ...
      }
    }
  ],
  "stats" : {
    "nscanned" : 3,
    "nscannedObjects" : 0,
    "n" : 2,
    "nfound" : 2,
    "timeMicros" : 119
  },
  "ok" : 1
}

In version 2.6, the recommended way to query for the same result is as follows:

> db.userBlog.find({$text:{$search:'plot zoo'}})

Note that if we compare the ordering of the result with the previous execution using runCommand, we see that here the results are not sorted by descending score; in this case, they are given in the ascending order of the score. Also, the score that was available in the previous run using runCommand is not available in the result. To get the scores in the result, we need to modify the query a bit, as follows:

> db.userBlog.find({$text:{$search:'plot zoo'}}, {score:{$meta:"textScore"}})

Now we have an additional document provided in the find method that asks for the score calculated for the text match. The results are still not ordered in the descending order of the score. Let us see how to sort the results by score:

> db.userBlog.find({$text:{$search:'plot zoo'}}, {score:{$meta:"textScore"}}).sort({score:{$meta:"textScore"}})

As we can see, the query is the same as before. It’s just the additional sort function we added that will sort the results by the descending order of the score.

When the search was executed as {'text':'userBlog', search:"Zoo -plot"}, it searched for all the documents that contain the word Zoo but don’t contain the word plot. Thus we get only one result. The - sign is for negation and leaves out of the search result any document containing that word. However, do not expect to find all documents without the word plot by just giving -plot in the search.
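The matching rules just described (any of the plain words may match, and a word prefixed with - excludes a document) can be imitated with a small in-memory sketch. It is a deliberate simplification written for this explanation: it does no stemming or scoring, the function name text_search is hypothetical, and the blog snippets are made up rather than taken from BlogEntries.json.

```python
def text_search(documents, query):
    """Return ids of documents containing any plain term and no negated term."""
    terms = query.lower().split()
    wanted = {term for term in terms if not term.startswith('-')}
    excluded = {term[1:] for term in terms
                if term.startswith('-') and len(term) > 1}
    matches = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        # With no plain terms, `words & wanted` is empty and nothing matches.
        if words & wanted and not words & excluded:
            matches.append(doc_id)
    return matches


# Made-up blog snippets, not the contents of BlogEntries.json.
blogs = {
    5: 'we bought a plot near the zoo',
    6: 'the zoo was crowded today',
    7: 'a quiet day at home',
}

either_word = text_search(blogs, 'plot zoo')
zoo_without_plot = text_search(blogs, 'Zoo -plot')
only_negation = text_search(blogs, '-plot')
```

In this sketch, a query made only of negated words returns nothing, which mirrors the caveat above about searching with just -plot.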

Ifwelookatthecontentsreturnedasaresultofthesearch,theycontainthematcheddocumentsinentirety.Ifwearenotinterestedinfulldocuments,butonlyafewsectionsofadocument,wecanuseprojectiontogetthedesiredfieldsofthedocument.Forinstance,usethefollowingquery:

>db.runCommand({'text':'userBlog',search:"zooplot",project:{_id:1}})

ThiswillbethesameasfindingallthedocumentsintheuserBlogcollectioncontainingthewordsZooorplot,buttheresultswillcontainthe_idfieldfromtheresultingdocuments.

If multiple fields are used to create an index, then we may have different weights for different fields in the document. For instance, if blog_text1 and blog_text2 are two fields on which we create an index, and we want blog_text1 given a higher weight than blog_text2, we create the index as follows:

> db.collection.ensureIndex(
  {
    blog_text1: "text",
    blog_text2: "text"
  },
  {
    weights: {
      blog_text1: 2,
      blog_text2: 1
    },
    name: "MyCustomIndexName"
  }
)

This gives the content in blog_text1 twice as much weight as that in blog_text2. Thus, if a word is found in two documents, but is present in the blog_text1 field of the first document and the blog_text2 field of the second document, then the score of the first document will be more than that of the second. Note that we have also provided the name of the index using the name field as MyCustomIndexName.
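To see the direction of the effect, here is a toy calculation for Node.js. It assumes a deliberately simplified score (weight multiplied by the number of occurrences of the term in each field); the function names and sample documents are hypothetical, and MongoDB's real textScore formula is more involved than this.

```javascript
// Toy scoring: score = sum over fields of (weight * occurrences of term).
// This only illustrates why a match in a heavier field ranks higher; it is
// not MongoDB's actual scoring formula.
const weights = { blog_text1: 2, blog_text2: 1 };

function countOccurrences(text, term) {
  if (typeof text !== 'string') return 0;
  const wanted = term.toLowerCase();
  return text.toLowerCase().split(/\W+/).filter((w) => w === wanted).length;
}

function toyScore(doc, term) {
  let score = 0;
  for (const [field, weight] of Object.entries(weights)) {
    score += weight * countOccurrences(doc[field], term);
  }
  return score;
}

// Hypothetical documents: the same word appears in different fields.
const firstDoc = { _id: 1, blog_text1: 'a visit to the zoo', blog_text2: 'nothing here' };
const secondDoc = { _id: 2, blog_text1: 'nothing here', blog_text2: 'a visit to the zoo' };
const firstScore = toyScore(firstDoc, 'zoo');
const secondScore = toyScore(secondDoc, 'zoo');
console.log(firstScore, secondScore);
```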

We also see from the language key that the language in this case is English. MongoDB supports various languages to implement text search. Languages are important when indexing the content, as they decide the stop words; the stemming of words is language-specific as well. Visit http://docs.mongodb.org/manual/reference/command/text/#text-search-languages for more details on the languages supported by Mongo for text search. So how do we choose the language while creating the index? By default, if nothing is provided, the index is created assuming that the language is English. However, if we know the language is French, we create the index as follows:

> db.userBlog.ensureIndex({'text': 'text'}, {'default_language': 'french'})

Suppose we had originally created the index using the French language; the getIndexes method would then return the following document:

[

{

"v":1,

"key":{

"_id":1

},

"ns":"test.userBlog",

"name":"_id_"

},

{

"v":1,

"key":{

"_fts":"text",

"_ftsx":1

},

"ns":"test.userBlog",

"name":"text_text",

"default_language":"french",

"weights":{

"text":1

},

"language_override":"language",

"textIndexVersion":1

}

]

However, if the language differs on a per-document basis, which is pretty common in scenarios such as blogs, we have a way out. If we look at the preceding document, the value of the language_override field is language. This means that, on a per-document basis, we can store the language of the content in a field named language. In its absence, the default language is assumed; French in the preceding case. Thus, we can have:

{_id: 1, language: 'english', text: ….} // Language is English
{_id: 2, language: 'german', text: ….} // Language is German
{_id: 3, text: ….} // Language is the default one; French in this case
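The rule for picking the language of a document can be sketched in a few lines for Node.js. The helper name effectiveLanguage and the placeholder text values are assumptions made for illustration; only the default_language and language_override options come from the index definition shown above.

```javascript
// Index options as reported by getIndexes in the text above.
const indexOptions = { default_language: 'french', language_override: 'language' };

// Returns the language assumed for one document: the value of the override
// field when present, otherwise the default language of the index.
function effectiveLanguage(doc, options) {
  const value = doc[options.language_override];
  return typeof value === 'string' && value.length > 0
    ? value
    : options.default_language;
}

// Placeholder documents mirroring the three cases listed above.
const docs = [
  { _id: 1, language: 'english', text: 'placeholder' },
  { _id: 2, language: 'german', text: 'placeholder' },
  { _id: 3, text: 'placeholder' },
];
const languages = docs.map((d) => effectiveLanguage(d, indexOptions));
console.log(languages);
```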

There’s more…

To use MongoDB text search in production, you would need version 2.6. Through version 2.4, MongoDB text search was in beta. Integrating MongoDB with other systems such as Solr and Elasticsearch is a wise choice to make for now, at least until the text search feature in Mongo matures. In the next recipe, we will see how to integrate Mongo with Elasticsearch, using the Mongo connector.

See also

For more information on the $text operator, visit http://docs.mongodb.org/manual/reference/operator/query/text/

Integrating MongoDB with Elasticsearch for a full-text search

MongoDB has integrated text search features, as we saw in the previous recipe. However, there are multiple reasons why one would not use the Mongo text search feature and would fall back to conventional search engines such as Solr or Elasticsearch. The following are a few of the reasons:

The text search feature is production-ready in version 2.6. In version 2.4, it was introduced in beta, which is not suitable for production use cases.
Products such as Solr and Elasticsearch are built on top of Lucene, which has proven itself in the search engine arena. Solr and Elasticsearch are pretty stable products too.
You might already have expertise on products such as Solr and Elasticsearch and would like to use them as full-text search engines rather than MongoDB.
Some particular feature that your application requires might be missing in MongoDB search.

Setting up a dedicated search engine does need additional effort to integrate it with a MongoDB instance. In this recipe, we will see how to integrate a MongoDB instance with the search engine Elasticsearch.

We will be using the Mongo connector for integration purposes. It is an open source project that is available at https://github.com/10gen-labs/mongo-connector.

Getting ready

Refer to the Installing PyMongo recipe in Chapter 3, Programming Language Drivers, to install and set up Python. The tool pip is used to get the Mongo connector. However, if you are working on the Windows platform, the steps to install pip were not mentioned earlier. Visit https://sites.google.com/site/pydatalog/python/pip-for-windows to get pip for Windows.

The prerequisites for starting a single instance are all we need for this recipe. However, in this recipe, we will start the server as a single node replica set for demonstration purposes.

Download the BlogEntries.json file from the book’s website and keep it on your local drive, ready to be imported.

Download Elasticsearch for your target platform from http://www.elasticsearch.org/overview/elkdownloads/. Extract the downloaded archive, and from the shell, go to the bin directory of the extraction.

We will be getting the mongo-connector source from github.com and running it. A Git client is needed for this purpose. Download and install the Git client on your machine. Visit http://git-scm.com/downloads and follow the instructions to install Git on your target operating system. If you are not comfortable installing Git on your operating system, then there is an alternative available that lets you download the source as an archive.

Visit https://github.com/10gen-labs/mongo-connector. Here, you will get an option that lets you download the current source as an archive, which we can then extract on our local drive. The following screenshot shows the download option available on the bottom-right corner of the screen:

Note

We can also install mongo-connector in a very easy way using pip as follows:

pip install mongo-connector

However, the version in PyPI is very old, with not many features supported; thus, using the latest version from the repository is recommended.

Just like in the previous recipe, where we saw text search in Mongo, we will use the five documents to test our simple search. Download and keep BlogEntries.json

How to do it…

1. At this point, it is assumed that Python, PyMongo, and pip for your operating system platform are installed. We will now get mongo-connector from the source. If you have already installed the Git client, we will be executing the following steps on the operating system shell. If you have decided to download the repository as an archive, you may skip this step. Go to the directory where you would like to clone the connector repository, and execute the following commands:

$ git clone https://github.com/10gen-labs/mongo-connector.git
$ cd mongo-connector
$ python setup.py install

2. The preceding setup will also install the Elasticsearch client that will be used by this application.

3. We will now start a single Mongo instance, but as a replica set. From the operating system console, execute the following command:

$ mongod --dbpath /data/mongo/db --replSet textSearch --smallfiles --oplogSize 50

4. Start a Mongo shell and connect to the started instance as follows:

$ mongo

5. From the Mongo shell, initiate the replica set as follows:

> rs.initiate()

6. The replica set will be initiated in a few moments. Meanwhile, we can proceed to start the Elasticsearch server instance.

7. Execute the following command from the command line after going to the bin directory of the extracted elasticsearch archive:

$ elasticsearch

8. We won’t be getting into Elasticsearch settings and will start it in the default mode.

9. Once started, enter http://localhost:9200/_nodes/process?pretty in the browser.

10. If we see a JSON document, such as the following, giving the process details, we have successfully started Elasticsearch:

{

"cluster_name":"elasticsearch",

"nodes":{

"p0gMLKzsT7CjwoPdrl-unA":{

"name":"Zaladane",

"transport_address":"inet[/192.168.2.3:9300]",

"host":"Amol-PC",

"ip":"192.168.2.3",

"version":"1.0.1",

"build":"5c03844",

"http_address":"inet[/192.168.2.3:9200]",

"process":{

"refresh_interval":1000,

"id":5628,

"max_file_descriptors":-1,

"mlockall":false

}

}

}

}

11. Once the Elasticsearch server and Mongo instance are up and running, and the necessary Python libraries are installed, we will start the connector that will sync the data between the started Mongo instance and the Elasticsearch server.

For the sake of this test, we will be using the user_blog collection in the test database. The field on which we would like to have text search implemented is the blog_text field in the document.

12. Start the Mongo connector from the operating system shell as follows. The following command was executed with the Mongo connector’s directory as the current directory:

$ python mongo_connector/connector.py -m localhost:27017 -t http://localhost:9200 -n test.user_blog --fields blog_text -d mongo_connector/doc_managers/elastic_doc_manager.py

13. Import the BlogEntries.json file into the collection using the mongoimport utility as follows. The command is executed with the .json file present in the current directory:

$ mongoimport -d test -c user_blog BlogEntries.json --drop

14. Open a browser of your choice and enter http://localhost:9200/_search?q=blog_text:facebook in it.

15. You should see something like the following screenshot in the browser:

How it works…

Basically, Mongo connector tails the oplog to find new updates that it publishes to another endpoint. We used Elasticsearch in our case, but it could even be Solr. You may choose to write a custom DocManager that would plug in with the connector. For more details, visit https://github.com/10gen-labs/mongo-connector/wiki. The Readme for https://github.com/10gen-labs/mongo-connector gives some detailed information as well.

We gave the connector the -m, -t, -n, --fields, and -d options. Their meanings are as follows:

Option    Description

-m        The URL of the MongoDB host to which the connector connects to get the data to be synchronized.

-t        The target URL of the system with which the data is to be synchronized; Elasticsearch in this case. The URL format will depend on the target system. Should you choose to implement your own DocManager, the format will be one that your DocManager understands.

-n        This is the namespace that we would like to keep synchronized with the external system. The connector will just be looking for changes in these namespaces while tailing the oplog for data. The values are separated by commas if more than one namespace is to be synchronized.

--fields  These are the fields from the document that would be sent to the external system. In our case, it doesn’t make sense to index the entire document and waste resources. It is recommended to add to the index just the fields for which you would like text search support. The identifier _id field and the namespace of the source are also present in the result, as we can see in the preceding screenshot. The _id field can then be used to query the target collection.

-d        This is the document manager to be used; in our case, we have used the Elasticsearch document manager.

For more supported options, refer to the readme of the connector’s page on GitHub.

Once the insert is executed on the MongoDB server, the connector detects the documents newly added to the collection of its interest, that is, user_blog, and starts sending the data to be indexed from the newly added documents to Elasticsearch. To confirm the addition, we execute a query in the browser to view the results.
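The flow described above (watch a namespace, pick only the requested fields, forward them) can be sketched for Node.js over a few sample oplog-style entries. This is an illustration only: the helper toIndexRequests and the sample values are hypothetical, the entries merely follow the op, ns, and o shape of oplog documents, and the real connector does considerably more, such as handling updates and deletes.

```javascript
// Sample entries shaped like MongoDB oplog documents (op, ns, o);
// the values are made up for illustration.
const oplogEntries = [
  { op: 'i', ns: 'test.user_blog', o: { _id: 1, blog_text: 'I like facebook', likes: 10 } },
  { op: 'i', ns: 'test.other', o: { _id: 7, blog_text: 'ignored namespace' } },
  { op: 'i', ns: 'test.user_blog', o: { _id: 2, blog_text: 'A day at the zoo', likes: 3 } },
];

// Hypothetical helper: keep inserts for the watched namespaces and copy only
// the requested fields, plus _id and the source namespace.
function toIndexRequests(entries, namespaces, fields) {
  return entries
    .filter((e) => e.op === 'i' && namespaces.includes(e.ns))
    .map((e) => {
      const body = { _id: e.o._id, ns: e.ns };
      for (const f of fields) {
        if (f in e.o) body[f] = e.o[f];
      }
      return body;
    });
}

// Mirrors the -n test.user_blog and --fields blog_text options used earlier.
const requests = toIndexRequests(oplogEntries, ['test.user_blog'], ['blog_text']);
console.log(requests);
```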

Elasticsearch will complain about index names with uppercase characters in them. The Mongo connector doesn’t take care of this; thus, the name of the collection has to be in lowercase. A name such as userBlog will fail.
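A quick pre-flight check can catch such names before the connector is started. The sketch below is for Node.js; invalidNamespaces is a hypothetical helper and not part of mongo-connector, and it simply flags any namespace containing an uppercase character.

```javascript
// Hypothetical pre-flight check, not part of mongo-connector: returns the
// namespaces that contain uppercase characters.
function invalidNamespaces(namespaces) {
  return namespaces.filter((ns) => ns !== ns.toLowerCase());
}

const flagged = invalidNamespaces(['test.user_blog', 'test.userBlog']);
console.log(flagged);
```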

There’s more…

We have not done any additional configuration on Elasticsearch, as that was not the objective of this recipe. We were more interested in integrating MongoDB and Elasticsearch. You will have to refer to the Elasticsearch documentation for more advanced config options. If integration with Elasticsearch is required, there is a concept called rivers in Elasticsearch that can be used as well. Rivers are Elasticsearch’s way to get data from another data source. For MongoDB, the code for a river can be found at https://github.com/richardwilly98/elasticsearch-river-mongodb/. README.md in this repository has steps on how to set it up.

In this chapter, we explored a recipe named Implementing triggers in Mongo using oplog, on how to implement trigger-like functionalities using Mongo. This connector and the MongoDB river for Elasticsearch rely on the same logic to get the data out of Mongo as and when it is needed.

See also

The Elasticsearch documentation at http://www.elasticsearch.org/guide/en/elasticsearch/reference/

Chapter 6. Monitoring and Backups

In this chapter, we will be taking a look at the following recipes:

Signing up for MMS and setting up the MMS monitoring agent
Managing users and groups on the MMS console
Monitoring MongoDB instances on MMS
Setting up monitoring alerts on MMS
Backing up and restoring data in Mongo using out-of-the-box tools
Configuring the MMS backup service
Managing backups in the MMS backup service

Introduction

Monitoring and backups are important aspects of any mission-critical software in production. Proactive monitoring lets us take action whenever an abnormal event occurs in the system that can compromise the data consistency, availability, or performance of the system. In the absence of proactive monitoring, issues might come to light only after they have had a significant impact. We covered administration-related recipes in Chapter 4, Administration, and both monitoring and backup activities are part of it. However, they demand a separate chapter, as the content to be covered is extensive. In this chapter, we will see how to monitor various parameters and set up alerts for various parameters of your MongoDB cluster, using the MongoDB Monitoring Service (MMS). We will look at some mechanisms to back up the data using the out-of-the-box tools provided and also using the MMS backup service.

Signing up for MMS and setting up the MMS monitoring agent

MMS is a cloud-based or on-premises service that enables you to monitor your MongoDB cluster. The on-premises version is available with the Enterprise subscription only. It gives you one central place that lets the administrators monitor the health of the server instances and the boxes on which the instances are running. In this recipe, we will see what the software requirements are and how to set up MMS for Mongo.

Getting ready

We will be starting a single instance of mongod, which we will be using for the purpose of monitoring. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to start a MongoDB instance and connect to it from a Mongo shell. The monitoring agent, used to send the statistics of the Mongo instance to the monitoring service, uses Python and PyMongo. Refer to the Installing PyMongo recipe in Chapter 3, Programming Language Drivers, to know more about how to install Python and PyMongo, the Python client of MongoDB.

How to do it…

1. If you don’t already have an MMS account, visit https://mms.mongodb.com/ and sign up for an account. On signing up and logging in, we should see the following page:

2. Click on the Get Started button under Monitoring.

3. Once we reach the Download Agent option in the menu, click on the appropriate OS platform to download the agent. Follow the instructions given, after selecting the appropriate OS platform. Note down the API key too. For example, if the Windows platform is selected, we would see the following page:

4. Once the installation is complete, open the monitoring-agent.config file, which will be present in the configuration folder selected while installing the agent.

5. Look for the mmsApiKey key in the file and set its value to the API key that was noted down earlier in step 3.

6. To start the service manually, we have to go to services.msc on MS Windows, which can be done by typing services.msc in the Run dialog (Windows + R). The service will be named MMS Monitoring Agent. On the web page, click on the Verify Agent button. If all goes well, the started agent will be verified and the success message will be shown.

7. The next step is to configure the host. This host is the one that is seen from the agent’s perspective, running on the organization/individual’s infrastructure. The following screenshot shows the screen used for the addition of a host. The hostname is the internal hostname (the hostname on the client’s network); the MMS on the cloud doesn’t need to reach out to the MongoDB processes. It is the agent that collects the data from these MongoDB processes and sends the data to the MMS service.

8. Once the host details are added, click on the Verify Host button. Once the verification is done, click on the Start Monitoring button.

We have successfully set up MMS and added one host to it, which will be monitored.

How it works…

In this recipe, we have set up the MMS agent and monitoring for a standalone MongoDB instance. The installation and setup process is pretty simple. We also added a standalone instance and all was ok.

Suppose we have a replica set up and running (refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, for more details on how to start a replica set), and the three members are listening to ports 27000, 27001, and 27002, respectively. Refer to step 7 in the How to do it… section, where we set up one standalone host. If we select Replica Set in the dropdown for Host Type, and for the internal hostname we give a valid hostname of any member of the replica set (in my case, Amol-PC and port 27001 were given, which is a secondary instance), all other instances will automatically be discovered and they will be visible under the hosts, as shown in the following screenshot:

We didn’t see what is to be done when security is enabled on the cluster, which is pretty common in production environments. If authentication is enabled, we need proper credentials for the MMS agent to gather the statistics. The DB username and password that we give while adding a new host (step 7 of the How to do it… section) should have a minimum of the clusterAdmin and readAnyDatabase roles.

There’s more…

What we saw in this recipe was setting up an MMS agent and creating an account from the MMS console. However, we can add groups and users for the MMS console as administrators, granting various users privileges for performing various operations on different groups. In the next recipe, we will throw some light on user and group management in the MMS console.

Managing users and groups on the MMS console

In the previous recipe, we saw how to set up an MMS account and how to set up an MMS agent. In this recipe, we will throw some light on how to set up the groups and user access to the MMS console.

Getting ready

Refer to the previous recipe for setting up the agent and the MMS account. This is the only prerequisite for this recipe.

How to do it…

1. Start by navigating to Administration | Users on the left-hand side of the screen, as shown in the following screenshot:

2. Here you can view the existing users and also add new users. On clicking on the Add User button (circled in the top-right corner of the previous screenshot), you should see the following pop-up window allowing you to add a new user:

The preceding screen will be used to add users. Take note of the various available roles.

3. Similarly, by navigating to Administration | My Groups, you can view and also add new groups, by clicking on the Add Group button. In the textbox, provide a name for the group. Remember that the group name must be globally unique: it has to be unique across all users of MMS and not just within your account.

When a new group is created, it will be visible in the upper-left corner in a dropdown for all the groups, as shown in the following screenshot:

You can switch between the groups using this dropdown, which should show all the details and stats relevant to the selected group.

Note

Remember that a group once created cannot be deleted. So be careful while creating one.

How it works…

The tasks we completed in the recipe are pretty straightforward and don’t need a lot of explanation, except for one question. When and why do we add a group? It is when we want to segregate our MongoDB instances by different environments or applications. There will be a different MMS agent running for each group. Creating a new group is necessary when we want to have separate monitoring groups for different environments of an application (development, QA, production, and so on), and each group has different privileges for the users. That is, the same agent cannot be used for two different groups. If we remember from the previous recipe, while configuring the MMS agent, we give it an API key unique to the group. To view the API key for the group, select the appropriate group from the dropdown on the top (if your user has access only to one group, the dropdown won’t be seen) and go to Administration | Group Settings, as shown in the following screenshot. The group ID and the API key will both be shown at the top of the page.

Note that not all user roles will see this option. For example, users with read-only privileges can only personalize their profile, and most of the other options will not be visible.

Monitoring MongoDB instances on MMS

The previous recipes, Signing up for MMS and setting up the MMS monitoring agent and Managing users and groups on the MMS console, showed us how to set up an MMS account and agent, add hosts, and manage user access to the MMS console. The core objective of MMS is monitoring the host instances, which we have not discussed yet. In this recipe, we will be performing some operations on the host that we added to MMS in the first recipe, and we will monitor it from the MMS console.

Getting ready

Follow the recipe Signing up for MMS and setting up the MMS monitoring agent; that is pretty much what is needed for this recipe. You may choose to have a standalone instance or a replica set; either way is fine. Also, open a Mongo shell and connect to the instance from it (to the primary, if it is a replica set).

How to do it…

1. Start by logging in to the MMS console and clicking on Deployment in the upper-left corner, and then again on the Deployment link in the submenu, as shown in the following screenshot:

2. Clicking on one of the hostnames shown, we will see a large variety of graphs showing various statistics. In this recipe, we will analyze a majority of these.

3. Open the bundle downloaded for the book. In Chapter 4, Administration, we used a JavaScript file named KeepServerBusy.js to keep the server busy with some operations. We will be using the same script this time around.

4. In the operating system shell, execute the following command with the .js file in the current directory. The shell connects to the port, in my case port 27000, for the primary.

$ mongo KeepServerBusy.js --port 27000 --quiet

5. Once started, keep it running and give it 5–10 minutes before you start monitoring the graphs on the MMS console.

How it works…

The Understanding the mongostat and mongotop utilities recipe in Chapter 4, Administration, demonstrated how these utilities can be used to get the current operations and resource utilization. That is a fairly basic and helpful way to monitor a particular instance. MMS, however, gives us one place to monitor the MongoDB instance with pretty easy-to-understand graphs. MMS also gives us historical stats, which mongostat and mongotop cannot give.

Before we go ahead with the analysis of the metrics, I would like to mention that in the case of MMS monitoring, the data itself is neither queried nor sent out over the public network. It is just the statistics that are sent over a secure channel by the agent. The agent is open source and its code is available for examination if needed. The mongod servers need not be accessible from the public network, as the cloud-based MMS service never communicates with the server instances directly. It is the MMS agent that communicates with the MMS service. Typically, one agent is enough to monitor several servers, unless you plan to segregate them into different groups. Also, it is recommended to run the agent on a dedicated machine/virtual machine and not share it with any of the mongod or mongos instances, unless it is a less crucial test instance group you are monitoring.

Let us see some of these statistics on the console; we start with the memory-related ones. The following graph shows the resident, mapped, and virtual memory:

As seen in the previous graph, the resident memory for the dataset is 82 MB, which is very low, and it is the actual physical memory used up by the mongod process. This current value is significantly below the free memory available, and generally, this will increase over a period of time until it reaches a point where it has used up a large chunk of the total available physical memory. This is automatically taken care of by the mongod server process, and we can’t force it to use up more memory, even though it is available on the machine it is running on.

The mapped memory, on the other hand, is about the total size of the database, and is mapped by MongoDB. This size can be (and usually is) much higher than the physical memory available, which enables the mongod process to address the entire dataset as if it were present in memory even if it isn’t. MongoDB offloads this responsibility of mapping and loading of data to and from the disk to the underlying operating system. Whenever a memory location is accessed and it is not available in the RAM (that is, the resident memory), the operating system fetches the page into memory, evicting some page to make space for the new page if necessary.

What exactly is a memory-mapped file? Let us try to see with a super-scaled-down version. Suppose we have a file of 1 KB (1024 bytes) and the RAM is only 512 bytes; then obviously we cannot have the whole file in memory. However, you can ask the operating system to map this file to the available RAM in pages. Suppose each page is 128 bytes; then the total file is eight pages (128 * 8 = 1024). However, the OS can load four pages only; assume that it loaded the first four pages (up to 512 bytes) in memory. When we access byte number 200, it is found in memory, as it is present on page 2. But what if we access byte 800, which is logically on page 7, which is not loaded in memory? The OS takes one page out of memory and loads page 7, which contains byte number 800. MongoDB as an application gets the feeling that everything was loaded in memory and was accessed by the byte index, but actually it wasn’t, and the OS transparently did the work for us. As the page accessed was not present in memory and we had to go to the disk to load it into memory, it is called a page fault.
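The scaled-down example above can be replayed with a small simulation for Node.js. It is a sketch under stated assumptions: bytes are numbered from 0 and pages from 1, the names pageOf and createMemory are hypothetical, and first-in-first-out eviction is chosen here only for simplicity, since a real operating system uses its own page replacement policy.

```javascript
// Toy model of the example above: a 1024-byte file, 128-byte pages, and room
// for only four pages in RAM. FIFO eviction is an assumption made here.
const PAGE_SIZE = 128;
const FRAMES = 4;

// Pages are numbered from 1, as in the text; bytes are numbered from 0.
function pageOf(byteIndex) {
  return Math.floor(byteIndex / PAGE_SIZE) + 1;
}

function createMemory(initialPages) {
  const resident = initialPages.slice(); // oldest page first
  let pageFaults = 0;
  return {
    access(byteIndex) {
      const page = pageOf(byteIndex);
      if (resident.includes(page)) return 'hit';
      pageFaults += 1;
      if (resident.length >= FRAMES) resident.shift(); // evict the oldest page
      resident.push(page);
      return 'page fault';
    },
    stats() {
      return { pageFaults, resident: resident.slice() };
    },
  };
}

const memory = createMemory([1, 2, 3, 4]); // the first 512 bytes are loaded
const firstAccess = memory.access(200);    // byte 200 lives on page 2
const secondAccess = memory.access(800);   // byte 800 lives on page 7
const result = memory.stats();
console.log(firstAccess, secondAccess, result);
```

Accessing byte 200 is served from memory, while byte 800 forces one page to be evicted so that page 7 can be loaded, which is counted as a single page fault.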

Getting back to the stats shown in the graph, the virtual memory contains all the memory usage, including the mapped memory, plus any additional memory used, such as the memory associated with the thread stack of each connection, and so on. If journaling is enabled, this size will definitely be more than twice that of the mapped memory, as journaling too will have a separate memory mapping for the data. Thus we have two addresses mapping the same memory location. This doesn’t mean that the page will be loaded twice. It just means that two different memory locations can be used to address the same physical memory. Very high virtual memory might need some investigation. There is no predetermined value for what counts as too high or too low; generally these values are monitored for your system under normal circumstances when you are happy with the performance of your system. These benchmark values should then be compared with the figures seen when the system performance goes down, and then appropriate actions can be taken.

As we saw earlier, page faults are caused when an accessed memory location is not present in the resident memory, causing the OS to load the page from the disk. This IO activity will definitely reduce performance, and too many page faults can bring down the database performance dramatically. The following graph shows quite a few page faults occurring per minute. However, if the disks used are SSDs instead of spinning disks, the hit in terms of seek time from the drive might not be significant.

A large number of page faults usually occurs when enough physical memory isn’t available to accommodate the dataset, and the operating system needs to get the data from the disk into memory. Note that the stat shown earlier is taken on an MS Windows platform and this graph might seem high for a very trivial operation. The value shown here is the sum of hard and soft page faults and doesn’t really give a true figure of how good (or bad) the system is doing. These figures would be different on a Unix-based operating system. There is a JIRA open at the time of writing this book, which reports this problem (https://jira.mongodb.org/browse/SERVER-5799).

One thing you might need to remember is that, in production systems, MongoDB doesn’t work well with NUMA architecture and you might see a lot of page faults occurring even if the available memory seems to be high enough. Refer to http://docs.mongodb.org/manual/administration/production-notes/ for more details.

There is an additional graph, as seen next, which gives some details about nonmapped memory. As we saw earlier in this section, there are three types of memory, namely, mapped, resident, and virtual. Mapped memory is always less than virtual memory. Virtual memory will be more than twice that of mapped memory if journaling is enabled. If we look at the graph given earlier in this section, we see that the mapped memory is 192 MB, whereas the virtual memory is 532 MB. As journaling is enabled, the virtual memory is more than twice that of the mapped memory. When journaling is enabled, the same page of data is mapped twice in memory. Note that the page is physically loaded only once; it is just that the same location can be addressed using two different addresses.

Let us find the difference between the virtual memory, which is 532 MB, and twice the mapped memory, which is 2 * 192 = 384 MB. The difference between these figures is 148 MB (532 - 384).

What we see next is the portion of virtual memory that is not mapped. This value is the same as what we just calculated.
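The same arithmetic can be written as a tiny function for Node.js. This is a sketch: the function name nonMappedMB is hypothetical, and treating the mapped memory as counted once when journaling is disabled is an assumption based on the description of virtual memory above.

```javascript
// With journaling enabled, the text counts the mapped memory twice in the
// virtual memory figure. Counting it once without journaling is an assumption.
function nonMappedMB(virtualMB, mappedMB, journalingEnabled) {
  const mappings = journalingEnabled ? 2 : 1;
  return virtualMB - mappings * mappedMB;
}

// Figures read from the graph discussed above.
const nonMapped = nonMappedMB(532, 192, true);
console.log(nonMapped); // 148
```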

As mentioned earlier, a high or low value for nonmapped memory is not defined; however, when the value reaches GBs, we might have to investigate whether the number of open connections is high, and check whether there is a leak with client applications not closing connections after using them. There is a graph that gives us the number of connections open and it looks as follows:

Once we know the number of connections and find it too high as compared to the normal expected count, we will need to find the clients who have opened the connections to that instance. We can execute the following JavaScript code from the shell to get those details. Unfortunately, at the time of writing this book, MMS didn’t have a feature to list out the client connection details.

testMon:PRIMARY> var currentOps = db.currentOp(true).inprog;
currentOps.forEach(function(c) {
  if (c.hasOwnProperty('client')) {
    print('Client: ' + c.client + ", connection id is: " + c.desc);
  }
  // Get other details as needed
});

The db.currentOp method, when called with true, includes the idle connections and system operations in the result. We then iterate through all the results and print out the client host and the connection details. A typical document in the result of the currentOp method looks like the following code snippet. You may choose to tweak the preceding piece of code to include more details according to your needs.

{

"opid":62052485,

"active":false,

"op":"query",

"ns":"",

"query":{

"replSetGetStatus":1,

"forShell":1

},

"client":"127.0.0.1:64460",

"desc":"conn3651",

"connectionId":3651,

"waitingForLock":false,

"numYields":0,

"lockStats":{

"timeLockedMicros":{

},

"timeAcquiringMicros":{

}

}

}
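When hunting for a connection leak, it can help to count connections per client host rather than print each one. The following sketch for Node.js works on a hard-coded array shaped like the inprog documents shown above; the sample values and the helper name connectionsPerHost are hypothetical. In the Mongo shell, the same function could be given db.currentOp(true).inprog instead.

```javascript
// Sample entries shaped like the currentOp document above; values are made up.
const inprog = [
  { opid: 1, client: '127.0.0.1:64460', desc: 'conn3651' },
  { opid: 2, client: '127.0.0.1:64461', desc: 'conn3652' },
  { opid: 3, client: '192.168.2.3:50012', desc: 'conn3653' },
  { opid: 4, desc: 'journal' }, // a system operation has no client field
];

// Counts open connections per client host, ignoring the client port.
function connectionsPerHost(ops) {
  const counts = {};
  for (const op of ops) {
    if (!Object.prototype.hasOwnProperty.call(op, 'client')) continue;
    const idx = op.client.lastIndexOf(':');
    const host = idx === -1 ? op.client : op.client.slice(0, idx);
    counts[host] = (counts[host] || 0) + 1;
  }
  return counts;
}

const perHost = connectionsPerHost(inprog);
console.log(perHost);
```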

The Understanding the mongostat and mongotop utilities recipe in Chapter 4, Administration, was used to get some details on the percentage of time for which a database was locked, and the number of update, insert, delete, and getmore operations executed per second. You may refer to this recipe and try it out. There, we used the same JavaScript that we are using now to keep the server busy.

In the MMS console, we have similar graphs giving these details as follows:

The first one, Opcounters, shows the number of operations executed as of a particular point in time. This should be similar to what we saw using the mongostat utility. Similarly, the one on the right shows us the percentage of time for which a DB was locked. The dropdown above lists out the database names; we can select an appropriate database for which we want to see the stats. Again, this statistic can be seen using the mongostat utility. The only difference is that, with the command-line utility, we see the stats as of the current time, whereas here we see the historical stats as well.

In MongoDB, indexes are stored in B-trees, and the following graph shows the number of times the B-tree index was accessed, hit, and missed. At the minimum, the RAM should be enough to accommodate the indexes for optimum performance; so in the metrics, the misses should be zero or very low. A high number of misses results in a page fault for the index and, possibly, additional page faults for the corresponding data if the query is not covered, that is, if all its data cannot be sourced from the index. This is a double blow for performance. One good practice, whenever querying, is to use projections and fetch only the necessary fields from the document. This is helpful whenever we have our selected fields present in an index, in which case the query is covered and all the necessary data is sourced only from the index.
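As a rough rule of thumb, a query is covered when both the fields it filters on and the fields it returns are part of the index. The Node.js sketch below encodes only that simplified rule; the function isCovered and the sample index on name and age are hypothetical, and real-world conditions such as multikey indexes or embedded documents are not modelled.

```javascript
// Simplified rule of thumb: a query is treated as covered when every field in
// the filter and every returned field is part of the index. Query operators
// such as $or, multikey indexes, and embedded documents are ignored.
function isCovered(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const filterOk = Object.keys(filter).every((f) => indexed.has(f));
  const returned = Object.keys(projection).filter((f) => projection[f]);
  // _id is returned by default unless it is explicitly excluded.
  if (projection._id === undefined) returned.push('_id');
  const projectionOk = returned.length > 0 && returned.every((f) => indexed.has(f));
  return filterOk && projectionOk;
}

// Hypothetical compound index on name and age.
const index = { name: 1, age: 1 };
const withIdExcluded = isCovered(index, { name: 'x' }, { age: 1, _id: 0 });
const withDefaultId = isCovered(index, { name: 'x' }, { age: 1 });
console.log(withIdExcluded, withDefaultId);
```

The second call is not covered in this model because _id is returned by default and is not part of the index, which is why projections for covered queries usually exclude _id explicitly.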

To find out more about covered indexes, refer to the Creating an index and viewing plans of queries recipe in Chapter 2, Command-line Operations and Indexes.

For busy applications, when MongoDB acquires a lock on the database, other read and write operations get queued up. If the volumes are very high, with multiple write and read operations contending for the lock, the operations queue up. Until version 2.4 of MongoDB, the locks are at the database level; thus, even if the writes are happening on another collection, read operations on any collection in that database will block. This queuing of operations affects the performance of the system and is a good indicator that the data might need to be sharded to scale the system.

Tip

Remember, no value is defined as high or low; what is acceptable varies from application to application.

MongoDB flushes the data immediately from the journal and periodically from the data file to the disk. The following metrics give us the flush time per minute at a given point in time. If the flush takes up a significant percentage of the time per minute, we can safely say that the write operations are forming a bottleneck for the performance.

There’s more…

We have seen monitoring of the MongoDB instances/cluster in this recipe. However, we still haven’t seen how to set up alerts to be notified when certain threshold values are crossed. In the next recipe, we will see how to achieve this with a sample alert, which is sent out over e-mail when the page faults cross a predetermined value.

See also

Monitoring hardware, such as CPU usage, is pretty useful, and the MMS console does support that. It, however, needs munin-node to be installed to enable CPU monitoring. Refer to http://mms.mongodb.com/help/monitoring/configuring/ to set up munin-node and hardware monitoring.
To update the monitoring agent, refer to http://mms.mongodb.com/help/monitoring/tutorial/update-mms/.

Setting up monitoring alerts on MMS

In the previous recipe, we saw how we can monitor various metrics from the MMS console. This is a great way to see all the stats in one place and get an overview of the health of the MongoDB instances and cluster. However, the support personnel cannot monitor the system continuously, and there has to be some mechanism to automatically send out alerts when some threshold is exceeded. In this recipe, we will set up an alert that is raised whenever the page faults exceed 1000.

Getting ready

Refer to the Monitoring MongoDB instances on MMS recipe. This is the only prerequisite for this recipe.

How to do it…

1. Click on the Activity option from the left-hand side menu options and then click on Alert Settings. On the Alert Settings page, click on Add Alert.

2. Add a new alert for the host, which is a primary instance, if the page faults exceed a given number, which is 1000 page faults per minute in our case. The notification was chosen to be e-mail in this case, and the interval after which the alert will be sent is set at 10 minutes.

3. Click on Save to save the alert.

How it works…

The steps were pretty simple. What we did was successfully set up MMS alerts for when the page faults exceed 1000 per minute. As we saw in the previous recipe, no fixed value is classified as high or low. It is whatever is acceptable for your system, which should come from benchmarking the system during the testing phases in your environment. Similar to page faults, there is a vast array of alerts that can be set up. Once an alert is raised, it will be sent every 10 minutes, as we have set, until the condition for sending the alert is no longer met, which, in this case, is when the number of page faults falls below 1000, or until somebody manually acknowledges the alert, which means no further alerts will be sent for that incident.
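The resend behaviour described above can be pictured with a toy model for Node.js. This is only an illustration and not how MMS is implemented: the function alertMinutes and the sample readings are hypothetical, it covers a single incident, and it treats an acknowledgement as lasting for the rest of the incident rather than for a chosen duration.

```javascript
// Toy model of the behaviour described above, not MMS's implementation.
// samples[i] is the page-fault count observed for minute i.
function alertMinutes(samples, threshold, intervalMinutes, acknowledgedAtMinute) {
  const sent = [];
  let lastSent = null;
  for (let minute = 0; minute < samples.length; minute++) {
    if (samples[minute] <= threshold) {
      lastSent = null; // condition cleared, nothing more to send
      continue;
    }
    const acknowledged = acknowledgedAtMinute !== null && minute >= acknowledgedAtMinute;
    if (acknowledged) continue; // acknowledged incidents stay silent
    if (lastSent === null || minute - lastSent >= intervalMinutes) {
      sent.push(minute);
      lastSent = minute;
    }
  }
  return sent;
}

// Hypothetical readings: 25 minutes above the threshold, then a drop.
const samples = new Array(25).fill(1500).concat([500, 500]);
const withoutAck = alertMinutes(samples, 1000, 10, null);
const ackAtMinute12 = alertMinutes(samples, 1000, 10, 12);
console.log(withoutAck, ackAtMinute12);
```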

Asweseeinthefollowingscreenshot,thealertisopenandwecanacknowledgethealert:

OnclickingAcknowledge,thefollowingpopupwillletuschoosethedurationforwhichwewillacknowledge:

Thismeansthatforthisparticularincident,nomorealertswillbesentoutuntiltheselectedtimeperiodelapses.

TheopenalertscanbeviewedbyclickingontheActivitiesmenuoptionontheleft-hand

sideofthepage.
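The resend and acknowledge behavior described above can be sketched in a few lines of Python. This is only an illustration of the rules stated in this recipe, not how MMS is implemented; the AlertState class, the should_notify function, and the minute-based clock are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 1000          # page faults per minute, as configured in the recipe
RESEND_INTERVAL_MIN = 10  # minutes between repeated notifications


@dataclass
class AlertState:
    """Hypothetical bookkeeping for one open alert (not an MMS data structure)."""
    last_sent_min: Optional[int] = None   # minute at which the last e-mail went out
    ack_until_min: Optional[int] = None   # alert is silenced until this minute


def should_notify(state: AlertState, now_min: int, page_faults_per_min: float) -> bool:
    """Return True if a notification should go out at minute now_min."""
    if page_faults_per_min <= THRESHOLD:
        # The condition no longer holds: close the incident and send nothing.
        state.last_sent_min = None
        state.ack_until_min = None
        return False
    if state.ack_until_min is not None and now_min < state.ack_until_min:
        # Somebody acknowledged the alert; stay quiet until the period elapses.
        return False
    if state.last_sent_min is None or now_min - state.last_sent_min >= RESEND_INTERVAL_MIN:
        state.last_sent_min = now_min
        return True
    return False
```

Calling should_notify once per minute with the current page-fault rate reproduces the pattern from the recipe: one e-mail when the threshold is first crossed, a repeat every 10 minutes, and silence while the alert is acknowledged.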

See also

Visit http://www.mongodb.com/blog/post/five-mms-monitoring-alerts-keep-your-mongodb-deployment-track for some of the important alerts that you should set up for your deployment.

Backing up and restoring data in Mongo using out-of-the-box tools

In this recipe, we will look at some basic backup and restore operations using the mongodump and mongorestore utilities.

Getting ready

We will be starting a single instance of mongod. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to start a Mongo instance and connect to it from a Mongo shell. We will need some data to back up. If you already have some data in your test database, that would be fine; otherwise, create some from the countries.geo.json file available in the code bundle, using the following command:

$ mongoimport -c countries -d test --drop countries.geo.json

How to do it…

1. With the data in the test database, execute the following command, assuming we want to export the data to a local directory called dump in the current directory:

$ mongodump -o dump --oplog -h localhost --port 27017

Verify that there is data in the dump directory. All files should be .bson files, one per collection, in the respective database folder created.

2. Now let us import the data back into the MongoDB server using the following command. This again assumes that we have the dump directory in the current directory with the required .bson files present in it.

$ mongorestore --drop -h localhost --port 27017 dump --oplogReplay

How it works…

We executed just a couple of steps to export and restore the data. Let us now see exactly what they do and what the command-line options for these utilities are. The mongodump utility is used to export the database into .bson files, which can later be used to restore the data in the database. The export utility exports one folder per database, except the local database, and each folder will have one .bson file per collection. In our case, we used the --oplog option to export a part of the oplog as well, and that data is exported to the oplog.bson file. Similarly, we import the data back into the database using the mongorestore utility. By providing the --drop option, we explicitly ask for the existing data to be dropped before the import and before the contents of the oplog, if any, are replayed.

The mongodump utility simply queries the collection and exports the contents to the files. The bigger the collection, the longer it takes to export and later restore its contents. It is thus advisable to prevent write operations while the dump is being taken. In the case of sharded environments, the balancer should be turned off. If the dump is taken while the system is running, export it with the --oplog option to export the contents of the oplog as well. This oplog can then be used to restore the point-in-time data. The following are some of the important options available for the mongodump and mongorestore utilities, first for mongodump.

Option    Description

--help    This shows all the supported options and a brief description of those options.

-h or --host    This is the host that must be connected to. By default, it is localhost on port 27017. If a standalone instance is to be connected to, we can give the hostname as <hostname>:<port number>. For a replica set, the format will be <replica set name>/<hostname>:<port>,…,<hostname>:<port>, where the comma-separated list of hostnames and ports is called the seed list, which can contain all or a subset of the hostnames in a replica set.

--port    This is the port number of the target MongoDB instance. It is not really relevant if the port number is provided in the previous -h or --host option.

-u or --username    This provides the username of the user with which the data would be exported. As the data is read from all databases, the user is expected to have at least read privileges in all databases.

-p or --password    This is the password used in conjunction with the username.

--authenticationDatabase    This is the database in which the user credentials are kept; if not specified, the database specified in the --db option is used.

-d or --db    This is the database to back up. If not specified, then all the databases are exported.

-c or --collection    This is the collection in the database to be exported.

-o or --out    This is the directory to which the files will be exported. By default, the utility will create a dump folder in the current directory and export the contents to that directory.

--dbpath    The value is the directory where the database files will be found. Use this option only when we intend not to connect to a running MongoDB instance but to read from the database files directly. The server should not be up and running while reading directly from the database files, as the export locks the data files, which can’t happen if a server is up and running. A lock file will be created in the directory while the lock is acquired.

--oplog    With this option enabled, the data from the oplog from the time the export process started is also exported. Without this option enabled, the data in the export will not represent a single point in time if writes are happening in parallel, as the export process can take a few hours and is simply a query operation on all the collections. Exporting the oplog gives an option to restore point-in-time data. There is no need to specify this option if you are preventing write operations while the export is in progress.
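To make the option table concrete, here is a minimal Python sketch that assembles a mongodump command line from these options. The helper names host_option and mongodump_args are hypothetical and exist only for this illustration; the flags themselves are the ones listed above. The two validation checks reflect my understanding that --collection needs a --db and that --oplog applies to full dumps only, so treat them as assumptions to verify against your mongodump version.

```python
from typing import List, Optional, Sequence, Tuple


def host_option(hosts: Sequence[Tuple[str, int]], replica_set: Optional[str] = None) -> str:
    """Format the value for -h/--host: a seed list, optionally prefixed by the set name."""
    seed_list = ",".join(f"{name}:{port}" for name, port in hosts)
    return f"{replica_set}/{seed_list}" if replica_set else seed_list


def mongodump_args(out_dir: str, host: str, db: Optional[str] = None,
                   collection: Optional[str] = None, oplog: bool = False) -> List[str]:
    """Build the argument list for a mongodump invocation (hypothetical helper)."""
    args = ["mongodump", "--host", host, "--out", out_dir]
    if collection and not db:
        # Assumption: a collection can only be named together with its database.
        raise ValueError("--collection requires --db")
    if oplog and db:
        # Assumption: --oplog is meant for full dumps, not single databases.
        raise ValueError("--oplog cannot be combined with --db")
    if db:
        args += ["--db", db]
    if collection:
        args += ["--collection", collection]
    if oplog:
        args.append("--oplog")
    return args
```

The resulting list can be handed to subprocess.run(args, check=True) on a machine where mongodump is on the PATH; for example, mongodump_args("dump", host_option([("localhost", 27017)]), oplog=True) corresponds to the command used in this recipe.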

Similarly, for the mongorestore utility, the options are as follows. The meaning of the options --help, -h or --host, --port, -u or --username, -p or --password, --authenticationDatabase, -d or --db, and -c or --collection is the same as in the case of mongodump:

Option    Description

--dbpath    The value is the directory where the database files will be found. Use this option only when we intend not to connect to a running MongoDB instance but to write to the database files directly. The server should not be up and running while writing directly to the database files, as the restore operation locks the data files, which can’t happen if a server is up and running. A lock file will be created in the directory while the lock is acquired.

--drop    This drops the existing data in the collection before restoring the data from the exported dumps.

--oplogReplay    If the data was exported while writes to the database were allowed, and if the --oplog option was enabled during export, the exported oplog will be replayed on the data to bring all the data in the database to the same point in time.

--oplogLimit    The value of this parameter is a number representing a time in seconds (a timestamp counted in seconds since the Unix epoch). This option is used in conjunction with the --oplogReplay command-line option; it tells the restore utility to replay the oplog and stop just at the limit specified by this option.
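As an illustration of how --oplogReplay and --oplogLimit fit together, the following Python sketch converts a UTC date and time into a number of seconds and builds the matching mongorestore argument list. It assumes that the limit is expressed in seconds since the Unix epoch; the function names are hypothetical, and the host and dump directory are the ones used in this recipe.

```python
import calendar
from datetime import datetime
from typing import List


def oplog_limit_seconds(utc_moment: datetime) -> int:
    """Seconds since the Unix epoch for a naive datetime interpreted as UTC."""
    return calendar.timegm(utc_moment.timetuple())


def mongorestore_args(dump_dir: str, host: str, limit_utc: datetime) -> List[str]:
    """Build a mongorestore command that replays the oplog up to limit_utc."""
    return [
        "mongorestore", "--drop",
        "--host", host,
        "--oplogReplay",
        # Assumption: the limit is given as seconds since the Unix epoch.
        "--oplogLimit", str(oplog_limit_seconds(limit_utc)),
        dump_dir,
    ]
```

Using calendar.timegm rather than time.mktime keeps the conversion independent of the local time zone of the machine that builds the command.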

One might even think, “why not copy the files and take a backup?”. That works well, but there are a few problems associated with it. First, you cannot get a point-in-time backup unless the write operations are disabled. Second, the space used for backups is very high, as the copy would also copy the zero-padded files of the database, as against the mongodump utility, which exports just the data.

Having said that, filesystem snapshotting is a commonly used practice for backups. One thing to remember is that, while taking the snapshot, the journal files and the data files need to come in the same snapshot for consistency.

Configuring the MMS backup service

MMS backup is a relatively new offering by MongoDB for real-time incremental backup of your MongoDB instances, replica sets, and shards, and it offers you point-in-time recovery for your instances. The service is available as on-prem (in your data center) or cloud. We will, however, be demonstrating the on-cloud service, which is the only option for the community and basic subscriptions. For more details on the available options, you can refer to the different product offerings by MongoDB at https://www.mongodb.com/products/subscriptions.

Getting ready

The Mongo MMS backup service will work only on Mongo 2.0 and above. We will start a single server that we would back up. MMS backup relies on the oplog for continuous backup, and as the oplog is available only in replica sets, the server needs to be started as a replica set. Refer to the Installing PyMongo recipe in Chapter 3, Programming Language Drivers, to know more about how to install Python and the Python client of Mongo, PyMongo.

How to do it…

1. If you don’t have an MMS account already, then go to https://mms.mongodb.com/ and sign up for an account. For screenshots, refer to the Signing up for MMS and setting up the MMS monitoring agent recipe.

2. Start a single instance of Mongo as follows, replacing the value of --dbpath with the appropriate filesystem path on your machine:

$ mongod --replSet testBackup --smallfiles --oplogSize 50 --dbpath /data/mongo/db

Note that smallfiles and oplogSize are options set only for the purpose of testing and they are not to be used in production.

3. Start a shell, connect to this started instance, and initiate the replica set as follows:

> rs.initiate()

The replica set will be up and running in some time.

4. Go back to the browser and point to mms.mongodb.com. Add a new host by clicking on the + Add Host button. Select the type as replica set and the hostname as your hostname and the default port (27017 in our case). Refer to the Signing up for MMS and setting up the MMS monitoring agent recipe for the screenshots of the add host process.

5. Once the host is successfully added, register for MMS backup by clicking on the Backup option on the left-hand side and then on Begin Setup.

6. Either SMS or Google Authenticator can be used for registration. If a smartphone is available with Android, iOS, or BlackBerry OS, Google Authenticator is a good option. For some countries such as India, Google Authenticator is the only option available.

7. Assuming Google Authenticator is not configured already and we are planning to use it, we would need the app to be installed on our smartphone. Go to the respective app store of your mobile OS platform and install the Google Authenticator software.

8. With the software installed on the phone, come back to the browser. We should see the following screen on selecting Google Authenticator:

9. Begin the setup for a new account by scanning the QR code from the Google Authenticator application. If barcode scanning is a problem, you may choose to manually enter the key given on the right-hand side of the screen.

10. Once the scanning is completed or the key is entered successfully, your smartphone should show a six-digit number that changes every 30 seconds. Enter that number in the Authentication Code box given on the screen.

Note

It is important not to delete this account in Google Authenticator on your phone, as it will be needed in the future whenever we wish to change any settings related to backup, such as stopping the backup, changing the exclusion list, and literally any other operation in MMS backup. The QR code and key would not be visible again once the setup is done. You would have to contact MongoDB support to get the configuration reset.

11. Once the authentication is done, the next screen you should see is for the billing address and billing details, such as the card you register. All charges below USD 5 are waived, so you should be OK to try out a small test instance before being charged.

12. Once the credit card details are saved, we move ahead with the setup. We will have to install a backup agent; this is a separate agent from the monitoring agent. Choose the appropriate platform and follow the instructions for its installation. Take note of the location where the configuration files of the agent will be placed.

13. A new popup will contain the instructions/link to the archive/installer for the platform and the steps to install. It should also contain the apiKey. Take note of that API key, which we will need in the next step.

14. Once the installation is complete, open the local.config file placed in the config directory of the agent installation (the location that was shown/modified during the installation of the agent) and paste/type in the apiKey noted down in the previous step.

15. Once the agent is configured and started, click on the Verify Agent button.

16. Once the agent is successfully verified, we should start by adding a host to back up. The dropdown should show us all the replica sets and shards we have added. Select the appropriate one and fill in the sync source as the primary instance, as that is the only member we have in our single-member replica set. The sync source is only used for the initial sync process. Whenever we have a proper replica set with multiple instances, it is preferable to use a secondary as the sync source.

As the instance is not started with security, leave the DB Username and Password fields blank.

17. Click on the Manage excluded namespaces button if you wish to skip a particular database or collection being backed up. If nothing is provided, by default, everything will be backed up. The format for the collection name would be <database name>.<collection name>. Alternatively, it could be just the database name, in which case all collections in that database would not be eligible for backup.

18. Once the details are all OK, click on the Start button. This should complete the setup of the backup process for a replica set on MMS.

Tip

The installation steps I performed were on Windows OS, and the service needs to be started manually in that case. Press the Windows button + R and type services.msc. The name of the service is MMS Backup Agent.

How it works…

The steps are pretty simple, and this is all we need to do to set up a server for Mongo MMS backup. One important thing mentioned earlier is that MMS backup uses multifactor authentication for any operation once the backup is set up; the account set up in Google Authenticator for MongoDB should not be deleted. There is no way to recover the original key used to set up the authenticator. You will have to clear the Google Authenticator settings and set up a new key. To do that, click on the Help & Support link at the bottom left of the screen and click on How do I reset my two-factor authentication?.

On clicking the link, a new window will open up, as shown in the following screenshot, which will ask for the username. An e-mail will be sent out to the registered e-mail ID, which allows you to reset the two-factor authentication.

As mentioned, the oplog is used to synchronize the current MongoDB instance with the MMS service. However, for the initial sync, an instance’s data files are used. Which instance to use is provided by us when we set up the backup of the replica set. As this is a resource-heavy operation, on busy systems we should preferably use a secondary instance for it so that the MMS backup agent does not add more querying load on the primary instance. Once the initial synchronization is done, the oplog of the primary will be used to get data on a continuous basis. The agent does write periodically to a collection called mms.backup in the admin database.

The backup agent for MMS backup is different from the MMS monitoring agent. Though there is no restriction on having them both run on the same machine, you might need to evaluate that before having such a setup in production. A safe bet would be to have them running on separate machines. Never run either of these agents with a mongod or mongos instance on the same box in production. There are a couple of important reasons why it is not recommended to run the agents on the same box as the mongod instances. They are as follows:

The resource utilization of the agent is dependent on the cluster size it monitors. We don’t want the agent to use a lot of resources, affecting the performance of the production instance.

The agent could be monitoring a lot of server instances at one time. As there is only one instance of this agent, we do not want it to go down during the database server maintenance and restart.

Deployments that use the community edition of MongoDB built with SSL, or the Enterprise versions with the SSL option used for communication between the client and the MongoDB server, must perform some additional steps. The first step is to check the My deployment supports SSL for MongoDB connections flag when we set up the replica set for backup (see step 16). Note the checkbox at the bottom of the screenshot; it should be checked. Secondly, open the local.config file for the MMS configuration and look out for the following two properties:

sslTrustedServerCertificates=

sslRequireValidServerCertificates=true

The first is the fully qualified path of the certifying authority’s certificate in the PEM format. This certificate will be used to verify the certificate presented by the mongod instance running over SSL. The second property can be set to false if the certificate verification is to be disabled; this is, however, not a recommended option. As far as the traffic between the backup agent and MMS backup is concerned, data sent from the agent to the MMS service over SSL is secure, irrespective of whether SSL is enabled on your MongoDB instances or not. The backed-up data at rest in the data center is not encrypted.

If security is enabled on the mongod instance, a username and password need to be provided, which will be used by the MMS backup agent. The username and password are provided while setting up backup for the replica set, as seen in step 16.

The agent needs to read the oplog, possibly read all databases for the initial sync, and write data to the admin database. The roles expected from the user are therefore readAnyDatabase, clusterAdmin, readWrite on the admin and local databases, and the userAdminAnyDatabase role in the case of version 2.4 and above. In versions prior to 2.4, we would expect the user to have read access on all the databases and read/write access to the admin and local databases.

While setting up a replica set for backup, you may get an error such as Insufficient oplog size: The oplog window must be at least 1 hours over the last 24 hours for all active replica set members. Please increase the oplog. While you may think this always has something to do with the oplog size, it is also seen when the replica set has an instance that is in a recovering state. This might feel misleading, so do look out for recovering nodes, if any, in the replica set while setting up a backup for it. As per MMS support, refusing to set up a backup for a replica set with some recovering nodes seems too restrictive, and it might be fixed in the future.

Managing backups in the MMS backup service

In the previous recipe, we learned how to set up the MMS backup service, and a simple one-member replica set was set up for backup. Though a single-member replica set makes little sense in practice, it was needed, as a standalone instance cannot be set up for backup in MMS. In this recipe, we dive deeper and look at the operations we can perform on the server that is set up for backup, such as starting, stopping, or terminating a backup; managing exclusion lists; managing backup snapshots and their retention; and restoring point-in-time data.

Getting ready

Following the previous recipe is all that is needed for this recipe. The setup described in it is expected to be done, as we are going to use the same server we set up for backup in that recipe.

How to do it…

1. With the server up and running, let’s import some data in it. It can be anything, but we chose to use the countries.geo.json file that was used in the previous chapter. It should be available in the bundle downloaded from the Packt Publishing website.

Start by importing the data into a collection called countries in the test database. Use the following command to do it. The following import command was executed with the current directory having the countries.geo.json file:

$ mongoimport -c countries -d test --drop countries.geo.json

2. We have already seen how to exclude namespaces when the replica set backup was being set up. We will now see how to exclude namespaces once the backup for a replica set has been set up. Click on the Backup menu option on the left and then on Replica Set Status, which opens by default when Backup is clicked. Click on the gear button on the right-hand side of the row where the replica set is shown. It should look as follows:

3. As shown in the previous screenshot, click on the Edit Excluded Namespaces option and type in the name of the collection that we want to exclude. Suppose we want to exclude the applicationLogs collection in the test database; type in test.applicationLogs.

4. On saving it, you will be asked to enter the token code that is currently displayed on your Google Authenticator.

5. On successful validation of the code, the namespace test.applicationLogs will be added to the list of namespaces excluded from being backed up.

6. We shall now see how to manage the snapshot scheduling. A snapshot is the state of the database as of a particular point in time. To manage the snapshot frequency and retention policy, click on the gear button shown in step 2 and click on Edit Snapshot Schedule.

7. As seen in the following screenshot, we can set the times when the snapshots are taken and their retention period. More on this will be covered in the next section. Any changes to it would need multifactor authentication to save the changes.

8. We will now look at how we go about restoring the data using MMS backup. At any point in time, whenever we want to restore the data, click on Backup and the Replica Set Status/Shard Cluster Status and then click on the set/cluster name.

9. On clicking it, we will see the snapshots that are saved against this set. It should look something like what is seen in the following screenshot:

We have encircled some of the portions on the screen, which we will see one by one.

10. To restore as of a time when the snapshot was taken, click on the Restore this snapshot link in the ACTIONS column of the grid.

11. The previous screenshot shows us how we can export the data, either over HTTPS or SCP. We select Pull via Secure HTTP (HTTPS) for now, and click on Authenticate. We will see about SCP in the next section.

12. Enter the token that is received either over SMS or seen on Google Authenticator, and click on Finalize Request on entering the auth code.

13. On successful authentication, click on Restore Jobs as shown in the following screenshot. This is a one-time download that will let you download the tar.gz archive. Click on the download link to download the tar.gz archive.

14. Once the archive is downloaded, extract it to get the database files within it.

15. Stop the mongod instance, replace the database files with the ones that are extracted, and restart the server to get the data as of the time when the snapshot was taken. Note that the database files will not contain data for any collection that was excluded from backup.

We will now see how to get the point-in-time data using MMS backup:

1. Click on Replica Set Status or Shard Cluster Status and then on the cluster/set that is to be restored.

2. On the right-hand side of the screen, click on the Restore button.

3. This should give a list of available snapshots, or you may enter a custom time. Check the Use Custom Point In Time checkbox. Click on the Date field and select the date and time, in hours and minutes, to which you want to restore the data, and click on Next. Note that the Point in Time feature only restores to a point in the last 24 hours.

Here you will be asked to choose between HTTPS and SCP. The subsequent steps are similar to what we did earlier when restoring a snapshot, from step 11 through step 15.

How it works…

After the backup for a replica set was set up, we first imported some sample data into the test database so that it would be sent to the MMS backup service and could be restored at a later point in time. We saw how to exclude namespaces from being backed up in steps 2, 3, 4, and 5.
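The exclusion rule itself is easy to state in code. The following Python sketch checks whether a namespace would be skipped, given a list of excluded entries in the two formats the recipe describes: either <database name>.<collection name> or just a database name. The function name is_excluded is hypothetical, and the matching logic is my reading of the rule described here rather than MMS's own implementation.

```python
from typing import Iterable


def is_excluded(namespace: str, excluded: Iterable[str]) -> bool:
    """Return True if namespace ("db.collection") matches an excluded entry."""
    # Everything before the first dot is the database name.
    database = namespace.split(".", 1)[0]
    for entry in excluded:
        # An entry excludes either one exact collection or a whole database.
        if entry == namespace or entry == database:
            return True
    return False
```

With the exclusion used in this recipe, is_excluded("test.applicationLogs", ["test.applicationLogs"]) is true, while the countries collection in the same database is still backed up.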

Now, looking at the snapshot and retention policy settings, we can see that we have the choice of the time interval at which the snapshots are to be taken and the number of days for which they are to be retained (step 7). We can see that, by default, snapshots are taken every 6 hours and they are saved for 2 days. The snapshot that is taken at the end of the day gets saved for a week, which is 7 days. The snapshots taken at the end of the week and month are saved for 4 weeks and 13 months, respectively. A snapshot can be taken once every 6, 8, 12, or 24 hours.

However, one needs to understand the flip side of taking snapshots after long time durations. Suppose the last snapshot is taken at 18:00 hours; getting the data as of 18:00 hours for restore is very easy, as it is stored on the MMS backup servers. However, suppose we need the data as of 21:30 hours for restoration. As MMS backup supports point-in-time backup, it would use the 18:00 hours snapshot as the base and then just replay on it the changes made after the snapshot was taken, till 21:30 hours. This replaying is similar to how an oplog would be replayed on the data. There is a cost for this replay, and thus getting a point-in-time backup is slightly more expensive than getting the data from a snapshot. Here we had to replay the data for 3.5 hours, from 18:00 hours to 21:30 hours. Imagine if the snapshots were set to be taken every 12 hours and our first snapshot was taken at 00:00 hours; we would have snapshots at 00:00 hours and 12:00 hours every day. To restore the data as of 21:30 hours, with 12:00 hours as the last snapshot, we will have to replay 9.5 hours of data, which is much more expensive.

More frequent snapshots mean more storage space usage but less time needed to restore a database to a given point in time. At the same time, less frequent snapshots require less storage but at the cost of more time to restore the data to a point in time. You need to decide on a trade-off between these two: space and time for restoration. For the daily snapshot, we can choose the retention from 3–180 days. Similarly, for the weekly and monthly snapshots, the retention period can be chosen between 1–52 weeks and 1–36 months, respectively.
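The replay arithmetic in this discussion can be checked with a small Python sketch. It assumes that snapshots are taken at a fixed interval starting from a given hour of the day, as in the 00:00 and 12:00 example; the function name replay_hours and the use of fractional hours (21.5 for 21:30) are choices made for this illustration only.

```python
def replay_hours(snapshot_interval_h: int, target_hour: float,
                 first_snapshot_hour: float = 0.0) -> float:
    """Hours of changes to replay on top of the latest snapshot before target_hour.

    Times are hours of the day as floats, so 21:30 is written as 21.5.
    """
    # Time elapsed since the first snapshot of the daily cycle.
    elapsed = (target_hour - first_snapshot_hour) % 24
    # Whatever is left over after the last whole interval must be replayed.
    return elapsed % snapshot_interval_h
```

For a restore as of 21:30, replay_hours(6, 21.5) gives 3.5 and replay_hours(12, 21.5) gives 9.5, matching the two cases described above; the base snapshot is the target time minus the returned value.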

The screenshot in step 9 has a column for the expiry of the snapshot. For the first snapshot taken, it is 1 year, whereas the others expire in 2 days. The expiration is as per what we discussed earlier. On changing the expiration values, the old snapshots are not affected or adjusted as per the changed times. The new snapshots taken will, however, be as per the modified settings for retention and frequency.

We saw how to download the dump (step 10 onwards) and then use it to restore the data in the database. It was pretty straightforward and doesn’t need a lot of explanation, except for a few things. First, if the data is for a sharded cluster, there will be multiple folders, one for each shard, and each of them will have the database files, as against what we saw here in the case of a replica set, where we have a single folder with database files in it.

Finally, let us look at the screen when we choose SCP as the option:

SCP stands for secure copy. The files will be copied over a secure channel to a machine’s filesystem. The host that is given needs to have a public IP, which will be used to SCP the files. This makes a lot of sense when we want the data from MMS to be delivered to a machine running a Unix OS on the cloud, say one of the AWS virtual instances. Rather than getting the file using HTTPS on our local machine and then reuploading it to the server on the cloud, you can specify the location where the data needs to be copied in the Target Directory block, the hostname, and the credentials. There are a couple of ways to authenticate as well; a password is the easy way, with an SSH key pair as an additional option. If you have to configure the firewalls of your host on the cloud to allow incoming traffic over the SSH port, the public IP addresses are given at the bottom of the screen (64.70.114.115/32 or 4.71.186.0/24 in our screenshot), which you should whitelist to allow incoming secure copy requests over port 22.

See also

We have seen running backups using MMS, which uses oplogs for this purpose. The Implementing triggers in Mongo using oplog recipe in Chapter 5, Advanced Operations, uses the oplog to implement trigger-like functionalities. This concept is the backbone of the real-time backup used by the MMS backup service.

Chapter 7. Cloud Deployment on MongoDB

In this chapter, we will cover the following recipes:

Setting up and managing the MongoLab account
Setting up a sandbox MongoDB instance on MongoLab
Performing operations on MongoDB from the MongoLab GUI
Setting up MongoDB on Amazon EC2 using the MongoDB AMI
Setting up MongoDB on Amazon EC2 without using the MongoDB AMI

Introduction

Though explaining cloud computing is not in the scope of this book, I will explain it in just one paragraph. Any business, big or small, needs hardware infrastructure and different software installed on it. An operating system is the basic software needed, along with different servers (from a software perspective) for storage, mail, Web, database, DNS, and so on. The list of software frameworks/platforms needed might end up being large. The point of interest here is that the initial budget for this hardware and software platform is high; we are not even considering the real estate needed to host it. This is where cloud computing providers such as Amazon, Rackspace, Google, and Microsoft come into play. They have hosted high-end hardware and software in different data centers across the globe and let us choose from different configurations to start an instance. Then, this is accessed remotely over the public network for management purposes. Literally, all our setup is done in the cloud provider’s data center, and we just pay as we use. Shut down the instance and you stop paying for it. Not only small start-ups but also large enterprises often fall back on cloud servers to handle a temporary rise in the demand for computing resources. The prices offered by the providers are very competitive too, particularly those of Amazon Web Services (AWS), in my opinion; its popularity says it all.

The wiki page at http://en.wikipedia.org/wiki/Cloud_computing has a lot of detail, perhaps a bit too much for someone new to the concept, but it is a good read, nevertheless. The article at http://computer.howstuffworks.com/cloud-computing/cloud-computing.htm is pretty good, and I recommend that you read it if you are not aware of the concept of cloud computing.

In this chapter, we will set up MongoDB instances on the cloud using MongoDB service providers and then by ourselves on AWS.

Setting up and managing the MongoLab account

In this recipe, we will be evaluating one of the vendors, MongoLab, that provide MongoDB as a service. This introductory recipe will introduce you to what MongoDB as a service is, and then it will demonstrate how to set up and manage an account in MongoLab (https://mongolab.com/).

In all the recipes in this book so far, we have covered setting up, administering, monitoring, and developing against instances of MongoDB on the premises of an organization or an individual. This not only needs manpower with the appropriate skill set to manage the deployments, but also appropriate hardware to install and run Mongo servers. This needs large investments up front that might not be a viable solution for start-ups or even organizations that are not clear about adopting this technology or migrating to it. They might want to evaluate it and see how it goes before moving fully to this solution. What would be ideal is to have a service provider that takes care of hosting the MongoDB deployments, managing and monitoring the deployments, and providing support. The organizations that opt for these services need not invest up front in setting up the servers, nor recruit or outsource to consultants for the administration and monitoring of the instances. All that one needs to do is choose the hardware and software platform, the configuration, and the appropriate MongoDB version, and set up an environment from a user-friendly GUI. It even gives you an option to use your existing cloud provider’s servers.

Having explained in brief what these vendor-hosting services do and why they are needed, we will start this recipe by setting up an account with MongoLab and see some basic user and account management. MongoLab is by no means the only hosting provider for MongoDB. You might also want to take a look at http://www.mongohq.com/ and http://www.objectrocket.com/. At the time of writing this book, MongoDB itself had started providing MongoDB as a service on the Azure cloud, and that service was in its beta phase.

How to do it…

1. Visit https://mongolab.com/signup/ to sign up. If you don’t have an account created, just fill in the relevant details and create an account.

2. Once the account is created, click on the Account link in the top-right corner of the page, as shown in the following screenshot:

3. Click on the Account Users tab in the top-left corner; it should be selected by default:

4. To add a new account user, click on the + Add account user button. A pop-up window will ask for the username, e-mail ID, and password of the user. Enter the relevant details, and click on the Add button.

5. Click on the user, and you will be able to navigate to a page where you can change the username, e-mail ID, and password. You might transfer the admin rights to the user by clicking on the Change to Admin button on this page.

6. Similarly, by clicking on your own user details, you will have the options to change the username, e-mail ID, and password.

7. Click on the Set up two-factor authentication button to activate the multifactor authentication using Google Authenticator. You need to have Google Authenticator installed on your Android, iOS, or BlackBerry phone to proceed with the setup of multifactor authentication.

8. On clicking on the button, we should see the QR code that can be scanned using Google Authenticator, or if scanning is not possible, click on the URL underneath the QR code, which will show the code. Manually set up a time-based account in Google Authenticator. There are two types of Google Authenticator accounts: time based and counter based. For more details, visit http://en.wikipedia.org/wiki/Google_Authenticator.

9. Similarly, you can delete users from the Account page by clicking on the cross next to the user’s row under Account Users.

How it works…

There is nothing much to explain in this section. The setup process and user administration are pretty simple. Note that the users we added here are not database users. These are the users that have access to the MongoLab account to which we added them. The account can be named after the organization, and its name can be seen at the top of the screen. The multifactor authentication account set up in the Google Authenticator software on the handheld device should not be deleted, as whenever the user logs in to the MongoLab account from the browser, they will be asked to enter the Google Authenticator code to continue.

Setting up a sandbox MongoDB instance on MongoLab

In the previous recipe, we saw how to set up an account on MongoLab and add users to your account. We still haven’t seen how to fire up an instance on the cloud and use it to perform some simple operations. In this recipe, this is exactly what we will do.

Getting ready

Refer to the previous recipe to set up an account with MongoLab. We will set up a free sandbox instance. We will require some way to connect to this started Mongo instance, and thus, we will need a Mongo shell, which comes only with the complete Mongo installation, or you might choose to use a programming language of your choice to connect to the started Mongo instance. Refer to Chapter 3, Programming Language Drivers, for recipes on connecting and performing operations using a Java or Python client.

How to do it…

1. Go to the home page at https://mongolab.com/home and click on the Create new button.

2. Select a cloud provider; for this example, we chose Amazon Web Services.

3. Click on Single-node (development) and then on the Sandbox option. Do not change the location of the cloud server, as the free sandbox instance is not available in all data centers. Since this is a sandbox, we are OK with any location.

4. Add any name for your database. The name I chose is mongolab-test. Click on Create new MongoDB deployment after entering the name.

5. This will take you to the home page, and the database will now be visible. Click on the instance name. The page here shows the details of the MongoDB instance selected. The instruction to connect from the shell or programming language is given at the top of the page, along with the public hostname of the started instance.

6. Click on the Users tab and then on the Add database user button.

7. In the pop-up window, add the username and password as testUser and testUser, respectively (or any of your choice).

8. With the user added, start the Mongo shell as follows, assuming that the name of the database is mongolab-test, and the username and password are both testUser:

$ mongo <host-name>/mongolab-test -u testUser -p testUser

9. On connecting, execute the following command in the shell and check if the database name is mongolab-test:

> db

10. Insert one document in a collection as follows:

> db.messages.insert({_id:1, message:'Hello mongolab'})

11. Query the collection as follows:

> db.messages.findOne()
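If you would rather connect from a program than from the shell, the same connection details can be packed into a MongoDB connection URI. The following Python sketch builds such a URI; the function name mongodb_uri is hypothetical, and the host and port in any call are placeholders for the values shown on your own MongoLab instance page, not real endpoints.

```python
from urllib.parse import quote_plus


def mongodb_uri(host: str, port: int, database: str,
                username: str, password: str) -> str:
    """Build a mongodb:// connection URI for a single host (hypothetical helper)."""
    # Percent-encode the credentials so that characters such as '@' or ':'
    # in a password do not break the URI.
    return "mongodb://{}:{}@{}:{}/{}".format(
        quote_plus(username), quote_plus(password), host, port, database
    )
```

A URI built this way can be passed to pymongo.MongoClient, assuming PyMongo is installed as described in Chapter 3; the database named in the URI is the one the credentials are checked against, which here is mongolab-test.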

Howitworks…Thestepsexecutedareverysimple.Wecreatedonesharedsandboxinstanceinthecloud.MongoLabitselfdoesnothosttheinstancesbutusesoneofthecloudproviderstodothehosting.MongoLabdoesnotsupportsandboxinstancesforallproviders.Thestoragewiththesandboxinstanceis0.5GBandissharedwithotherinstancesonthesamemachine.Sharedinstancesarecheaperthanrunningonadedicatedinstance,butthepriceispaidinperformance.TheCPUandI/Oaresharedwithotherinstances,andthus,theperformanceofoursharedinstanceisnotnecessarilyinourcontrol.Foraproductionusecase,sharedinstanceisnotarecommendedoption.Similarly,weneedtosetupareplicasetwhenrunninginproduction.Ifwelookatthescreenshotinstep2,wewillseeanothertabnexttotheSingle-node(development)option.ThisiswhereyoumightchoosetheconfigurationforthemachineintermsofRAManddiskcapacity(andthepricetoo)andsetupareplicaset.

As you can see, you get to choose which version of MongoDB to use. Even if a new version of MongoDB gets released, MongoLab will not start supporting it immediately, as it usually waits for a few minor versions to be rolled out before supporting the new version for production users. Also, when we choose a configuration, the default available option is two data nodes and one arbiter, which is sufficient for the majority of use cases.

The RAM and disk chosen depend completely on the nature of the data and how query/write intensive it is. This sizing is something we do irrespective of whether we are deploying on our own infrastructure or on the cloud. The working set is something that is important to be known before we choose the RAM of the hardware. POCs and experiments are done to deal with a subset of data, and then, the estimation can be done for the entire dataset. Refer to the Estimating the working set recipe in Chapter 4, Administration, to estimate the working set on your sample dataset. If the I/O activity is high and low I/O latency is desired, you might even opt for SSD, as we saw in the preceding screenshot. Standalone instances are as good as replica sets in terms of scalability, but not in terms of availability. Thus, we might choose standalone instances for such estimation and development purposes. Shared instances, both free and paid, are good candidates for development purposes. Note that shared instances cannot be restarted on demand as we can for dedicated instances.
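The idea of measuring on a subset and then estimating for the whole dataset can be sketched in a few lines of JavaScript. This is only an illustration: the function name, the assumption that the working set grows linearly with the number of documents, and the headroom factor are all assumptions made for this sketch, and real working sets rarely scale perfectly linearly. The sketch runs under Node.js.

```javascript
// Hypothetical helper: extrapolate a working set measured on a sample
// to the full dataset, ASSUMING it grows linearly with document count.
function estimateWorkingSetMB(sampleWorkingSetMB, sampleDocs, totalDocs, headroom) {
  if (sampleDocs <= 0) {
    throw new Error("sampleDocs must be positive");
  }
  // headroom is an optional safety multiplier (assumption of this sketch).
  var factor = (headroom === undefined) ? 1.0 : headroom;
  return sampleWorkingSetMB * (totalDocs / sampleDocs) * factor;
}

// Made-up numbers: 200 MB measured on 1 million documents,
// 10 million documents expected, 50 percent headroom.
var estimate = estimateWorkingSetMB(200, 1000000, 10000000, 1.5);
console.log("Estimated working set: " + estimate + " MB");
```

The result is only a starting point for choosing RAM; measuring again on a larger sample is the safer way to confirm it.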

What cloud provider do we choose? If you already have your application servers deployed in the cloud, the database should go to the same vendor. It is recommended that you use the same cloud vendor for the application server and database, and that both are deployed in the same location to minimize latency and improve performance. If you are starting afresh, then invest some time in choosing the cloud provider. Look at all other services that the application will need, such as storage, compute, and other services including e-mails, notification services, and so on. All this analysis is outside the scope of this book, but once you are done with this and have finalized a provider, you might accordingly choose the provider to use in MongoLab. As far as pricing goes, all the leading providers offer competitive pricing.

Performing operations on MongoDB from MongoLab GUI

In the previous recipe, we saw how to set up a simple sandbox instance for MongoDB in the cloud using MongoLab. In this recipe, we’ll build on it and see what services MongoLab provides from the perspectives of management, administration, monitoring, and backup.

Getting ready

Refer to the previous recipe to know how to set up a sandbox instance in the cloud using MongoLab.

How to do it…

1. Go to https://mongolab.com/home; you should see a list of databases, servers, and clusters. If you have followed the previous recipe, you would see one standalone database, mongolab-test (or whatever name you chose for the database). Click on the database name; this will take you to the database details page.

2. On clicking on the Collections tab, which should be selected by default, we will see a list of collections present in the database. If the previous recipe was executed before this one, you would see one collection, messages, in the database.

3. Click on the name of the collection, and we will be navigated to the collection details page as follows:

4. Click on the Stats option to view the stats of the collection. Except for whether the collection is capped and the maximum number of documents it can hold, the contents come as a result of the following command:

db.<collectionName>.stats()

5. In the Documents tab, we can query the collection. By default, we will see all the documents with 10 documents shown per page, which can be changed from the records/page drop-down list. A maximum value of 100 can be chosen.

6. There is another way to view the documents, which is as a table. Click on the table radio button in Display mode and click on the (edit table view) link to create/edit the table view. In the pop up shown, enter the following document for the messages collection and click on Submit:

{

"id":"_id",

"MessageText":"message"

}

On doing this, the display will change as follows:

7. From the – Start new search – drop-down list, select the [new search] option, as shown in the following screenshot:

8. With the new query, we will see the following fields to let us enter the query string, sort order, and projections. Enter the query as {"_id":1} and fields as {"message":1,"_id":0}.

9. You might choose to save the query by clicking on the Save this search button and give a name to the query to be saved.

10. Individual documents can be deleted by clicking on the cross next to each record. Similarly, the Delete all button will delete all the contents of the collection.

11. Similarly, clicking on + Add document will display an editor to type in the document that will be inserted into the collection. As MongoDB is schemaless, the document need not have a fixed set of fields; the application should make sense out of it.

12. Go to https://mongolab.com/databases/<your database name> (mongolab-test in this case), which can also be reached by clicking on the database name from the home page.

13. Click on the Stats tab next to the Users tab. The content shown in the table is the result of the db.stats() command.

14. Similarly, click on the Backups tab at the top, next to the Stats tab. Here, we can select options to take a recurring or one-time backup.

15. When you click on Schedule recurring backup, you will get a pop-up window that will let you enter the details of the scheduling, such as the frequency of the backup, the time of the day when the backup needs to be taken, and the number of backups to keep.

16. The backup location can be chosen to be either MongoLab’s own Amazon Simple Storage Service (S3) bucket or Rackspace Cloud Files. You might choose to use your own account’s storage, in which case you will have to share the AWS access key/secret key, or the user ID/API key in the case of Rackspace.

How it works…

Steps 1 to 5 are pretty straightforward. In step 6, we provided a JSON document to show the results in a tabular format. The format of the document is as follows:

{
<display column 1>: <name of the field in the JSON document>,
<display column 2>: <name of the field in the JSON document>,
<display column n>: <name of the field in the JSON document>
}

The key is the name of the column to display, and the value is the name of the field in the actual document whose value will be shown in that column. To get a clear understanding, look at the view document defined for the messages collection, look at a document in the messages collection, and then take a look at the displayed tabular data. The following is the JSON document we provided; each key is the name of a display column, and each value is the name of the document field shown in that column:

{

"id":"_id",

"MessageText":"message"

}

Also, note that the field names and values of the JSON documents here are enclosed in quotes. The Mongo shell is lenient in the sense that it allows us to give field names without quotes.
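The column-to-field mapping described above can be illustrated with a small, self-contained JavaScript sketch. It is not MongoLab’s implementation; the function name toTableRow and the choice to show missing fields as null are assumptions made purely for illustration. The sketch runs under Node.js.

```javascript
// Hypothetical illustration of how a table view definition maps
// display columns (keys) to document fields (values).
function toTableRow(viewDefinition, doc) {
  var row = {};
  Object.keys(viewDefinition).forEach(function (columnName) {
    var fieldName = viewDefinition[columnName];
    // Missing fields are shown as null in this sketch (an assumption).
    row[columnName] = Object.prototype.hasOwnProperty.call(doc, fieldName)
      ? doc[fieldName]
      : null;
  });
  return row;
}

// The view document from step 6 and the document inserted earlier.
var view = { "id": "_id", "MessageText": "message" };
var message = { _id: 1, message: "Hello mongolab" };
console.log(JSON.stringify(toTableRow(view, message)));
```

Each row of the table view corresponds to one document passed through this kind of mapping, with the keys of the view document used as column headers.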

If we look at step 16, we will see that the backups are stored either in MongoLab’s AWS S3/Rackspace Cloud Files or in your custom AWS S3 bucket/Rackspace Cloud Files. In the latter case, you need to share your AWS/Rackspace credentials with MongoLab. If this is a concern and the credentials can potentially be used to access other resources, it is recommended that you create a separate account and use it for backup purposes from MongoLab. You might also use the backup created to create a new MongoDB server instance from MongoLab. Needless to say, if you have used your own AWS S3 bucket/Rackspace Cloud Files, storage charges are additional; they are not a part of MongoLab’s charges.

There are some important points worth mentioning. MongoLab provides a REST API for various operations. The REST API too can be used in place of the standard drivers to perform CRUD operations. However, using MongoDB client libraries is the recommended approach. One good reason to use the REST API right now over the language driver is if the client is connecting to the MongoDB server over a public network. The shell we started on our local machine that connects to the MongoDB server on the cloud sends unencrypted data to the server, which makes it vulnerable. On the other hand, if REST APIs are used, the traffic is sent over a secure channel as HTTPS is used. MongoLab plans to support a secure channel for communication between the client and the server in future, but at the time of writing this book, it is not available. If the application and database are in the same data center of the cloud provider, you are safe and depend on the security provided by the cloud provider for their local network, which generally is not a concern. However, there is nothing you can do for secure communication other than ensuring that your data doesn’t go over public networks.

One more scenario where MongoLab doesn’t work is when you want the instances to run on your own instance of a virtual machine rather than on the one chosen by MongoLab, or when you want the application to be in a virtual private cloud. Cloud providers provide services such as Amazon VPC, where a part of the AWS cloud can be treated as a part of your network. If you intend to deploy your MongoDB instance in such an environment, MongoLab cannot be used.

Setting up MongoDB on Amazon EC2 using the MongoDB AMI

In the earlier few recipes, we saw how to start MongoDB in the cloud using a hosted service provided by MongoLab, which gave an alternative to set up MongoDB on all leading cloud vendors. However, if we plan to host and monitor the instance ourselves for greater control or set up within our own virtual private cloud, we can do it ourselves. Though the procedure varies from cloud provider to cloud provider, we will demonstrate it using AWS. There are a couple of ways to do this, but in this recipe, we will do it using the Amazon Machine Image (AMI). The AMI is a template that contains details such as the operating system and the software that will be available on the started virtual machine. All this information will be used while booting up a new virtual machine instance on the cloud. To know more about the AMI, visit http://en.wikipedia.org/wiki/Amazon_Machine_Image.

Talking about AWS, Elastic Compute Cloud (EC2) is a service that lets you create, start, and stop servers of different configurations in the cloud that run on operating systems of your choice (the prices differ accordingly). Similarly, Amazon Elastic Block Store (EBS) is a service that provides persistent block storage with high availability and low latency. Initially, each instance has a store known as the ephemeral store attached to it. This is a temporary store, and the data might be lost when the instance restarts. EBS block storage is thus attached to the EC2 instance to maintain persistence even when the instance is stopped and then restarted. Standard EBS doesn’t promise a minimum guarantee for the I/O operations per second (IOPS). For a moderate workload, the default of about 100 IOPS is OK. However, for high-performance I/O, EBS blocks with guaranteed IOPS are also available. The pricing is higher compared to the standard EBS block, but it is a good option to opt for if a low I/O rate can be a bottleneck in the performance of the system.

Getting ready

The first thing you need to do is sign up for an AWS account. Visit http://aws.amazon.com/ and click on Sign Up. Log in if you have an Amazon account; otherwise, create a new one. You will have to give your credit card details, although the recipes we have here will use the free micro instance unless we explicitly mention otherwise. We will connect to the instance on the cloud using PuTTY. You can download and install PuTTY on your machine if you have not already done so. It can be downloaded from http://www.putty.org/.

For the installation using the AMI, we cannot use the micro instance and will have to use the minimum of standard large. Get more details on the pricing of EC2 instances in different regions at https://aws.amazon.com/ec2/pricing/. Choose the appropriate region based on geographical and financial factors.

1. The first thing you need to do is create a key pair if you have not already created one. Steps 1 to 5 are only for the creation of the key pair. This key pair will be used to log in to the Unix instance started in the cloud from the PuTTY client. Skip to step 6 if the key pair is already created and the .pem file is available with you.

2. Go to https://console.aws.amazon.com/ec2/ and make sure the region you have in the top-right corner (as shown in the following screenshot) is the same as the one in which you are planning to set up the instance:

3. Once the region is selected, the page with the Resources heading will show all the instances, key pairs, IP addresses, and so on for this region. Click on the Key Pairs link; this will navigate you to the page where all the existing key pairs will be shown, and you can create new ones.

4. Click on the Create Key Pair button, and in the pop-up window type in any name of your choice. Let’s say we call it EC2TestKeyPair, and click on Create.

5. Once the key pair is created, a .pem file will be generated. Ensure that the file is saved, as this will be needed for subsequent access to the machine.

6. Next, we will convert this .pem file to a .ppk file to be used with PuTTY.

7. Start PuTTYgen. If it is not already available, it can be downloaded from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html.

8. We will see the following screenshot:

9. Select the SSH-2 RSA option and click on the Load button. In the file dialog, select All files and select the .pem file that was downloaded when the key pair was generated in the EC2 console.

10. Once the .pem file is imported, click on the Save private key option and save the file with any name. This time, the file is a .ppk file. Save this file to log in to the EC2 instance from PuTTY in future.

How to do it…

1. Visit the Amazon marketplace at https://aws.amazon.com/marketplace/ and search for MongoDB, as shown in the following screenshot:

2. Look out for AMIs by MongoDB, as these are the official ones sold by MongoDB. There are different AMIs available with different provisioned I/O rates. For this example, we will choose the one with 1000 IOPS for data.

3. Click on the name of the image, which is a link, and we will be navigated to the details page. The following portion of the details page is of particular interest to us. Notice the Highlights section. There are three additional EBS volumes that are reserved and will be attached to this instance: one will be used for data (with the highest IOPS), one for journal, and one for logs (with the lowest IOPS).

4. The page will also provide information on the AMI and the pricing. Click on the Continue button.

5. On this page, we will review all the configurations for the EC2 instance that will be started. The first option will be the MongoDB version, which will be the latest one, and the second option will be the AWS region, which is US East by default. Choose the version and region if required.

6. The next option is to select the instance. We will choose the Standard Large (m1.large) option for our test. Leave the VPC settings set to default.

7. The next setting is the Security settings that allow connections from the entire world to the started instance of EC2. We will choose to use the settings recommended by the vendor of the AMI. You are free to use any other security policy if you have defined one earlier in EC2.

8. Finally, select the key pair, the one we created in the Getting ready section. Once done, click on Accept Terms & Launch with 1-Click.

9. Visit the EC2 console from the browser and click on Instances on the left-hand side menu.

10. The instance will take some time to start. Once started, click on the instance name in the list of instances to see the public DNS and IP address at the bottom of the page. Copy this public DNS.

11. Start PuTTY, and click on the Auth option under Connection/SSH, as shown in the following screenshot:

12. Click on the Browse button and load the .ppk file, which was generated earlier in the Getting ready section.

13. Now, click on Session under Category and enter the hostname that was copied in step 10. The port will remain 22, as this is the only open port from the public network to this instance.

14. When prompted for the user, enter the user as ec2-user in PuTTY. The private key loaded in step 12 will be used for authentication, and you do not need to enter a password.

15. We will use /data for the data and /logs to save the logs. These two are configurable parameters. The journal is always created in /data/journal and is not configurable. Refer to step 3 where we mentioned that there are three EBS volumes associated with this EC2 instance.

16. Execute the following command to start a mongod instance with logs written to the mongo.log file in the /logs directory and the process running in the background:

$ sudo mongod --logpath ~/logs/mongo.log --smallfiles --oplogSize 50 --fork

17. Now, start a Mongo client from the shell as follows:

$ mongo

18. Execute the following command from the Mongo shell after connecting to the Mongo instance:

> db.ec2Test.insert({_id:1, msg:'Hello, My first Mongo instance on cloud'})
> db.ec2Test.find()
{ "_id" : 1, "msg" : "Hello, My first Mongo instance on cloud" }
>

Congratulations! Now, we have successfully started a standalone MongoDB instance on an EC2 instance.

How it works…

In step 7, we saw how to set up the security for the started instance. We configured it to just allow incoming traffic for SSH over port 22 from all hosts from the public network. For tighter security, rather than allowing traffic from all hosts (0.0.0.0/0), we can allow traffic from a limited set of IP addresses. Let’s see if we can connect to the MongoDB instance started over the cloud from the Mongo shell on the local machine. For this activity, we will need MongoDB set up on the local machine; if not, you might just read through the content and understand the concept.

1. Note the public IP address/hostname of the instance started in the cloud and enter the following command on your local machine’s command line:

$ mongo --host <Public hostname of the cloud instance>

2. We will see that this operation fails with the following exception on the console:

MongoDB shell version: 2.4.6
connecting to: ec2-54-87-4-215.compute-1.amazonaws.com:27017/test
Sat May 03 14:30:23.376 Error: couldn't connect to server ec2-54-87-4-215.compute-1.amazonaws.com:27017 at src/mongo/shell/mongo.js:147
exception: connect failed

This is simply because the incoming traffic from a public network to this server over port 27017 is blocked. In fact, all traffic, except that on port 22, is blocked.

3. We will now open port 27017 for our current IP address. Note that this is not a recommended approach in the production environment; we are just doing this to test connecting to the instance on the cloud. Instead, the correct way is to just open the SSH connection to the cloud instance and then connect to the server from a client run on this instance, as we did in the previous section.

4. Go to the EC2 console, choose the correct region at the top, and then click on the Security Groups menu option on the left-hand side. We will see the security groups defined as follows:

5. As we can see, there is a group that was created when we started the instance. Click on this group to see the details at the bottom of the screen and a way to edit the rules. Select the type as Custom TCP Rule, port as 27017, and source as My IP/Custom IP, where you can enter any IP address or all IP addresses. We will choose My IP in this case for testing purposes and click on Save, as shown in the following screenshot:

6. Now, connect to this MongoDB instance started in the cloud by typing in the following command from your local machine’s operating system shell. This time, we will be able to connect to this instance:

$ mongo --host <Public hostname of the cloud instance>

What we saw was a very simple demo of what a security group of this cloud instance does. For more details on EC2 instance security, visit http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_Network_and_Security.html
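The effect of the rules can be pictured with a simplified model of inbound filtering. The following JavaScript sketch is not the AWS API and does not talk to EC2; the function names, the rule shape ({port, source}), and the sample addresses (taken from the IPv4 ranges reserved for documentation) are assumptions made for illustration only. The sketch runs under Node.js.

```javascript
// Convert a dotted IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip.split(".").reduce(function (acc, octet) {
    return ((acc << 8) | parseInt(octet, 10)) >>> 0;
  }, 0);
}

// Check whether an IPv4 address falls inside a CIDR block such as 10.0.0.0/8.
function inCidr(ip, cidr) {
  var parts = cidr.split("/");
  var bits = parseInt(parts[1], 10);
  if (bits === 0) {
    return true; // 0.0.0.0/0 matches every address
  }
  var mask = (0xFFFFFFFF << (32 - bits)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(parts[0]) & mask) >>> 0);
}

// Inbound traffic is allowed only if some rule matches both port and source.
function isAllowed(rules, sourceIp, port) {
  return rules.some(function (rule) {
    return rule.port === port && inCidr(sourceIp, rule.source);
  });
}

// Initial state: only SSH (port 22) is open to all hosts.
var rules = [{ port: 22, source: "0.0.0.0/0" }];
console.log(isAllowed(rules, "203.0.113.10", 22));    // SSH from any host
console.log(isAllowed(rules, "203.0.113.10", 27017)); // MongoDB port is closed

// After adding a rule for our own IP address on port 27017.
rules.push({ port: 27017, source: "203.0.113.10/32" });
console.log(isAllowed(rules, "203.0.113.10", 27017)); // our own address
console.log(isAllowed(rules, "198.51.100.7", 27017)); // any other address
```

The four calls print true, false, true, and false, mirroring the failed connection before the rule was added and the successful one afterwards.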

What we saw so far was how to start the mongod instance on the cloud using the MongoDB AMI. If this is your only objective, then the rest of the contents in the section can be skipped. What we will see now is how the filesystem is set up on this instance.

Moving on, let’s look at the filesystem setup. From the operating system shell of the started EC2 instance, execute the following command:

[ec2-user@ip-10-236-144-125 ~]$ mount
/dev/xvda1 on / type ext4 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/xvdf on /data type ext4 (rw,noexec,noatime)
/dev/xvdg on /journal type ext4 (rw,noexec,noatime)
/dev/xvdh on /log type ext4 (rw,noexec,noatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

We can see that there are three different mount points for /data, /journal, and /log. These are the three provisioned EBS storage blocks with 1000, 250, and 100 IOPS, respectively. The journal is, however, created in the journal directory in the data directory. Let’s list the files in the /data directory as follows:

[ec2-user@ip-10-236-144-125 ~]$ ls -al /data
total 98344
drwxr-xr-x 4 mongod mongod 4096 May 3 08:35 .
dr-xr-xr-x 26 root root 4096 May 3 08:23 ..
lrwxrwxrwx 1 root root 8 Apr 28 2013 journal -> /journal

As we can see in the partial captured output of the ls command, the journal directory in the /data directory is a link to the /journal mount.

Finally, let’s execute the following command:

[ec2-user@ip-10-236-144-125 ~]$ cat /etc/fstab
#
LABEL=/ / ext4 defaults,noatime 1 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/sdf /data ext4 defaults,auto,noatime,noexec 0 0
/dev/sdg /journal ext4 defaults,auto,noatime,noexec 0 0
/dev/sdh /log ext4 defaults,auto,noatime,noexec 0 0

We will see that the three mount points are already defined in this file. As we used the AMI to create the EC2 instance, we get all these things configured.

Let’s look at one entry, /dev/sdf /data ext4 defaults,auto,noatime,noexec 0 0, added in the file and analyze its fields one by one. These values are separated by whitespace (tabs or spaces).

The first value, /dev/sdf, is the device that we are looking to mount
The second value, /data, is the directory on which the device will be mounted
The third parameter, ext4, is the type of the filesystem
Next, we have comma-separated values of options:

The value, defaults, is used to load the default options for the ext4 partition.
The value, auto, is used to indicate that the device will be mounted automatically on startup; auto is the default value and need not be explicitly mentioned.
Whenever a file is accessed, even in the case of a read, the last-accessed time of the file on the filesystem is updated by Unix. This can have a heavy, negative performance impact on both read and write operations. Setting noatime instructs the OS to not update this last-accessed time.
The noexec value instructs that these filesystems cannot have executables on them.

The final two values are 0 and 0 for dump frequency and pass number. By setting the pass number to 0, we disable partition checks for these partitions.
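The field-by-field breakdown above can be expressed as a short JavaScript sketch that splits one entry into named parts. The function name parseFstabLine and the property names are chosen for this illustration only, and the sketch assumes that all six fields are present on the line (which is the case for the entries shown here). It runs under Node.js.

```javascript
// Hypothetical helper: split one /etc/fstab entry into its six
// whitespace-separated fields. Assumes all six fields are present.
function parseFstabLine(line) {
  var fields = line.trim().split(/\s+/);
  if (fields.length !== 6) {
    throw new Error("expected 6 fields, got " + fields.length);
  }
  return {
    device: fields[0],                     // what to mount
    mountPoint: fields[1],                 // where to mount it
    fsType: fields[2],                     // filesystem type
    options: fields[3].split(","),         // comma-separated mount options
    dumpFrequency: parseInt(fields[4], 10),
    passNumber: parseInt(fields[5], 10)    // 0 disables the check at boot
  };
}

var entry = parseFstabLine("/dev/sdf /data ext4 defaults,auto,noatime,noexec 0 0");
console.log(entry.device + " on " + entry.mountPoint + " (" + entry.fsType + ")");
console.log("options: " + entry.options.join(" "));
console.log("checked at boot: " + (entry.passNumber !== 0));
```

For the /data entry, the options list contains noatime and noexec, and the pass number of 0 means the partition is not checked at boot.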

That is pretty much it; as we saw, the AMI has made life easy for us and given a machine image with all the recommended settings to help us get up to speed in spinning off a server in the cloud and starting the MongoDB server. All other steps to start the servers, form replica sets and shards, and monitor them are the same as the steps used to start a server on your local machine or in your own data centers. Refer to Chapter 4, Administration, and Chapter 6, Monitoring and Backups, for more recipes on administration and monitoring Mongo instances.

Make sure you stop the EC2 instance, if this is a test, as soon as possible from the EC2 console to avoid being charged unnecessarily. A stopped instance will not attract any instance charges. The provisioned EBS volumes, however, are still charged for the data on them; if you plan to not use this instance anymore, terminate the instance and release the EBS volumes attached.

Setting up MongoDB on Amazon EC2 without using the MongoDB AMI

In the previous recipe, we saw how to start a standalone database in the cloud using the MongoDB AMI; this is perhaps the simplest way to start the server on the EC2 instance. However, when using an AMI, you are tied to the configurations the AMI supports. For instance, for a noncrucial instance, you might not want a large instance or an EBS volume with guaranteed IOPS. You might even be OK with the same EBS volume for data, journal, and logs to cut on the costs, as the instance you are setting up is a development or test instance. Also, the operating system for the AMIs is Amazon Linux; if you wish to use a different OS and install MongoDB on it, the AMI isn’t of any help. In all such scenarios, setting up the instance manually is the only option left. This isn’t a simple job, and a careful setup of various factors, including the standard operating system parameters, is needed. In this recipe, we will not get into these complicated tasks but rather set up a small micro instance, which is as good as a sandbox instance with one EBS block volume attached to it.

Getting ready

Refer to the Getting ready section of the previous recipe, which is a prerequisite for this recipe as well.

How to do it…

1. Go to https://console.aws.amazon.com/ec2/, click on the Instances option in the left-hand corner, and then the Launch Instance button.

2. As we will start a free micro instance, check the Free tier only checkbox on the left-hand side. On the right-hand side, select the instance we want to set up. We will choose to use Ubuntu Server. Click on Select to navigate to the next window.

3. Choose Micro Instance and click on Review and Launch. Ignore the security warning; the default security group that you have is the one that will accept connections over port 22 from all the hosts on the public network.

4. Without editing any default settings, click on Launch. On clicking Launch, a pop up will appear, letting you choose an existing key pair. If you proceed without a key pair, you would need the password or have to create a new key pair. In the previous recipe, we already created a key pair; we will use this here.

5. Click on Launch Instance to start the new micro instance.

6. Refer to steps 9 to 14 in the previous recipe to learn how to connect to the started instance using PuTTY. Note that we will use the ubuntu user instead of ec2-user, which we used in the last recipe, as this time we are using Ubuntu instead of Amazon Linux.

7. Before we add a MongoDB repository, we need to import the MongoDB public key as follows:

$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10

8. Execute the following command on the operating system shell:

$ echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart/ dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list

9. Reload the local package database and install MongoDB by executing the following commands:

$ sudo apt-get update
$ sudo apt-get install mongodb-org

10. Execute the following command to create the required directories:

$ sudo mkdir /data /log

11. Start the mongod process as follows:

$ sudo mongod --dbpath /data --logpath /log/mongodb.log --smallfiles --oplogSize 50 --fork

12. To ensure that the server process is up and running, execute the following command from the shell, and we will see the following lines in the log:

$ tail /log/mongodb.log
2014-05-04T13:41:16.533+0000 [initandlisten] journal dir=/data/journal
2014-05-04T13:41:16.534+0000 [initandlisten] recover : no journal files present, no recovery needed
2014-05-04T13:41:16.628+0000 [initandlisten] waiting for connections on port 27017

13. Start the Mongo shell as follows and execute the following commands:

$ mongo
> db.ec2Test.insert({_id:1, message:'Hello World!'})
> db.ec2Test.findOne()

How it works…

A lot of steps are self-explanatory. It is recommended that you at least go through the previous recipe, as a lot of concepts that are explained there apply to this recipe. A few things that are different are explained in this section. For installation, we chose Ubuntu instead of Amazon Linux, which is standard when you set up the server using the AMI. Different operating systems have different steps for installation. Visit http://docs.mongodb.org/manual/installation/ for steps to install MongoDB on different platforms. Steps 7 to 9 in this recipe are specific to the installation of MongoDB on Ubuntu. Refer to the https://help.ubuntu.com/12.04/serverguide/apt-get.html page for more details on the apt-get command that we executed here to install MongoDB.

In our case, we chose to have the data, journal, and log folders on the same EBS volume. This is because what we set up is a dev instance. In the case of a prod instance, there would be different EBS volumes with provisioned IOPS for optimum performance. This setup allows us to take advantage of the fact that these different volumes have different controllers, and thus, concurrent write operations are possible. EBS volumes with provisioned IOPS are backed by SSD drives. The production deployment notes at http://docs.mongodb.org/manual/administration/production-notes/ state that a MongoDB deployment should be backed by RAID-10 disks. When deploying on AWS, prefer PIOPS over RAID-10. For instance, if 4000 IOPS is desired, then choose the EBS volume with 4000 IOPS rather than a RAID-10 setup with 2 x 2000 IOPS or a 4 x 1000 IOPS setup. This not only eliminates unnecessary complexity but also makes it possible to snapshot a single disk as against dealing with multiple disks in the RAID-10 setup. Speaking of snapshotting, the journal and data are written to separate volumes in the majority of production deployments. In this scenario, a plain snapshot doesn’t work on its own. We need to flush the DB writes, lock the data for further writes until the backup completes, and then release the lock. Visit http://docs.mongodb.org/manual/tutorial/backup-with-filesystem-snapshots/ for more details on snapshotting and backups.

Visit http://docs.mongodb.org/ecosystem/platforms/ for more details on deployment on different cloud providers. There is a section specifically for backups on Amazon EC2 instances. Prefer using AMIs to set up MongoDB instances for production deployments, as demonstrated in the previous recipe, over manually setting up the instances. Manual setup is OK for a small dev purpose where a large instance with EBS volumes with provisioned IOPS is overkill.

See also

CloudFormation is a way in which you can define templates and automate your instance creation for EC2 instances. Learn what CloudFormation is at https://aws.amazon.com/cloudformation/ and https://mongodb-documentation.readthedocs.org/en/latest/ecosystem/tutorial/automate-deployment-with-cloudformation.html.
Visit http://en.wikipedia.org/wiki/Standard_RAID_levels and http://en.wikipedia.org/wiki/Nested_RAID_levels to know more about RAID.

Chapter 8. Integration with Hadoop

In this chapter, we will cover the following recipes:

Executing our first sample MapReduce job using the mongo-hadoop connector
Writing our first Hadoop MapReduce job
Running MapReduce jobs on Hadoop using streaming
Running a MapReduce job on Amazon EMR

Introduction

Hadoop is a well-known open source software for processing large datasets. It also has an API for the MapReduce programming model, which is widely used. Nearly all Big Data solutions have some sort of support to integrate with Hadoop to use its MapReduce framework. MongoDB too has a connector that integrates with Hadoop; it lets us write MapReduce jobs using the Hadoop MapReduce API, process data that resides in MongoDB/MongoDB dumps, and write the result back to MongoDB/MongoDB dump files. In this chapter, we will look at some recipes that deal with basic MongoDB and Hadoop integration.

Executing our first sample MapReduce job using the mongo-hadoop connector

In this recipe, we will see how to build the Mongo Hadoop connector from source and set up Hadoop just for the purpose of running examples in the standalone mode. The connector is the backbone that runs MapReduce jobs on Hadoop using the data in Mongo.

Getting ready

There are various distributions of Hadoop; however, we will use Apache Hadoop (http://hadoop.apache.org/). The installation will be done on a Linux-flavored OS, and I am using Ubuntu Linux. For production, Apache Hadoop always runs on a Linux environment; Windows is not tested for production systems. For development purposes, however, Windows can be used. If you are a Windows user, I would recommend that you install a virtualization environment such as VirtualBox (https://www.virtualbox.org/), set up a Linux environment, and then install Hadoop on it. Setting up VirtualBox and then setting up Linux on it is not shown in this recipe, but this is not a tedious task. The prerequisite for this recipe is a machine with the Linux operating system on it and an Internet connection. The version we set up here is 2.4.0 of Apache Hadoop. At the time of writing this book, the latest version of Apache Hadoop, and the latest supported by the mongo-hadoop connector, is 2.4.0.

A Git client is needed to clone the repository of the mongo-hadoop connector to a local filesystem. Refer to http://git-scm.com/book/en/Getting-Started-Installing-Git to install Git.

You will also need MongoDB to be installed on your operating system. Refer to http://docs.mongodb.org/manual/installation/ and install it accordingly. Start the mongod instance that listens to port 27017. You are not expected to be an expert in Hadoop, but some familiarity with it will be helpful. Knowing the concept of MapReduce is important, and knowing the Hadoop MapReduce API will be an advantage. In this recipe, we will explain what is needed to get the work done. You might prefer to get more details on Hadoop and its MapReduce API from other sources. The wiki page at http://en.wikipedia.org/wiki/MapReduce gives enough information on the MapReduce programming model.

How to do it…

1. First, install Java, Hadoop, and the required packages.

2. Start by installing the JDK on the operating system. Type the following command in the command prompt of the operating system:

$ javac -version

3. If the program doesn’t execute and instead you are told about various packages that contain the javac program, we would need to install it as follows:

$ sudo apt-get install default-jdk

This is all we need to do to install Java.

4. Now, download Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/common/; choose version 2.4 (or the latest version that the mongo-hadoop connector supports).

5. After the .tar.gz file is downloaded, execute the following commands in the command prompt:

$ tar -xvzf <name of the downloaded .tar.gz file>
$ cd <extracted directory>

6. Open the etc/hadoop/hadoop-env.sh file and replace export JAVA_HOME=${JAVA_HOME} with export JAVA_HOME=/usr/lib/jvm/default-java.

7. We will now get the mongo-hadoop connector code from GitHub on our local filesystem. Note that you don’t need a GitHub account to clone a repository. Clone the Git project from the operating system’s command prompt as follows:

$ git clone https://github.com/mongodb/mongo-hadoop.git
$ cd mongo-hadoop

8. Create a soft link as follows; the Hadoop installation directory is the same as the one we extracted in step 5:

$ ln -s <hadoop installation directory> ~/hadoop-binaries

For example, if Hadoop is extracted/installed in the home directory, the following command would need to be executed:

$ ln -s ~/hadoop-2.4.0 ~/hadoop-binaries

By default, the mongo-hadoop connector will look for a Hadoop distribution under the ~/hadoop-binaries folder. So, even if the Hadoop archive is extracted elsewhere, we can create a soft link to it. Once the preceding link is created, we will have the Hadoop binaries in the ~/hadoop-binaries/hadoop-2.4.0/bin path. Note that this path assumes ~/hadoop-binaries already exists as a directory, so that ln creates the link inside it; if it does not exist, the link itself is named ~/hadoop-binaries and the binaries are found under ~/hadoop-binaries/bin instead.

9. We will now build the mongo-hadoop connector from source for Apache Hadoop Version 2.4.0 as follows. The build, by default, builds the connector for the latest version; so, as of now, the -Phadoop_version parameter can be left out, as 2.4 is the latest anyway:

$ ./gradlew jar -Phadoop_version='2.4'

This build process will take some time to complete.

10. Once the build completes successfully, we are ready to execute our first MapReduce job. We will do it using the Treasury Yield sample provided with the mongo-hadoop connector project. The first activity is to import the data to a collection in Mongo.

11. Assuming that the mongod instance is up and running and listening to port 27017 for connections and the current directory is the root of the mongo-hadoop connector code base, execute the following command:

$ mongoimport -c yield_historical.in -d mongo_hadoop --drop examples/treasury_yield/src/main/resources/yield_historical_in.json

12. Once the import action is successful, we are left with copying two JAR files to the lib directory. Execute the following commands from the operating system shell:

$ wget http://repo1.maven.org/maven2/org/mongodb/mongo-java-driver/2.12.0/mongo-java-driver-2.12.0.jar
$ cp core/build/libs/mongo-hadoop-core-1.2.1-SNAPSHOT-hadoop_2.4.jar ~/hadoop-binaries/hadoop-2.4.0/lib/
$ mv mongo-java-driver-2.12.0.jar ~/hadoop-binaries/hadoop-2.4.0/lib

13. The JAR file built for the mongo-hadoop core to be copied was named mongo-hadoop-core-1.2.1-SNAPSHOT-hadoop_2.4.jar for the trunk version of the code and built for Hadoop 2.4.0. Change the name of the JAR accordingly when you build it yourself for a different version of the connector and Hadoop. The Mongo driver can be the latest version. Version 2.12.0 is the latest one at the time of writing this book.

14. Now, execute the following command in the command prompt of the operating system shell:

~/hadoop-binaries/hadoop-2.4.0/bin/hadoop jar \
examples/treasury_yield/build/libs/treasury_yield-1.2.1-SNAPSHOT-hadoop_2.4.jar \
com.mongodb.hadoop.examples.treasury.TreasuryYieldXMLConfig \
-Dmongo.input.split_size=8 -Dmongo.job.verbose=true \
-Dmongo.input.uri=mongodb://localhost:27017/mongo_hadoop.yield_historical.in \
-Dmongo.output.uri=mongodb://localhost:27017/mongo_hadoop.yield_historical.out

15. The output should print out a lot of things. However, the following line in the output will tell us that the MapReduce job is successful:

14/05/11 21:38:54 INFO mapreduce.Job: Job job_local1226390512_0001 completed successfully

16. Connect to the mongod instance that runs on localhost from the Mongo client and execute a find query on the following collection:

$ mongo
> use mongo_hadoop
switched to db mongo_hadoop
> db.yield_historical.out.find()

How it works…

Installing Hadoop is not a trivial task, and we don’t need to get into it to try our samples for the mongo-hadoop connector. To learn about Hadoop, there are dedicated books and articles available. For the purpose of this chapter, we simply download the archive, extract it, and run the MapReduce jobs in standalone mode. This is the quickest way to get going with Hadoop. All the steps up to step 6 are needed to install Hadoop. In the next couple of steps, we simply cloned the mongo-hadoop connector repository. You might also download a stable build for your version of Hadoop from https://github.com/mongodb/mongo-hadoop/releases if you prefer not to build from source. We then built the connector for our version of Hadoop (2.4.0); this took us up to step 13.

From step 14 onwards, we ran the actual MapReduce job to work on the data in MongoDB. We imported the data into the yield_historical.in collection, which is used as the input to the MapReduce job. Go ahead and query the collection from the Mongo shell using the mongo_hadoop database to see a document. Don’t worry if you don’t understand the contents; what matters in this example is what we intend to do with this data.

The next step was to invoke the MapReduce operation on the data. The hadoop command was executed with the path of a JAR file (examples/treasury_yield/build/libs/treasury_yield-1.2.1-SNAPSHOT-hadoop_2.4.jar). This is the JAR file that contains the classes that implement a sample MapReduce operation for the treasury yield. The com.mongodb.hadoop.examples.treasury.TreasuryYieldXMLConfig class in this JAR file is the bootstrap class that contains the main method. We will see this class soon. There are lots of configurations supported by the connector. A complete list of configurations can be found at https://github.com/mongodb/mongo-hadoop/blob/master/CONFIG.md. For now, we will just remember that mongo.input.uri and mongo.output.uri are the collections for the input and output, respectively, of the MapReduce operation.

With the project cloned, you might import it into any Java IDE of your choice. We are particularly interested in the project at /examples/treasury_yield and the core project present in the root of the cloned repository.

Let’s look at the com.mongodb.hadoop.examples.treasury.TreasuryYieldXMLConfig class. This is the entry point into the MapReduce job and has a main method in it. To write MapReduce jobs for Mongo using the mongo-hadoop connector, the main class always has to extend com.mongodb.hadoop.util.MongoTool. This class implements the org.apache.hadoop.util.Tool interface, whose run method is implemented for us by the MongoTool class. All that the main method needs to do is execute this class using the org.apache.hadoop.util.ToolRunner class by invoking its static run method, passing the instance of our main class (an instance of Tool).

There is a static block that loads some configurations from two XML files: hadoop-local.xml and mongo-defaults.xml. The format of these files (or any Hadoop configuration XML file) is as follows. The root node of the file is the configuration node, with multiple property nodes under it:

<configuration>
  <property>
    <name>{property name}</name>
    <value>{property value}</value>
  </property>
  ...
</configuration>
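To make the file layout concrete, here is a minimal Python sketch that reads such a configuration document into a dictionary. It is only an illustration of the format, not how Hadoop or the connector loads these files; the load_properties helper and the sample content are assumptions made for this example, with property names borrowed from the command we ran earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the <configuration>/<property> layout described above.
SAMPLE_XML = """
<configuration>
  <property>
    <name>mongo.input.uri</name>
    <value>mongodb://localhost:27017/mongo_hadoop.yield_historical.in</value>
  </property>
  <property>
    <name>mongo.job.verbose</name>
    <value>true</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return a dict mapping each property name to its value (illustrative helper)."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("root node must be <configuration>")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


if __name__ == "__main__":
    for key, value in sorted(load_properties(SAMPLE_XML).items()):
        print(key, "=", value)
```

Each property node contributes exactly one name and value pair, which is why a flat dictionary is enough to represent the whole file.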

The property values that make sense in this context are all those mentioned in the URL provided earlier. We instantiate com.mongodb.hadoop.MongoConfig, wrapping an instance of org.apache.hadoop.conf.Configuration, in the constructor of the bootstrap class TreasuryYieldXMLConfig. The MongoConfig class provides sensible defaults, which are enough to satisfy the majority of use cases. Some of the most important things we need to set in the MongoConfig instance are the input and output formats, the mapper and reducer classes, the output key and value of the mapper, and the output key and value of the reducer. For jobs that read from and write to MongoDB collections, the input and output formats are the com.mongodb.hadoop.MongoInputFormat and com.mongodb.hadoop.MongoOutputFormat classes, respectively; they are provided by the mongo-hadoop connector library. For the mapper and reducer output key and value, we can use any org.apache.hadoop.io.Writable implementation. Refer to the Hadoop documentation for the different Writable implementations in the org.apache.hadoop.io package. Apart from these, the mongo-hadoop connector also provides some implementations in the com.mongodb.hadoop.io package. For the treasury yield example, we used the BSONWritable instance.

These configurable values can be provided in the XML file we saw earlier or set programmatically. Finally, we have the option to provide them as -D arguments on the command line, as we did for mongo.input.uri and mongo.output.uri. These two parameters can also be provided in XML or set directly from code on the MongoConfig instance; the two methods are setInputURI and setOutputURI.

We will now look at the mapper and reducer class implementations. Here, we will copy the important portion of the class to analyze it. Refer to the cloned project for the entire implementation:

public class TreasuryYieldMapper
    extends Mapper<Object, BSONObject, IntWritable, DoubleWritable> {

    @Override
    public void map(final Object pKey,
                    final BSONObject pValue,
                    final Context pContext)
        throws IOException, InterruptedException {
        final int year = ((Date) pValue.get("_id")).getYear() + 1900;
        double bid10Year = ((Number) pValue.get("bc10Year")).doubleValue();
        pContext.write(new IntWritable(year), new DoubleWritable(bid10Year));
    }
}

Our mapper extends the org.apache.hadoop.mapreduce.Mapper class. The four generic parameters are the type of the input key, the type of the input value, the type of the output key, and the type of the output value, respectively. The body of the map method reads the _id value from the input document, which is a date, and extracts the year from it. Then, it gets the double value of the bc10Year field from the document and writes a key-value pair to the context, where the key is the year and the value is the double. The implementation here doesn’t rely on the value of the pKey parameter passed; this could be used as the key instead of hardcoding the _id field in the implementation. This value is the same field that is set using the mongo.input.key property in the XML or using the MongoConfig.setInputKey method. If none is set, _id is the default anyway.
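The logic of the map method can be sketched in a few lines of plain Python, without Hadoop. This is an illustrative stand-in rather than the connector's code: map_document and its input_key parameter are hypothetical names, and the sample values are made up. Note that Java's Date.getYear() returns the year minus 1900, which is why the Java code adds 1900; Python's datetime carries the full year already.

```python
from datetime import datetime


def map_document(doc, input_key="_id"):
    """Emit one (year, 10-year bid yield) pair for a treasury yield document.

    input_key stands in for the field named by mongo.input.key (assumption
    for illustration); it defaults to _id, as described in the text.
    """
    year = doc[input_key].year            # datetime already holds the full year
    bid_10_year = float(doc["bc10Year"])  # same field the Java mapper reads
    return year, bid_10_year


if __name__ == "__main__":
    # Hypothetical document; the values are made up for illustration.
    sample = {"_id": datetime(1990, 1, 2), "bc10Year": 7.94}
    print(map_document(sample))
```

Passing the key field as a parameter shows the alternative mentioned above: the mapper need not hardcode _id if the key is supplied from configuration.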

Let’s look at the reducer implementation (with the logging statements removed):

public class TreasuryYieldReducer
    extends Reducer<IntWritable, DoubleWritable, IntWritable, BSONWritable> {

    @Override
    public void reduce(final IntWritable pKey,
                       final Iterable<DoubleWritable> pValues,
                       final Context pContext)
        throws IOException, InterruptedException {
        int count = 0;
        double sum = 0;
        for (final DoubleWritable value : pValues) {
            sum += value.get();
            count++;
        }
        final double avg = sum / count;
        BasicBSONObject output = new BasicBSONObject();
        output.put("count", count);
        output.put("avg", avg);
        output.put("sum", sum);
        pContext.write(pKey, new BSONWritable(output));
    }
}

This class extends org.apache.hadoop.mapreduce.Reducer and again has four generic parameters: the input key, input value, output key, and output value types, respectively. The input to the reducer is the output from the mapper. Thus, if you look carefully, the types of the first two generic parameters are the same as the last two generic parameters of the mapper we saw earlier. The third and fourth parameters in this case are the types of the key and the value emitted from reduce, respectively. The value emitted is a BSON document, and thus, we have BSONWritable as the type.

The reduce method has two parameters: the first one is the key, which is the same as the key emitted from the map function, and the second is a java.lang.Iterable of the values emitted for that key. This is how standard MapReduce functions work. For instance, if the map function emitted the key-value pairs (1950, 10), (1960, 20), (1950, 20), (1950, 30), then reduce would be invoked with two unique keys, 1950 and 1960. The value for the key 1950 will be an Iterable with (10, 20, 30), whereas that of 1960 will be an Iterable of a single element (20). The reduce function of the reducer class simply iterates through this Iterable of doubles, finds the sum and count of these numbers, and writes one key-value pair where the key is the same as the incoming key and the output value is a BasicBSONObject with the computed sum, count, and average in it.
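The grouping example above can be worked through with a small Python sketch. It only imitates, in memory, what the framework does between the map and reduce phases; group_by_key and reduce_values are hypothetical helper names chosen for this illustration and are not part of Hadoop or the connector.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect the values emitted for each key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_values(key, values):
    """Mirror the reducer: one output document holding count, avg, and sum."""
    count = 0
    total = 0.0
    for value in values:
        total += value
        count += 1
    return key, {"count": count, "avg": total / count, "sum": total}


if __name__ == "__main__":
    emitted = [(1950, 10), (1960, 20), (1950, 20), (1950, 30)]
    for key, values in sorted(group_by_key(emitted).items()):
        print(reduce_values(key, values))
```

With the four pairs from the text, the key 1950 is reduced once over the three values 10, 20, and 30, and the key 1960 once over the single value 20.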

There are some good samples, including the enron dataset, in the examples of the cloned mongo-hadoop connector. If you would like to play around a bit, I would recommend that you take a look at these example projects too and run them.

There’s more…

What we saw here was a ready-made sample that we executed. There is nothing like writing a MapReduce job ourselves to clarify our understanding. In the next recipe, we will write one sample MapReduce job using the Hadoop API in Java and see it in action.

See also

http://www.mail-archive.com/[email protected]/msg00378.html to know what the Writable interface is all about and why you should not use plain old serialization

Writing our first Hadoop MapReduce job

In this recipe, we will write our first MapReduce job using the Hadoop MapReduce API and run it using the mongo-hadoop connector, which gets the data from MongoDB. Refer to the MapReduce in Mongo using a Java client recipe in Chapter 3, Programming Language Drivers, to see how MapReduce is implemented using a Java client, how to create the test data, and the problem statement.

Getting ready

Refer to the previous recipe to set up the mongo-hadoop connector. The prerequisites of the Executing our first sample MapReduce job using the mongo-hadoop connector recipe (which is present in this chapter) and the MapReduce in Mongo using a Java client recipe in Chapter 3, Programming Language Drivers, are all we need for this recipe. This is a Maven project; thus, Maven needs to be set up and installed. Refer to the Connecting to a single node from a Java client recipe in Chapter 1, Installing and Starting the MongoDB Server, where we gave the steps to set up Maven on Windows. However, this project is built on Ubuntu Linux, and the following is the command you need to execute from the operating system shell to get Maven:

$ sudo apt-get install maven

How to do it…

1. We have a Java mongo-hadoop-mapreduce-test project, which can be downloaded from the book’s website. The project targets the same use case that we achieved in the recipes in Chapter 3, Programming Language Drivers, where we used MongoDB’s MapReduce framework. We invoked that MapReduce job using the Python and Java clients on earlier occasions.

2. In the command prompt, with the current directory in the root of the project where the pom.xml file is present, execute the following command:

$ mvn clean package

3. The mongo-hadoop-mapreduce-test-1.0.jar JAR file will be built and kept in the target directory.

4. With the assumption that the CSV file is already imported into the postalCodes collection, execute the following command with the current directory still in the root of the mongo-hadoop-mapreduce-test project we just built:

~/hadoop-binaries/hadoop-2.4.0/bin/hadoop \
jar target/mongo-hadoop-mapreduce-test-1.0.jar \
com.packtpub.mongo.cookbook.TopStateMapReduceEntrypoint \
-Dmongo.input.split_size=8 \
-Dmongo.job.verbose=true \
-Dmongo.input.uri=mongodb://localhost:27017/test.postalCodes \
-Dmongo.output.uri=mongodb://localhost:27017/test.postalCodesHadoopmrOut

5. Once the MapReduce job completes, open the Mongo shell by typing the following command in the operating system command prompt and execute the following query from the shell:

$ mongo
> db.postalCodesHadoopmrOut.find().sort({count:-1}).limit(5)

6. Compare the output with the ones we got earlier when we executed the MapReduce jobs using Mongo’s MapReduce framework (Chapter 3, Programming Language Drivers).

How it works…

We have kept the classes very simple and with the fewest possible requirements. We just have three classes in our project: TopStateMapReduceEntrypoint, TopStateReducer, and TopStatesMapper. All these classes are in the same package, com.packtpub.mongo.cookbook. The map function of the mapper class just writes a key-value pair to the context; here, the key is the name of the state, and the value is the integer 1. The following line of code is from the mapper function:

context.write(new Text((String) value.get("state")), new IntWritable(1));

What the reducer gets is the key, which is the name of a state, and an Iterable of integer values, each of which is 1. All that we do is write to the context the same state name and the sum of the values. Since an Iterable has no size method that can give the count in constant time, we are left with adding up the ones we get, in linear time. The following is the code snippet from the reducer method:

int sum = 0;
for (IntWritable value : values) {
    sum += value.get();
}
BSONObject object = new BasicBSONObject();
object.put("count", sum);
context.write(text, new BSONWritable(object));

We write to the context the text string that is the key and a value that is a BSON document containing the count. The mongo-hadoop connector is then responsible for writing to the output collection we specified, that is, postalCodesHadoopmrOut. Each document has an _id field whose value is the same as the key emitted from the mapper. Thus, when we execute the following query, we will get the top five states with the greatest number of cities in our database:

> db.postalCodesHadoopmrOut.find().sort({count:-1}).limit(5)
{ "_id" : "Maharashtra", "count" : 6446 }
{ "_id" : "Kerala", "count" : 4684 }
{ "_id" : "Tamil Nadu", "count" : 3784 }
{ "_id" : "Andhra Pradesh", "count" : 3550 }
{ "_id" : "Karnataka", "count" : 3204 }
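The whole job, from emitting ones to picking the top states, can be imitated in memory with the following Python sketch. It is an illustration under stated assumptions only: count_by_state and top_states are hypothetical helpers, and the input documents are made up rather than taken from the postalCodes data.

```python
from collections import Counter


def count_by_state(documents):
    """Emit (state, 1) per document and add up the ones for each state."""
    counts = Counter()
    for doc in documents:
        counts[doc["state"]] += 1   # same effect as summing an Iterable of 1s
    return counts


def top_states(counts, limit=5):
    """Rough equivalent of find().sort({count: -1}).limit(limit) on the output."""
    return [{"_id": state, "count": n} for state, n in counts.most_common(limit)]


if __name__ == "__main__":
    # Hypothetical input documents, not the real postalCodes data.
    docs = [{"state": "Kerala"}, {"state": "Maharashtra"}, {"state": "Kerala"},
            {"state": "Goa"}, {"state": "Kerala"}, {"state": "Maharashtra"}]
    print(top_states(count_by_state(docs), limit=2))
```

Because every emitted value is 1, the sum per key equals the number of input documents for that state, which is what the count field in the output collection holds.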

Finally, the main method of the main entry point class is as follows:

Configuration conf = new Configuration();
MongoConfig config = new MongoConfig(conf);
config.setInputFormat(MongoInputFormat.class);
config.setMapperOutputKey(Text.class);
config.setMapperOutputValue(IntWritable.class);
config.setMapper(TopStatesMapper.class);
config.setOutputFormat(MongoOutputFormat.class);
config.setOutputKey(Text.class);
config.setOutputValue(BSONWritable.class);
config.setReducer(TopStateReducer.class);
ToolRunner.run(conf, new TopStateMapReduceEntrypoint(), args);

All that we do is wrap the org.apache.hadoop.conf.Configuration object with the com.mongodb.hadoop.MongoConfig instance to set various properties and then submit the MapReduce job for execution using ToolRunner.

There’s more…

In this recipe, we executed a simple MapReduce job on Hadoop using the Hadoop API, sourcing the data from MongoDB and writing the results to a MongoDB collection. What if we want to write the map and reduce functions in a different language? Fortunately, this is possible using a concept called Hadoop streaming, where stdout is used as a means to communicate between the program and the Hadoop MapReduce framework. In the next recipe, we will demonstrate how to use Python to implement the same use case as the one in this recipe using Hadoop streaming.

Running MapReduce jobs on Hadoop using streaming

In the previous recipe, we implemented a simple MapReduce job using the Java API of Hadoop. The use case was the same as the one in the recipes of Chapter 3, Programming Language Drivers, where we saw MapReduce implemented using Mongo client APIs in Python and Java. In this recipe, we will use Hadoop streaming to implement MapReduce jobs.

The concept of streaming works based on communication using stdin and stdout. Get more information on what Hadoop streaming is and how it works at http://hadoop.apache.org/docs/r1.2.1/streaming.html.

Getting ready

Refer to the Executing our first sample MapReduce job using the mongo-hadoop connector recipe to see how to set up Hadoop for development purposes and build the mongo-hadoop project using gradle. As far as Python libraries are concerned, we will install the required library from source. However, you can use pip to carry out the setup if you do not wish to build from source. We will also see how to set up pymongo-hadoop using pip.

Refer to the Installing PyMongo recipe in Chapter 3, Programming Language Drivers, to see how to install PyMongo and pip.

How to do it…

1. We will first build pymongo-hadoop from source. With the project cloned to the local filesystem, execute the following commands from the root of the cloned project:

$ cd streaming/language_support/python
$ sudo python setup.py install

2. After you enter the password, setup will continue to install pymongo-hadoop on your machine.

3. That is all we need to do to build pymongo-hadoop from source. However, if you choose not to build from source, you could execute the following command from the operating system shell:

$ sudo pip install pymongo_hadoop

4. After installing pymongo-hadoop in either way, we will now implement our mapper and reducer functions in Python.

5. The mapper function is as follows:

#!/usr/bin/env python

import sys
from pymongo_hadoop import BSONMapper

def mapper(documents):
    print >> sys.stderr, 'Starting mapper'
    for doc in documents:
        yield {'_id': doc['state'], 'count': 1}
    print >> sys.stderr, 'Mapper completed'

BSONMapper(mapper)

6. The reducer function is as follows:

#!/usr/bin/env python

import sys
from pymongo_hadoop import BSONReducer

def reducer(key, documents):
    print >> sys.stderr, 'Invoked reducer for key "', key, '"'
    count = 0
    for doc in documents:
        count += 1
    return {'_id': key, 'count': count}

BSONReducer(reducer)

7. The $HADOOP_HOME and $HADOOP_CONNECTOR_HOME environment variables should point to the base directory of Hadoop and the base directory of the mongo-hadoop connector project, respectively. Now, we will invoke the MapReduce function using the following command from the operating system shell. The code available on the book’s website has the mapper and reducer Python scripts and a shell script that will be used to invoke the mapper and reducer:

$HADOOP_HOME/bin/hadoop jar \
$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming* \
-libjars $HADOOP_CONNECTOR_HOME/streaming/build/libs/mongo-hadoop-streaming-1.2.1-SNAPSHOT-hadoop_2.4.jar \
-input /tmp/in \
-output /tmp/out \
-inputformat com.mongodb.hadoop.mapred.MongoInputFormat \
-outputformat com.mongodb.hadoop.mapred.MongoOutputFormat \
-io mongodb \
-jobconf mongo.input.uri=mongodb://127.0.0.1:27017/test.postalCodes \
-jobconf mongo.output.uri=mongodb://127.0.0.1:27017/test.pyMRStreamTest \
-jobconf stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver \
-mapper mapper.py \
-reducer reducer.py

8. The mapper.py and reducer.py files must be present in the current directory when executing this command.

9. The MapReduce job should take some time to execute. Once it completes successfully, open the Mongo shell by typing the following command in the operating system command prompt and execute the following query from the shell:

$ mongo
> db.pyMRStreamTest.find().sort({count:-1}).limit(5)

10. Compare the output with the ones we got earlier when we executed the MapReduce jobs using Mongo’s MapReduce framework in Chapter 3, Programming Language Drivers.

How it works…

Let’s look at steps 5 and 6, where we wrote the mapper and reducer functions. We defined a map function that accepts a list of all the documents. We iterated through these and yielded documents where the _id field is the key (the name of the state) and the count field has the value 1. The number of documents yielded will be the same as the total number of input documents.

Finally, we instantiated BSONMapper, which accepts the mapper function as the parameter. The function returns a generator object, which is then used by the BSONMapper class to feed the values to the MapReduce framework. All we need to remember is that the mapper function needs to return a generator (which it does because we call yield in the loop) and that we then instantiate the BSONMapper class, which is provided to us by the pymongo_hadoop module. Those intrigued enough might choose to look at the source code in the project cloned on the local filesystem, in the streaming/language_support/python/pymongo_hadoop/mapper.py file, and see what it does. It is a small piece of code that is simple to understand.
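The generator behavior described above can be seen in isolation with the following sketch, written in Python 3 syntax (the recipe's own scripts use the Python 2 print statement). The drain function is a hypothetical stand-in for whatever consumes the generator; it is not the BSONMapper implementation and makes no claim about how that class works internally.

```python
import sys


def mapper(documents):
    """Yield one {'_id': state, 'count': 1} document per input document."""
    print("Starting mapper", file=sys.stderr)   # logs go to stderr, never stdout
    for doc in documents:
        yield {"_id": doc["state"], "count": 1}
    print("Mapper completed", file=sys.stderr)


def drain(mapper_fn, documents):
    """Hypothetical consumer: calling the function only creates a generator."""
    generator = mapper_fn(documents)   # nothing in the mapper body has run yet
    return list(generator)             # iterating is what drives the mapper


if __name__ == "__main__":
    print(drain(mapper, [{"state": "Goa"}, {"state": "Kerala"}]))
```

Calling mapper does not execute its body; the body runs only as the consumer iterates, which is why one output document appears per input document.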

For the reducer function, we got the key and a list of documents for this key as the value. The key is the same as the value of the _id field of the documents emitted from the map function. We simply returned a new document here with _id as the name of the state and count as the number of documents for this state. Remember that, here, we return a document rather than yielding one as we did in the map function. Again, finally, we instantiated BSONReducer and passed the reducer function to it. The implementation of the BSONReducer class is in the streaming/language_support/python/pymongo_hadoop/reducer.py file of the project cloned on our local filesystem.

We finally invoked the command from the shell to initiate the MapReduce job that uses streaming. A few things to note here are that we need two JAR files: one in the share/hadoop/tools/lib directory of the Hadoop distribution and one in the mongo-hadoop connector, which is present in the streaming/build/libs/ directory. The input and output formats are com.mongodb.hadoop.mapred.MongoInputFormat and com.mongodb.hadoop.mapred.MongoOutputFormat, respectively.

As we saw earlier, stdout and stdin form the backbone of streaming. So, basically, we need to encode our BSON objects to write them to stdout; then, we should be able to read stdin and convert the content to BSON objects again. For this purpose, the mongo-hadoop connector provides us with two framework classes, com.mongodb.hadoop.streaming.io.MongoInputWriter and com.mongodb.hadoop.streaming.io.MongoOutputReader, to encode and decode from and to BSON objects, respectively. These classes extend org.apache.hadoop.streaming.io.InputWriter and org.apache.hadoop.streaming.io.OutputReader.

The value of the stream.io.identifier.resolver.class property is given as com.mongodb.hadoop.streaming.io.MongoIdentifierResolver. This class extends org.apache.hadoop.streaming.io.IdentifierResolver and gives us a chance to register our implementations of org.apache.hadoop.streaming.io.InputWriter and org.apache.hadoop.streaming.io.OutputReader with the framework. We also registered the output key and output value classes using our custom IdentifierResolver. Just remember to always use this resolver if you are using streaming with the mongo-hadoop connector.

Finally, we gave the mapper and reducer Python scripts, which we discussed earlier. An important thing to remember is not to print logs to stdout from the mapper and reducer functions. stdout and stdin are the means of communication, and writing logs to them can yield undesirable behavior. As we saw in the example, write to standard error (stderr) or, alternatively, to a log file.
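To see why stdout must stay clean, consider the following sketch of a mapper for plain-text Hadoop streaming, where each output line is a key and a value separated by a tab. This is deliberately not the BSON-based protocol that pymongo_hadoop uses; run_text_mapper is a hypothetical function that takes its streams as parameters so that the data channel and the log channel are explicit.

```python
import io
import sys


def run_text_mapper(stdin, stdout, log):
    """Read one state name per line and write 'state<TAB>1' lines to stdout."""
    print("mapper started", file=log)      # diagnostics stay off the data channel
    for line in stdin:
        state = line.strip()
        if state:
            stdout.write(f"{state}\t1\n")  # every stdout line is parsed as data


if __name__ == "__main__":
    # In-memory streams stand in for the pipes that the framework would attach.
    fake_stdin = io.StringIO("Kerala\nGoa\n")
    fake_stdout = io.StringIO()
    run_text_mapper(fake_stdin, fake_stdout, sys.stderr)
    print(repr(fake_stdout.getvalue()))
```

Anything written to the stdout stream is treated as a record by the framework, so a stray log line there would be read as a malformed key-value pair.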

When using a multiline command in Unix, you can continue the command on the next line using \. However, remember that there should be no spaces after \.

Running a MapReduce job on Amazon EMR

This recipe involves running the MapReduce job on the cloud using AWS. You will need an AWS account in order to proceed. Register with AWS at http://aws.amazon.com/. We will see how to run a MapReduce job on the cloud using Amazon Elastic MapReduce (EMR). Amazon EMR is a managed MapReduce service provided by Amazon on the cloud. For more details, refer to https://aws.amazon.com/elasticmapreduce/. Amazon EMR requires the data, binaries/JARs, and so on to be present in the S3 bucket that it processes. It then writes the results back to the S3 bucket. Amazon Simple Storage Service (S3) is another service by AWS for data storage on the cloud. For more details on Amazon S3, refer to http://aws.amazon.com/s3/.

Though we will use the mongo-hadoop connector, an interesting fact is that we won’t require a MongoDB instance to be up and running. We will use the MongoDB data dump stored in an S3 bucket for our data analysis. The MapReduce program will run on the input BSON dump and generate the resulting BSON dump in the output bucket. The logs of the MapReduce program will be written to another bucket dedicated to logs. The following diagram gives us an idea of what our setup will look like at a high level:

Getting ready

We will use the same Java sample for this recipe as the one we used in the Writing our first Hadoop MapReduce job recipe. To know more about the mapper and reducer class implementation, refer to the How it works… section of the Writing our first Hadoop MapReduce job recipe. We have a mongo-hadoop-emr-test project available with the code that can be downloaded from the book’s website; this code is used to create a MapReduce job on the cloud using the AWS EMR APIs.

To simplify things, we will upload just one JAR file to the S3 bucket to execute the MapReduce job. This JAR file will be assembled using a BAT file on Windows and a shell script on Unix-based operating systems. The mongo-hadoop-emr-test Java project has a mongo-hadoop-emr-binaries subdirectory that contains the necessary binaries along with the scripts to assemble them into one JAR file. The assembled JAR file, named mongo-hadoop-emr-assembly.jar, is also provided in the subdirectory. Running the .bat or .sh file will delete this JAR file and regenerate the assembled JAR file; it is not mandatory to do this. The assembled JAR file that is already provided is good enough and will work just fine. The Java project contains a data subdirectory with a postalCodes.bson file in it. This is the BSON dump generated from the database that contains the postalCodes collection. The mongodump utility provided with the Mongo distribution is used to extract this dump.

How to do it…

1. The first step of this exercise is to create a bucket on S3. You might choose to use an existing bucket. However, for this recipe, I am creating a bucket named com.packtpub.mongo.cookbook.emr-in. Remember that the name of the bucket has to be unique across all S3 buckets; otherwise, you will not be able to create a bucket with this very name. You will have to create one with a different name and use it in place of com.packtpub.mongo.cookbook.emr-in used in this recipe.

Tip

Do not create bucket names with an underscore (_); use a hyphen (-) instead. Bucket creation with an underscore will not fail, but the MapReduce job will fail later as it doesn’t accept underscores in bucket names.

2. We will upload the assembled JAR file and a .bson file for the data to the newly created (or existing) S3 bucket. To upload the files, we will use the AWS web console. Click on the Upload button and select the assembled JAR file and the postalCodes.bson file to be uploaded to the S3 bucket. After the upload, the contents of the bucket will look like the following screenshot:

The following steps initiate the EMR job from the AWS console without writing a single line of code. We will also see how to initiate the same job using the AWS Java SDK. Follow steps 1 to 11 below if you are looking to initiate the EMR job from the AWS console. Follow steps 12 and 13 to start the EMR job using the Java SDK.

1. We will first initiate a MapReduce job from the AWS console. Visit https://console.aws.amazon.com/elasticmapreduce/ and click on the Create Cluster button. In the Cluster Configuration screen, enter the details shown in the following screenshot, except for the logging bucket. You will need to select the bucket to which the logs need to be written. You might also click on the folder icon next to the textbox for the bucket name and select a bucket present in your account to be used as the logging bucket, as shown in the following screenshot:

2. The Termination protection option is set to No, as this is a test instance. In the case of any error, we would prefer the instances to terminate rather than keep running and incur charges.

3. In the Software Configuration section, select the Hadoop version as 2.4.0 and the AMI version as 3.1.0. Remove the additional applications by clicking on the cross next to their names, as shown in the following screenshot:

4. In the Hardware Configuration section, select the EC2 instance type as m1.medium. This is the minimum we need to select for Hadoop Version 2.4.0. The number of instances for the slave and task instances is zero. The following screenshot shows the configuration selected:

5. In the Security and Access section, leave all the default values. We also have no need for a Bootstrap Action, so leave this as is too.

6. The next step is to set up steps for the MapReduce job. In the Add step drop-down menu, select the Custom JAR option and set the Auto-terminate option to Yes, as shown in the following screenshot:

7. Now, click on the Configure and add button and enter the details.

8. The value of the JAR S3 Location field is given as s3://com.packtpub.mongo.cookbook.emr-in/mongo-hadoop-emr-assembly.jar. This is the location in my input bucket; you need to change the input bucket as per your own input bucket. The name of the JAR file will be the same.

9. Enter the following arguments in the Arguments text area. The name of the main class is first in the list:

com.packtpub.mongo.cookbook.TopStateMapReduceEntrypoint

-Dmongo.job.input.format=com.mongodb.hadoop.BSONFileInputFormat

-Dmongo.job.mapper=com.packtpub.mongo.cookbook.TopStatesMapper

-Dmongo.job.reducer=com.packtpub.mongo.cookbook.TopStateReducer

-Dmongo.job.output=org.apache.hadoop.io.Text

-Dmongo.job.output.value=org.apache.hadoop.io.IntWritable

-Dmongo.job.output.value=org.apache.hadoop.io.IntWritable

-Dmongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat

-Dmapred.input.dir=s3://com.packtpub.mongo.cookbook.emr-in/postalCodes.bson

-Dmapred.output.dir=s3://com.packtpub.mongo.cookbook.emr-out/

10. Again, the values of the final two arguments contain the input and output buckets used for my MapReduce sample. These values will change according to your own input and output buckets. The value of Action on failure will be Terminate cluster. The following screenshot shows the values filled in. Click on Save after all the preceding details are entered:

11. Now, click on the Create Cluster button. It will take some time to provision and start the cluster.

12. In the following few steps, we will create a MapReduce job on EMR using the AWS Java API. Import the EMRTest project provided with the code samples into your favorite IDE. Once imported, open the com.packtpub.mongo.cookbook.AWSElasticMapReduceEntrypoint class.

13. Several constants need to be changed in the class: the input, output, and log buckets that you will use for your example; the EC2 key name; and the AWS access key and secret key. The access key and secret key act as the username and password, respectively, when you use the AWS SDK. Change these values accordingly and run the program; on successful execution, it should give you a job ID for the newly initiated job.

14. Irrespective of how you initiated the EMR job, visit the EMR console at https://console.aws.amazon.com/elasticmapreduce/ to see the status of your submitted job. The job ID you see in the second column of your initiated jobs will be the same as the job ID printed to the console when you executed the Java program (if you initiated the job using the Java program). Click on the name of the job initiated; this should navigate you to the job-details page. The hardware provisioning will take some time and then, finally, your MapReduce step will run. Once the job is complete, the status of the job will look like the following screenshot:

15. When the Steps section is expanded, it will look like the following screenshot:

16. Click on the stderr link below the Log files section to view the log output of the MapReduce job.

17. Now that the MapReduce job is complete, our next step is to see its results. Visit the S3 console at https://console.aws.amazon.com/s3 and open the output bucket that you set. In my case, the following is the content of the output bucket:

The part-r-00000.bson file interests us. This file contains the results of our MapReduce job.

18. Download the file to your local filesystem and import it into a Mongo instance running locally using the mongorestore utility as follows. Note that the following command expects a mongod instance to be up and running and listening on port 27017, and the part-r-00000.bson file to be in the current directory:

$ mongorestore part-r-00000.bson -d test -c mongoEMRResults

19. Now, connect to the mongod instance using the Mongo shell and execute the following query:

> db.mongoEMRResults.find().sort({count:-1}).limit(5)
{ "_id" : "Maharashtra", "count" : 6446 }
{ "_id" : "Kerala", "count" : 4684 }
{ "_id" : "Tamil Nadu", "count" : 3784 }
{ "_id" : "Andhra Pradesh", "count" : 3550 }
{ "_id" : "Karnataka", "count" : 3204 }

20. The preceding command shows the top five results. If we compare them with the results we got in Chapter 3, Programming Language Drivers, using Mongo’s MapReduce framework and in the Writing our first Hadoop MapReduce job recipe in this chapter, we will see that the results are identical.

How it works…

Amazon EMR is a managed Hadoop service that takes care of hardware provisioning and spares you the hassle of setting up your own cluster. The concepts related to our MapReduce program are already covered in the Writing our first Hadoop MapReduce job recipe, and there is nothing additional to mention. One thing we did was assemble the JARs that we need into one big fat JAR to execute our MapReduce job. This approach is OK for our small MapReduce job. In the case of larger jobs where a lot of third-party JARs are needed, we will have to go for an approach where we add the JARs to the lib directory of the Hadoop installation and execute the job in the same way as the MapReduce job that we executed locally.

Another thing that we did differently from our local setup was not to use a mongod instance to source and write the data; instead, we used BSON dump files from the Mongo database as the input and wrote the output to BSON files. The output dump is then imported into a Mongo database locally, and the results are analyzed. It is pretty common to have data dumps uploaded to S3 buckets; thus, running analytics jobs on this data using cloud infrastructure is a good option. The data accessed from the buckets by the EMR cluster need not have public access, as the EMR job runs using our account’s credentials; we are good to access our own buckets to read and write data/logs.
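A BSON dump file such as postalCodes.bson is a sequence of BSON documents written one after another, and each document starts with its total length as a little-endian 32-bit integer. The following Python sketch splits such a byte string into raw documents using only that length prefix. It is an illustration of the file layout, not the connector's BSONFileInputFormat; split_bson_documents is a hypothetical helper, and real code would decode the documents with a BSON library.

```python
import struct


def split_bson_documents(data):
    """Return the raw bytes of each document in a concatenated BSON byte string."""
    documents = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 5:
            raise ValueError("truncated BSON document header")
        (length,) = struct.unpack_from("<i", data, offset)  # little-endian int32
        if length < 5 or offset + length > len(data):
            raise ValueError("invalid BSON document length")
        documents.append(data[offset:offset + length])
        offset += length
    return documents


if __name__ == "__main__":
    empty_doc = b"\x05\x00\x00\x00\x00"                           # {}
    one_field = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"  # {"a": 1}
    print(len(split_bson_documents(empty_doc + one_field)))
```

Because every document announces its own length, a dump can be walked document by document without a running database, which is what makes BSON files usable as job input and output.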

See also

The developer’s guide for EMR at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/.

https://github.com/mongodb/mongo-hadoop/tree/master/examples/elastic-mapreduce to see the sample MapReduce job on the enron dataset given as part of the mongo-hadoop connector’s examples. You might choose to implement this example on Amazon EMR as per the given instructions.

Chapter 9. Open Source and Proprietary Tools

In this chapter, we will cover some open source and proprietary tools. We will cover the following recipes in this chapter:

Developing using spring-data-mongodb
Accessing MongoDB using Java Persistence API
Accessing MongoDB over REST
Installing the GUI-based client, MongoVUE, for MongoDB

Introduction

There is a vast array of tools/frameworks available to ease the development/administration process for software that uses MongoDB. We will look at some of these frameworks and tools. For developer productivity (Java developers in this case), we will look at spring-data-mongodb, which is part of the popular Spring Data suite.

Java Persistence API (JPA) is an object relational mapping (ORM) specification that is widely used, particularly with relational databases (this was the original objective of ORM frameworks). However, there are a few implementations that let us use it with NoSQL stores, MongoDB in this case. We will look at one provider that offers this implementation and put it to the test with a simple use case.

We will use spring-data-rest to expose CRUD repositories for MongoDB over a REST interface so that clients can invoke the various operations supported by the underlying spring-data-mongodb repository.

Querying the database from the shell is OK, but it is nice to have a good GUI that lets us do all the administrative/development-related tasks rather than executing commands from the shell. We will look at one such tool, MongoVUE, in this chapter.

Developing using spring-data-mongodb

From the perspective of developers, when a program needs to interact with a MongoDB instance, they need to use the respective client APIs for their specific platforms. The trouble with doing this is that we need to write a lot of boilerplate code, and it is not necessarily object-oriented. For instance, we have a class called Person with various attributes such as name, age, and address. The corresponding JSON document shares a similar structure with this Person class, as follows:

{
  name: "…",
  age: ..,
  address: {lineOne: "…", …}
}

However, to store this document, we need to convert the Person class to a DBObject, which is a map with key and value pairs. What we really want is to persist this Person class itself as an object in the database, without having to convert it to a DBObject.
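To make the boilerplate concrete, the following is a minimal sketch of the hand-written conversion described above. It uses only the JDK: a LinkedHashMap stands in for the driver's DBObject, and the Person and Address classes (and their field names) are hypothetical, chosen to mirror the JSON structure shown earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a plain Map plays the role of the driver's DBObject.
class PersonToDocument {

    // Hypothetical domain classes mirroring the JSON structure in the text.
    static final class Address {
        final String lineOne;
        Address(String lineOne) { this.lineOne = lineOne; }
    }

    static final class Person {
        final String name;
        final int age;
        final Address address;
        Person(String name, int age, Address address) {
            this.name = name;
            this.age = age;
            this.address = address;
        }
    }

    // Every field has to be copied by hand into a key/value structure.
    static Map<String, Object> toDocument(Person p) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("lineOne", p.address.lineOne);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", p.name);
        doc.put("age", p.age);
        doc.put("address", address);
        return doc;
    }

    public static void main(String[] args) {
        Person p = new Person("Steve", 20, new Address("20, Central street"));
        System.out.println(toDocument(p));
    }
}
```

Each new attribute means another line in toDocument (and in the reverse mapping), which is the kind of repetitive code an object-to-document mapper is meant to remove.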

Also, some operations, such as searching by a particular field of a document, saving an entity, deleting an entity, and searching by ID, are pretty common, and we tend to write similar boilerplate code repeatedly. In this recipe, we will see how spring-data-mongodb relieves us of these laborious tasks, which not only reduces the development effort but also dramatically reduces the possibility of introducing bugs in these commonly written functions.

Getting ready

The SpringDataMongoTest project present in the bundle for the chapter is a Maven project and is to be imported into any IDE of your choice. The required Maven artifacts will be downloaded automatically. A single MongoDB instance that listens to port 27017 is required to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to know how to start a standalone instance.

For the aggregation example, we will use the postal code data. Refer to the Creating test data recipe in Chapter 2, Command-line Operations and Indexes, for the creation of test data.

How to do it…

1. We will explore the repository feature of spring-data-mongodb first. Open the test case class named com.packtpub.mongo.cookbook.MongoCrudRepositoryTest from your IDE and execute it. If all goes well and the MongoDB server instance is reachable, the test case will execute successfully.

2. Another test case named com.packtpub.mongo.cookbook.MongoCrudRepositoryTest2 is used to explore more features of the repository support provided by spring-data-mongodb. This test case too should execute successfully.

3. We will see how MongoTemplate of spring-data-mongodb can be used to perform CRUD operations and other common operations on MongoDB. Open the com.packtpub.mongo.cookbook.MongoTemplateTest class and execute it.

4. Alternatively, if an IDE is not used, all the tests can be executed using Maven from the command prompt as follows, with the current directory being the root of the SpringDataMongoTest project:

$ mvn clean test

How it works…

We will first look at what we did in com.packtpub.mongo.cookbook.MongoCrudRepositoryTest, where we saw the repository support provided by spring-data-mongodb. Just in case you didn’t notice, we haven’t written a single line of code for the repository. The magic of implementing the required code for us is done by the Spring Data project. Let’s start by looking at the relevant portions of the XML config file:

<mongo:repositories base-package="com.packtpub.mongo.cookbook"/>
<mongo:mongo id="mongo" host="localhost" port="27017"/>
<mongo:db-factory id="factory" dbname="test" mongo-ref="mongo"/>
<mongo:template id="mongoTemplate" db-factory-ref="factory"/>

We will first look at the last three lines. These are spring-data-mongodb namespace declarations that, respectively, instantiate com.mongodb.Mongo, instantiate a factory for com.mongodb.DB instances obtained from that client, and create a template instance used to perform various operations on MongoDB. We will see org.springframework.data.mongodb.core.MongoTemplate in more detail later.

The first line is a namespace declaration for the base package of all the CRUD repositories we have. In this package, we have an interface with the following body:

public interface PersonRepository extends
    PagingAndSortingRepository<Person, Integer> {

  /**
   *
   * @param lastName
   * @return
   */
  Person findByLastName(String lastName);
}

The PagingAndSortingRepository interface is from the org.springframework.data.repository package of the Spring Data core project and extends CrudRepository from the same project. These interfaces give us the most common methods, such as searching by the ID/primary key, deleting an entity, inserting an entity, and updating an entity. The repository needs an object that it maps to the underlying data store. The Spring Data project supports a large number of data stores: not just SQL (using Java Database Connectivity (JDBC) and JPA) and MongoDB but also other NoSQL stores such as Redis and Hadoop and search engines such as Solr and Elasticsearch. In the case of spring-data-mongodb, the object is mapped to a document in the collection.

In the PagingAndSortingRepository<Person, Integer> signature, the first type parameter is the entity that the CRUD repository is built for, and the second one is the type of the primary key/ID field.

We have added just one method named findByLastName; this accepts one string value for the last name as a parameter. This is an interesting operation; it is specific to our repository and not even implemented by us, but it still works just as expected. The Person class is a POJO where we have annotated the ID field with the org.springframework.data.annotation.Id annotation. Nothing else is really special about this class; it just has some plain getters and setters.

With all these small details in place, let’s join the dots by answering some questions you probably have in mind. First, we will see which server, database, and collection our data goes to. If we look at the mongo:mongo XML definition in the config file, we will see that we instantiated the com.mongodb.Mongo class by connecting to localhost and port 27017. The mongo:db-factory declaration denotes that the database to be used is test. One final question is, which collection? The simple name of our class is Person. The name of the collection is the simple name with the first character lowercased; thus, Person will become person, and something like BillingAddress will become the billingAddress collection. These are the default values. However, if you need to override this value, you can annotate your class with the org.springframework.data.mongodb.core.mapping.Document annotation and use its collection attribute to give any name of your choice, as we will see in an example later.
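The default naming rule described above (simple class name with the first character lowercased) can be illustrated with a few lines of plain Java. This is only a sketch of the rule as stated in the text; the class and method names are made up for illustration and are not part of spring-data-mongodb.

```java
// Illustrative sketch of the default collection-naming rule described in the text.
class CollectionNames {

    // Lowercase only the first character of the simple class name.
    static String defaultCollectionName(String simpleClassName) {
        if (simpleClassName.isEmpty()) {
            return simpleClassName;
        }
        return Character.toLowerCase(simpleClassName.charAt(0)) + simpleClassName.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(defaultCollectionName("Person"));
        System.out.println(defaultCollectionName("BillingAddress"));
    }
}
```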

To view the document in the collection, execute just one test case method called saveAndQueryPerson from the com.packtpub.mongo.cookbook.MongoCrudRepositoryTest class. Now, connect to the MongoDB instance from the Mongo shell and execute the following query:

> use test
> db.person.findOne({_id: 1})
{
  "_id" : 1,
  "_class" : "com.packtpub.mongo.cookbook.domain.Person",
  "firstName" : "Steve",
  "lastName" : "Johnson",
  "age" : 20,
  "gender" : "Male"
}

As we can see in the preceding result, the contents of the document are similar to the object we persisted using the CRUD repository. The names of the fields in the document are the same as the names of the respective attributes in the Java object, with two exceptions. The field annotated with @Id is now _id, irrespective of the name of the field in the Java class, and an additional _class attribute is added to the document whose value is the fully qualified name of the Java class itself. This is not of any use to the application but is used by spring-data-mongodb as metadata.

This gives us an idea of what spring-data-mongodb must be doing for all the basic CRUD methods. All the operations we perform use the MongoTemplate class (MongoOperations to be precise, which is an interface that MongoTemplate implements) from the spring-data-mongodb project. To find an entity by its primary key, it invokes a find query by the _id field on the collection derived from the Person entity class. The save method simply calls the save method on MongoOperations; this in turn calls the save method on the com.mongodb.DBCollection class.

We still haven’t answered how the findByLastName method worked. How does Spring know what query to invoke to return the data? These are special methods whose names begin with find, findBy, get, or getBy. There are some rules that one needs to follow while naming a method, and the proxy object on the repository interface is then able to convert the method into an appropriate query on the collection. For instance, the findByLastName method in the repository of the Person class will execute a query on the lastName field in the document of the Person class. Hence, the findByLastName(String lastName) method will fire the following query in the database:

db.person.find({'lastName': lastName})

Based on the return type of the method defined, it will return either a list or the first result from the results returned by the database. We have used findBy in our queries. However, any name that begins with find, has any text in between, and then has By works. For instance, findPersonBy is the same as findBy.
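To get a feel for how a method name alone can carry enough information to build a query, here is a deliberately simplified sketch in plain Java. It is not how spring-data-mongodb is implemented; it only handles the find…By/get…By prefix followed by a single property name, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of query derivation: find...By<Property> or get...By<Property>.
class DerivedQuerySketch {

    // Prefix (find/get), optional filler text, "By", then a capitalized property name.
    private static final Pattern FINDER =
            Pattern.compile("^(?:find|get)\\w*?By(\\p{Upper}\\w*)$");

    // Extracts the property name and lowercases its first character.
    static String propertyOf(String methodName) {
        Matcher m = FINDER.matcher(methodName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a finder method: " + methodName);
        }
        String property = m.group(1);
        return Character.toLowerCase(property.charAt(0)) + property.substring(1);
    }

    // Renders the equality query that the method name implies.
    static String toQuery(String methodName, String value) {
        return "{'" + propertyOf(methodName) + "': '" + value + "'}";
    }

    public static void main(String[] args) {
        System.out.println(toQuery("findByLastName", "Johnson"));
        System.out.println(toQuery("findPersonByLastName", "Johnson"));
    }
}
```

Both calls in main resolve to the same lastName property, which mirrors the point that findPersonBy behaves like findBy.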

For more information on these findBy methods, we have another test class, MongoCrudRepositoryTest2. Open this class in your IDE, where it can be read along with this text. We have already executed this test case; now, let’s see the findBy methods used and their behavior. This class has seven findBy methods in it, with one of the methods being a variant of another method in the same interface. To get a clear idea of the queries, we will first look at one of the documents in the personTwo collection in the test database. Execute the following commands on the Mongo shell connected to the MongoDB server that runs on localhost:

> use test
> db.personTwo.findOne({firstName: 'Amit'})
{
  "_id" : 2,
  "_class" : "com.packtpub.mongo.cookbook.domain.Person2",
  "firstName" : "Amit",
  "lastName" : "Sharma",
  "age" : 25,
  "gender" : "Male",
  "residentialAddress" : {
    "addressLineOne" : "20, Central street",
    "city" : "Mumbai",
    "state" : "Maharashtra",
    "country" : "India",
    "zip" : "400101"
  }
}

Also, note that the repository uses the Person2 class. However, the name of the collection used is personTwo. This was possible because we used the @Document(collection="personTwo") annotation on top of the Person2 class.

Getting back to the seven methods in the com.packtpub.mongo.cookbook.PersonRepositoryTwo repository class, let’s look at them one by one:

Method Description

findByAgeGreaterThanEqual

This method will fire the {'age': {'$gte': <age>}} query on the personTwo collection.

The secret lies in the name of the method. If we break it up, what we have after findBy tells us what we want. The age property (with the first character lowercased) is the field that will be queried on the document with the $gte operator because we have GreaterThanEqual in the name of the method. The value used for the comparison is the value of the parameter passed. The result is a collection of Person2 entities, as we can have multiple matches.

findByAgeBetween

This method will query on age but will use a combination of $gt and $lt to find the matching result. The query in this case will be {'age': {'$gt': from, '$lt': to}}. It is important to note that both the from and to values are exclusive in the range. There are two methods in the test case: findByAgeBetween and findByAgeBetween2. These methods demonstrate the behavior of the between query for different input values.

findByAgeGreaterThan

This is a special method that also sorts the result because it takes two parameters: the first is the value against which the age field will be compared, and the second is of the type org.springframework.data.domain.Sort. For more details, refer to the Javadoc for spring-data-mongodb.

findPeopleByLastNameLike

This method is used to find results by the last name that matches a pattern. Regular expressions are used for the matching. For instance, in this case, the query fired will be {'lastName': <lastName as regex>}. This method’s name begins with findPeopleBy instead of findBy, which works in the same way as findBy. Thus, when we say findBy in all the descriptions, we actually mean find…By.

The value provided as the parameter will be used to match the last name.

findByResidentialAddressCountry

This is an interesting method to look at. Here, we are looking to search by the country of the residential address. This is, in fact, a field in the Address class held in the residentialAddress field of the person. Take a look at the document from the personTwo collection shown earlier to see what the query will match.

When Spring Data finds the name ResidentialAddressCountry, it will try to resolve various combinations of this string. For instance, it can look at residentialAddressCountry in Person, or at residential.addressCountry, residentialAddress.country, or residential.address.country. If there are no conflicting values, as in our case, residentialAddress.country is a valid path in the Person2 object tree, and thus, this will be used in the query.

However, if there are conflicts, then underscores can be used to clearly specify what we are looking at. In this case, the method can be renamed to findByResidentialAddress_country to clearly specify what we expect as the result. The findByCountry2 test case method demonstrates this.

findByFirstNameAndCountry

This is an interesting method as well. We are not always able to use the method names to implement what we actually want, as the name of the method required for Spring to automatically implement the query might be a bit awkward to use as is. For instance, findByCountryOfResidence sounds better than findByResidentialAddressCountry. However, we are stuck with the latter, as this is how spring-data-mongodb will construct the query. Using findByCountryOfResidence gives Spring Data no details on how to construct the query.

However, there is a solution to this. You might choose to use the @Query annotation and specify the query to be executed when the method is invoked. The following is the annotation we used in our case:

@Query("{'firstName':?0,'residentialAddress.country':?1}")

We write the value as a query that will get executed, and we bind the parameters of the method to the query as numbered parameters that start from 0. Thus, the first parameter of the method will be bound to ?0, the second to ?1, and so on.
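The positional binding used by the @Query annotation in the last row (?0 for the first method parameter, ?1 for the second) can be pictured with the following plain-Java sketch. It is a simplified stand-in written for this text, not the framework's binding code: it quotes strings naively, does no escaping, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of numbered-placeholder binding: ?0, ?1, ... are replaced by arguments.
class PlaceholderBinding {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\?(\\d+)");

    static String bind(String template, Object... args) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // The digits after '?' select the method argument by position.
            Object arg = args[Integer.parseInt(m.group(1))];
            // Naive rendering: strings are single-quoted, everything else uses toString().
            String literal = (arg instanceof String) ? "'" + arg + "'" : String.valueOf(arg);
            m.appendReplacement(out, Matcher.quoteReplacement(literal));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "{'firstName':?0,'residentialAddress.country':?1}";
        System.out.println(bind(template, "Amit", "India"));
    }
}
```

The template is scanned in a single pass, so a bound value that happens to contain text such as ?0 is not substituted again.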

We saw how the findBy or getBy methods are automatically translated to queries for MongoDB. Similarly, there are some other well-known prefixes for methods. The countBy prefix returns a long value, the count for a given condition, which is derived from the rest of the method name in the same way as for findBy. We can have deleteBy or removeBy to delete the documents that match the derived condition. Also, one thing to note about the com.packtpub.mongo.cookbook.domain.Person2 class is that it has neither a no-argument constructor nor setters to set the values. Spring will, instead, use reflection to instantiate this object.

A lot of findBy methods are supported by spring-data-mongodb, and not all of them are covered here. For more details, refer to the spring-data-mongodb reference manual. A lot of XML-based or Java-based configuration options are available too and can be found in the reference manual. The links are given in the See also section later in this recipe.

We are not done yet, though; we have another test case, com.packtpub.mongo.cookbook.MongoTemplateTest. This uses org.springframework.data.mongodb.core.MongoTemplate to perform various operations. Open the test case class to see what operations are performed and which methods of the MongoTemplate class are invoked.

Let’s look at some of the important and frequently used methods of the MongoTemplate class:

Method Description

save

This method is used to save (insert if new; otherwise, update) an entity in MongoDB. The method takes one parameter, the entity, and finds the target collection based on its name or the @Document annotation present on it.

There is an overloaded version of the save method that also accepts a second parameter, which is the name of the collection to which the entity passed needs to be persisted.

remove

This method is used to remove documents from the collection. It has some overloaded versions in this class. All of them accept either an entity to be deleted or an org.springframework.data.mongodb.core.query.Query instance, which is used to determine the document(s) to be deleted. The second parameter is then the name of the collection from which the document is to be deleted. When an entity is provided, the name of the collection can be derived. With a Query instance provided, we have to give either the name of the collection or the entity class, which in turn will be used to derive the name of the collection.

updateMulti

This is the method invoked to update multiple documents with one update call. The first parameter is the query that will be used to match the documents. The second parameter is an instance of org.springframework.data.mongodb.core.query.Update. This is the update that will be executed on the documents selected using the query object. The next parameter is the entity class or the collection name to execute the update on. For more details on the method and its various overloaded versions, refer to the Javadoc.

updateFirst

This is the counterpart of the updateMulti method. This operation will update just the first matching document. We have not covered this method in our unit test case.

insert

We mentioned that the save method can perform insertions and updates. The insert method in the template calls the insert method of the underlying Mongo client. If one entity document is to be inserted, there is no difference between calling the insert or save method.

However, as we saw in the insertMultiple method of the test case, we created a list of three Person instances and passed them to the insert method. All three documents for the three Person instances will go to the server as part of one call. The behavior of an insert when it fails is determined by the continue on error parameter of the write concern. It determines whether the bulk insert fails on the first failure or continues even after errors, reporting only the last error. The page at http://docs.mongodb.org/manual/core/bulk-inserts/ gives more details on bulk inserts and the various write concern parameters that can alter the behavior.

findAndRemove/findAllAndRemove

Both these operations are used to find and then remove the document(s). The findAndRemove method finds one document, removes it, and returns the deleted document. This operation is atomic. The findAllAndRemove method, however, finds all the matching documents and removes them before returning the list of entities for all the documents deleted.

findAndModify

This method is functionally similar to the findAndModify operation that we have with the Mongo client library. It will atomically find and modify the document. If the query matches more than one document, only the first match will be updated. The first two parameters of this method are the query and the update to execute. The next parameter is either the entity class or the collection name to execute the operation on. Also, there is a special class called org.springframework.data.mongodb.core.FindAndModifyOptions that makes sense only for the findAndModify operation. This instance specifies whether we want the new instance or the old instance returned after the operation is performed, and whether an upsert is to be performed, which is relevant only if no document matches the query. There is an additional Boolean flag to tell the client whether this is a find and remove operation. In fact, the findAndRemove operation we saw earlier is just a convenience function that delegates to findAndModify with this remove flag set.

In the preceding table, we mentioned the Query and Update classes when talking about update. These are special convenience classes in spring-data-mongodb; they let us build MongoDB queries using a syntax that is easy to understand and improves readability. For instance, the query to check whether lastName is Johnson in the Mongo query language is {'lastName': 'Johnson'}. The same query can be constructed in spring-data-mongodb as follows:

new Query(Criteria.where("lastName").is("Johnson"))

This syntax looks neat compared to giving the query in JSON. Let’s take another example where we want to find all females under 30 years of age in our database. The query will now be built as follows:

new Query(Criteria.where("age").lt(30).and("gender").is("Female"))

Similarly, for update, suppose we want to set a youngCustomer Boolean flag to true for some of the customers, based on some conditions. To set this flag in the document, the MongoDB format will be as follows:

{'$set': {'youngCustomer': true}}

In spring-data-mongodb, the same will be achieved as follows:

new Update().set("youngCustomer", true)

Refer to the Javadoc for all the possible methods that are available to build the queries and updates in spring-data-mongodb that are to be used with MongoTemplate.
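To see why a fluent builder maps so naturally onto a query document, the following toy builder mimics the where/and/is/lt chain using only JDK maps. It is a hypothetical MiniCriteria class written for this illustration, not the Criteria class of spring-data-mongodb, and it supports just the operators used in the examples above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy fluent builder: each call records a condition for the field currently in focus.
class MiniCriteria {

    private final Map<String, Object> conditions = new LinkedHashMap<>();
    private String currentField;

    private MiniCriteria(String field) {
        this.currentField = field;
    }

    static MiniCriteria where(String field) {
        return new MiniCriteria(field);
    }

    // Switches focus to another field; later operators apply to it.
    MiniCriteria and(String field) {
        this.currentField = field;
        return this;
    }

    // Equality: {field: value}
    MiniCriteria is(Object value) {
        conditions.put(currentField, value);
        return this;
    }

    // Less than: {field: {$lt: value}}
    MiniCriteria lt(Object value) {
        Map<String, Object> operator = new LinkedHashMap<>();
        operator.put("$lt", value);
        conditions.put(currentField, operator);
        return this;
    }

    Map<String, Object> toMap() {
        return conditions;
    }

    public static void main(String[] args) {
        System.out.println(where("age").lt(30).and("gender").is("Female").toMap());
    }
}
```

The resulting nested map has the same shape as the query document {'age': {'$lt': 30}, 'gender': 'Female'}.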

The methods mentioned earlier are by no means the only ones available in the MongoTemplate class. There are a lot of other methods for geospatial indexes, convenience methods to get the count of documents in a collection, aggregation and MapReduce support, and so on. Refer to the Javadoc of MongoTemplate for more details and methods.

Speaking of aggregation, we also have a test case method called aggregationTest to perform an aggregation operation on a collection. We have a postalCodes collection in MongoDB; this collection contains the postal code details of various cities. An example document in the collection is as follows:

{

"_id":ObjectId("539743b26412fd18f3510f1b"),

"postOfficeName":"ASDMelloRoadFullerMarg",

"pincode":400001,

"districtsName":"Mumbai",

"city":"Mumbai",

"state":"Maharashtra"

}

Our aggregation operation intends to find the top five states by the number of documents in the collection. In Mongo, the aggregation pipeline will look as follows:

[
  {'$project': {'state': 1, '_id': 0}},
  {'$group': {'_id': '$state', 'count': {'$sum': 1}}},
  {'$sort': {'count': -1}},
  {'$limit': 5}
]

In spring-data-mongodb, we invoked the aggregation operation using the MongoTemplate class as follows:

Aggregation aggregation = newAggregation(
  project("state", "_id"),
  group("state").count().as("count"),
  sort(Direction.DESC, "count"),
  limit(5)
);

AggregationResults<DBObject> results = mongoTemplate.aggregate(
  aggregation,
  "postalCodes",
  DBObject.class);

The key is in creating an instance of the org.springframework.data.mongodb.core.aggregation.Aggregation class. The newAggregation method is statically imported from the same class and accepts varargs of org.springframework.data.mongodb.core.aggregation.AggregationOperation instances, each of which corresponds to one operation in the pipeline. The Aggregation class has various static methods to create instances of AggregationOperation. We have used a few of them, such as project, group, sort, and limit. For more details and available methods, refer to the Javadoc. The aggregate method in MongoTemplate takes three arguments. The first one is the instance of the Aggregation class, the second one is the name of the collection, and the third one is the return type of the aggregation result. Refer to the aggregation operation test case for more details.
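If it helps to see what the pipeline computes without a database at hand, the same group, count, sort, and limit steps can be mimicked in memory with Java streams. This is only an analogy under simplifying assumptions: each document is reduced to its state value (the effect of the $project stage), ties are broken by state name to keep the output deterministic, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// In-memory analogy of the pipeline: $group with {$sum: 1}, $sort by count descending, $limit.
class TopStatesSketch {

    static List<Map.Entry<String, Long>> topStates(List<String> states, int limit) {
        // $group: one bucket per state, counting the documents in each bucket
        Map<String, Long> counts = states.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                // $sort: highest count first (state name as a tie-breaker, an addition for determinism)
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                // $limit
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList(
                "Maharashtra", "Goa", "Maharashtra", "Kerala", "Goa", "Maharashtra");
        System.out.println(topStates(states, 5));
    }
}
```

On a real collection the server does this work, so only the five result documents travel back to the client.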

See also

The Javadoc at http://docs.spring.io/spring-data/mongodb/docs/current/api/ for more details and API documentation

The reference manual for the spring-data-mongodb project at http://docs.spring.io/spring-data/data-mongodb/docs/current/reference/

Accessing MongoDB using Java Persistence API

In this recipe, we will use a JPA provider that allows us to use JPA entities to achieve object-to-document mapping with MongoDB.

Getting ready

Start the standalone server instance that listens to port 27017. This is a Java project using JPA. Familiarity with JPA and its annotations is expected, though what we will be looking at is fairly basic. Refer to the Connecting to a single node from a Java client recipe in Chapter 1, Installing and Starting the MongoDB Server, to know how to set up Maven if you are not aware of it. Download the DataNucleusMongoJPA project from the code bundle provided with the book. Though we will execute the test cases from the command prompt, you may import the project into your favorite IDE to view the source code.

How to do it…

1. Go to the root directory of the DataNucleusMongoJPA project and execute the following command in the shell:

$ mvn clean test

2. This will download the artifacts needed to build and run the project and then execute the test cases.

3. Once the test cases have executed successfully, open a Mongo shell and connect to the local instance.

4. Execute the following query in the shell:

> use test
> db.personJPA.find().pretty()

How it works…

First, let’s look at a sample document that got created in the personJPA collection:

{

"_id":NumberLong(2),

"residentialAddress":{

"residentialAddress_zipCode":"400101",

"residentialAddress_state":"Maharashtra",

"residentialAddress_country":"India",

"residentialAddress_city":"Mumbai",

"residentialAddress_addressLineOne":"20,Centralstreet"

},

"lastName":"Sharma",

"gender":"Male",

"firstName":"Amit",

"age":25

}

The steps we executed are pretty simple; let’s look at the classes used one by one. We will start with the com.packtpub.mongo.cookbook.domain.Person class. At the top of the class (after the package and imports), we have the following:

@Entity
@Table(name="personJPA")
public class Person {

This denotes that the Person class is an entity, and the collection to which it will persist is personJPA. Note that JPA was designed primarily as an ORM tool; thus, the terminology used is geared toward relational databases. A table in an RDBMS is analogous to a collection in MongoDB. The rest of the class contains the attributes of the person, annotated with @Column, and with @Id for the primary key. These are simple JPA annotations. What is interesting to look at is the com.packtpub.mongo.cookbook.domain.ResidentialAddress class, which is stored as the residentialAddress variable in the Person class. If we look at the person document we gave earlier, all the values given in the @Column annotation are the names of the keys for person. Also, notice how the enum too gets converted to the string value. However, the residentialAddress field is the name of the variable in the Person class against which the address instance is stored. If we look at the ResidentialAddress class, we will see the @Embeddable annotation on top, above the class name. This is again a JPA annotation that denotes that this instance is not an entity in itself but is embedded in another entity or another embeddable class. Note the names of the fields in the embedded document. They have the following format:

<name of the variable in Person class>_<value of the variable name in ResidentialAddress class>

There is one problem we notice here. The names of the fields are too long, thus consuming unnecessary space. The solution is to have a shorter value in the @Column annotation. For instance, have the following annotation:

@Column(name="ln") instead of @Column(name="lastName")

This will create the key with the name ln in the document. Unfortunately, this doesn’t work with the embedded ResidentialAddress class; in this case, you will have to use shorter variable names. Now that we have seen the entity classes, let’s look at the persistence.xml file:

<persistence-unit name="DataNucleusMongo">
  <class>com.packtpub.mongo.cookbook.domain.Person</class>
  <properties>
    <property name="javax.persistence.jdbc.url"
      value="mongodb:localhost:27017/test"/>
  </properties>
</persistence-unit>

We have just the persistence-unit definition here, with the name DataNucleusMongo. There is one class node, which is the entity that we will use. Note that the embedded address class is not mentioned here, as it is not an independent entity. In the properties, we gave the URL of the data store to connect to. In this case, we connect to the instance on localhost at port 27017 and to the test database.

Now, let’s look at the class that queries and inserts the data. This is our test class, com.packtpub.mongo.cookbook.DataNucleusJPATest. We create javax.persistence.EntityManagerFactory using Persistence.createEntityManagerFactory("DataNucleusMongo"). This is a thread-safe class, and its instance is shared across threads. Also, the string argument is the same as the name of the persistence unit we used in persistence.xml. All other invocations, made on javax.persistence.EntityManager to persist to or query the collection, require us to create an instance using EntityManagerFactory, use it, and then close it once the operation is complete. All the operations performed are as per the JPA specification. The test case class persists entities and also queries them.

Finally, we will look at pom.xml, particularly the enhancer plugin we used; it is as follows:

<plugin>

<groupId>org.datanucleus</groupId>

<artifactId>datanucleus-maven-plugin</artifactId>

<version>4.0.0-release</version>

<configuration>

<log4jConfiguration>${basedir}/src/main/resources/log4j.properties</log4jConfiguration>

<verbose>true</verbose>

</configuration>

<executions>

<execution>

<phase>process-classes</phase>

<goals>

<goal>enhance</goal>

</goals>

</execution>

</executions>

</plugin>

The entities we wrote need to be enhanced before they can be used as JPA entities with DataNucleus. The preceding plugin is attached to the process-classes phase and calls the plugin’s enhance goal.

See also

http://www.datanucleus.org/products/datanucleus/jdo/enhancer.html for possible options. There is also a plugin for Eclipse that allows entity classes to be enhanced/instrumented for DataNucleus.

The JPA 2.1 specification at https://www.jcp.org/aboutJava/communityprocess/final/jsr338/index.html.

Accessing MongoDB over REST

In this recipe, we will see how to access MongoDB and perform CRUD operations using REST APIs. We will use spring-data-rest for REST access and spring-data-mongodb to perform the CRUD operations. Before you continue with this recipe, it is important to know how to implement CRUD repositories using spring-data-mongodb. Refer to the Developing using spring-data-mongodb recipe to know how to use this framework.

The question that one must have is, why is a REST API needed? There are scenarios where a database is shared by many applications, possibly written in different languages. Writing JPA DAOs or using spring-data-mongodb is good enough for Java clients, but not for clients in other languages. Having the data access APIs local to each application doesn’t give us a centralized way to access the database either. This is where REST APIs come into play. We can develop the server-side data access layer, which is the CRUD repository in Java (spring-data-mongodb to be precise), and then expose it over a REST interface for a client written in any language to invoke. We can then invoke our API in a platform-independent way, and this also gives us a single point of entry into our database.

Getting ready

Apart from the prerequisites of the Developing using spring-data-mongodb recipe, we have a few more for this recipe. The first thing is to download the SpringDataRestTest project from the book’s website and import it into your IDE as a Maven project. Alternatively, if you do not wish to import it into the IDE, you can run the server that services the requests from the command prompt, which we will see in the next section. There is no specific client application used to perform the CRUD operations over REST. I will demonstrate the concepts using the Chrome browser and a special plugin of the browser called Advanced REST Client to send HTTP POST requests to the server. The tools can be found under the Developer Tools section on the Chrome web store.

How to do it…

1. If you have imported the project into your IDE as a Maven project, execute the com.packtpub.mongo.cookbook.rest.RestServer class, which is the bootstrap class, to start the server locally; it will accept client connections.

2. If the project is to be executed from the command prompt as a Maven project, go to the root directory of the project and run the following command:

mvn spring-boot:run

3. The following output will be seen in the command prompt if all goes well and the server is started:

[INFO] Attaching agents: []

4. After starting the server in either way, enter http://localhost:8080/people in the browser’s address bar, and we will see the following JSON response. This response is seen because the underlying collection, person, is empty:

{

"_links":{

"self":{

"href":"http://localhost:8080/people{?page,size,sort}",

"templated":true

},

"search":{

"href":"http://localhost:8080/people/search"

}

},

"page":{

"size":20,

"totalElements":0,

"totalPages":0,

"number":0

}

}

5. We will now insert a new document in the person collection using an HTTP POST request to http://localhost:8080/people. We will send the POST request to the server using the Advanced REST Client Chrome extension. The document posted is as follows:

{"lastName":"Cruise", "firstName":"Tom", "age":52, "id":1}

The request’s content type is application/json.

6. The following screenshot shows the POST request sent to the server and the response from the server:

7. We will now query this document from the browser using the _id field, which is 1 in this case. Enter http://localhost:8080/people/1 in the browser’s address bar. You will see the document we inserted in step 5.

8. Now that we have one document in the collection (you might try to insert more documents for people with different names and, more importantly, a unique ID), we will query the document using the last name. However, first type in http://localhost:8080/people/search in the browser’s address bar to view all the search options available. We will see one search method, findByLastName, that accepts a request parameter, lastName.

9. To search by the last name, Cruise in our case, enter http://localhost:8080/people/search/findByLastName?lastName=Cruise in the browser’s address bar.

10. We will now update the last name and age of the person with ID 1; it is Tom Cruise for now. Let’s update the last name to Hanks and the age to 58. To do this, we will use an HTTP PATCH request, and the request will be sent to http://localhost:8080/people/1, which uniquely identifies the document to update. The body of the HTTP PATCH request is {"lastName":"Hanks", "age":58}. Refer to the following screenshot for the request we sent out for the update:

11. To validate whether our update went through successfully (we know it did, as we got a response status 204 after the PATCH request), enter http://localhost:8080/people/1 again in the browser’s address bar.

12. Finally, we will delete the document. This is straightforward, and we will simply send a DELETE request to http://localhost:8080/people/1. Once the DELETE request is successful, send an HTTP GET request from the browser to http://localhost:8080/people/1, and we will not get any document in return.
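The same sequence of requests can also be issued from code instead of a browser plugin. The following is a minimal sketch using the java.net.http client that ships with Java 11 and later; it assumes the server from this recipe is running on localhost:8080, and the class and method names are hypothetical. The request bodies are the ones used in steps 5 and 10.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;

// Builds the POST, GET, PATCH, and DELETE requests used in the steps above.
class PeopleRestRequests {

    static final String BASE = "http://localhost:8080/people";

    // Step 5: create a document by posting JSON to the collection resource.
    static HttpRequest create(String json) {
        return HttpRequest.newBuilder(URI.create(BASE))
                .header("Content-Type", "application/json")
                .POST(BodyPublishers.ofString(json))
                .build();
    }

    // Step 7: fetch a single document by its ID.
    static HttpRequest findById(int id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + id)).GET().build();
    }

    // Step 10: partial update; the builder has no PATCH shortcut, so method(...) is used.
    static HttpRequest update(int id, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + id))
                .header("Content-Type", "application/json")
                .method("PATCH", BodyPublishers.ofString(json))
                .build();
    }

    // Step 12: delete the document.
    static HttpRequest delete(int id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + id)).DELETE().build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest[] steps = {
            create("{\"lastName\":\"Cruise\",\"firstName\":\"Tom\",\"age\":52,\"id\":1}"),
            findById(1),
            update(1, "{\"lastName\":\"Hanks\",\"age\":58}"),
            delete(1)
        };
        for (HttpRequest step : steps) {
            HttpResponse<String> response = client.send(step, BodyHandlers.ofString());
            System.out.println(step.method() + " " + step.uri() + " -> " + response.statusCode());
        }
    }
}
```

Running main requires the REST server to be up; the request-building methods themselves do not touch the network.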

How it works…

We will not be reiterating the spring-data-mongodb concepts in this recipe, but we will look at some of the annotations we added to the repository class specifically for the REST interface. The first one is on top of the class name as follows:

@RepositoryRestResource(path="people")
public interface PersonRepository extends
    PagingAndSortingRepository<Person, Integer> {

This is used to instruct the server that this CRUD repository can be accessed using the people resource. This is the reason why we always make HTTP GET and POST requests to http://localhost:8080/people/.

The second annotation is in the findByLastName method. We have the following method signature:

Person findByLastName(@Param("lastName") String lastName);

Here, the lastName method parameter is annotated with the @Param annotation, which names the request parameter whose value will be passed as the lastName argument when this method is invoked on the repository. If we look at step 9 in the previous section, we will see that findByLastName is invoked using an HTTP GET request, and the value of the URL parameter, lastName, is used as the string value passed while invoking the repository method.

Our example here is pretty simple, with just one parameter used for the search operation. We can have multiple parameters for the repository method and, accordingly, an equal number of parameters in the HTTP request, which will be mapped to these parameters for the method to be invoked on the CRUD repository. For some data types, such as dates, use the @DateTimeFormat annotation to specify the date and time format. For more information on this annotation and its usage, refer to the Spring Javadoc at http://docs.spring.io/spring/docs/current/javadoc-api/.
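The mapping from a URL parameter to a method parameter can be sketched in plain JavaScript. This is only an illustration of the idea and not how Spring Data REST is implemented: the people array, the findByLastName function, and the dispatchSearch helper are hypothetical stand-ins for the collection, the repository method, and the REST layer.

```javascript
// Hypothetical in-memory stand-in for the person collection.
const people = [
  { id: 1, firstName: "Tom", lastName: "Cruise", age: 52 },
  { id: 2, firstName: "Meg", lastName: "Ryan", age: 53 },
];

// Stand-in for the repository method; its parameter carries the
// value of the lastName request parameter.
function findByLastName(lastName) {
  return people.find((p) => p.lastName === lastName) || null;
}

// Reads the lastName query parameter from the search URL and
// passes it to the repository method.
function dispatchSearch(url) {
  const params = new URL(url).searchParams;
  return findByLastName(params.get("lastName"));
}

const found = dispatchSearch(
  "http://localhost:8080/people/search/findByLastName?lastName=Cruise"
);
console.log(found);
```

With more than one search parameter, the same lookup would simply be repeated for each parameter name.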

That was all about the GET requests we make to the REST interface to query and search data. Initially, we created document data by sending an HTTP POST request to the server. To create new documents, we will always send a POST request, with the document to be created as the body of the request, to the URL that identifies the REST endpoint, in our case, http://localhost:8080/people/. All documents posted to this endpoint will make use of PersonRepository to persist a person in the corresponding collection.

Our final three steps were to update and delete the person. The HTTP request types to perform these operations are PATCH and DELETE, respectively. In step 10, we updated the document for the person, Tom Cruise, changing his last name and age. To achieve this, our PATCH request was sent to the URL http://localhost:8080/people/1; this URL identifies a specific person instance. Note that when we wanted to create a new person, our POST request was always sent to http://localhost:8080/people, as against the PATCH and DELETE requests, where we sent the HTTP request to a URL that represents the specific person we want to update or delete. In the case of update, the body of the PATCH request is a JSON document whose provided fields will replace the corresponding fields in the target document to update. All the other fields will be left as they are. In our case, lastName and age of the target document were updated, and firstName was left untouched. In the case of delete, the message body was empty, and the DELETE request itself instructs that the target to which the request was sent should be deleted.

You might also send a PUT request instead of PATCH to a URL that identifies a specific person; in this case, the entire document in the collection will get updated or replaced with the document provided as part of the PUT request.
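The difference between the two request types can be sketched on plain objects. This is a simplified model under stated assumptions, not the server's code: applyPatch and applyPut are hypothetical helper names, and fields missing from a PUT body are simply dropped here, whereas a real server may represent them differently.

```javascript
// PATCH: only the supplied fields replace the stored ones.
function applyPatch(stored, patch) {
  return { ...stored, ...patch };
}

// PUT: the stored document is replaced as a whole; only the
// identifier from the URL is retained (an assumption of this sketch).
function applyPut(stored, replacement) {
  return { id: stored.id, ...replacement };
}

const stored = { id: 1, firstName: "Tom", lastName: "Cruise", age: 52 };
const body = { lastName: "Hanks", age: 58 };

console.log(applyPatch(stored, body)); // firstName is still present
console.log(applyPut(stored, body)); // firstName is no longer present
```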

See also

The spring-data-rest home page at http://projects.spring.io/spring-data-rest/, where you can find links to its Git repository, reference manual, and Javadoc URLs

Installing the GUI-based client, MongoVUE, for MongoDB

In this recipe, we will look at a GUI-based client for MongoDB. Throughout the book, we have used the Mongo shell to perform the various operations we need. Its advantages are as follows:

It comes packaged with the MongoDB installation

Being lightweight, you need not worry about it taking up your system’s resources

On servers where GUI-based interfaces are not present, the shell is the only option to connect, query, and administer the server instance

Having said this, if you are not on a server and want to connect to a database instance to query, view the plan of a query, administer, and so on, it is nice to have a GUI with these features that lets you do things at the click of a button. As developers, we always query our relational databases with a GUI-based thick client; so why not do the same for MongoDB?

In this recipe, we will see how to install a MongoDB client, MongoVUE, and use some of its features. This client is only available on Windows machines. The product has both a paid version (with various levels of licensing per number of users) and a free version that has some limitations. For this recipe, we’ll look at the free version.

Getting ready

For this recipe, the following steps are necessary:

1. Start a single instance of the MongoDB server. The port on which the connections are accepted will be the default one, 27017.

2. Import the following two collections from the command prompt after the MongoDB server is started:

$ mongoimport --type json personTwo.json -c personTwo -d test --drop
$ mongoimport --type csv -c postalCodes -d test pincodes.csv --headerline --drop

How to do it…

1. Download the installer ZIP for MongoVUE from http://www.mongovue.com/downloads/. Once downloaded, it is a matter of a few clicks and the software gets installed.

2. Open the installed application. As this is a free version, we will have all the features available for the first 14 days, after which some of the features will not be available. The details of this can be seen at http://www.mongovue.com/purchase/.

3. The first thing we will do is add a database connection as follows:

1. Once the following window is opened, click on the + button to add a new connection.

2. Once opened, we will get another window in which we will fill in the server-connection details. Fill in the following details in the new window and click on Test. This should succeed if the connection works. Finally, click on Save, as shown in the following screenshot:

3. Once added, connect to the instance.

4. In the left-hand-side navigation panel, we will see the instances added and the databases in them, as shown in the following screenshot:

As we see in the preceding screenshot, hovering the mouse over the name of the collection shows us the size and count of the documents in the collection.

5. Let’s see how to query a collection and get all the documents. We will use the postalCodes collection for our test. Right-click on the collection name and then click on View. We will see the contents of the collection shown as Tree View, where we can expand and see the contents; Table View, which shows the contents in a tabular grid; or Text View, which shows the contents as normal JSON text.

6. Let’s see what happens when we query a collection with nested documents; personTwo is a collection with the following sample document in it:

{

"_id":1,

"_class":"com.packtpub.mongo.cookbook.domain.Person2",

"firstName":"Steve",

"lastName":"Johnson",

"age":30,

"gender":"Male",

"residentialAddress":{

"addressLineOne":"20,Centralstreet",

"city":"Sydney",

"state":"NSW",

"country":"Australia"

}

}

7. When we query to see all the documents in the collection, we will see the following screenshot:

8. The residentialAddress column shows that the value is a nested document with the given number of fields present in it. Hovering your mouse over it shows the nested document. Alternatively, you can click on the column to show the contents of this document again as a grid. Once the nested document(s) are shown, you can click on the top of the grid to come back one level.

Let’s see how to write queries to retrieve the selected documents:

1. Right-click on the postalCodes collection and click on Find. We will type the following query in the {Find} text box and in the {Sort} field, and click on the Find button:

2. We can choose from the tab the type of view we want, such as Tree View, Table View, or Text View. The plan of the query is also shown. Whenever any operation is run, the Learn shell at the bottom shows the actual Mongo query executed. In this case, we will see the following query:

[11:17:07PM]

db.postalCodes.find({"city":/Mumbai/i}).limit(50);

db.postalCodes.find({"city":/Mumbai/i}).limit(50).explain();

3. The plan of a query is also shown every time and, as of the current version 1.6.9.0, there is no way to disable the setting that shows the query plan with the query.

4. In the Tree View tab, right-clicking on a document will give you more options such as expanding it, copying the JSON contents, adding keys to this document, and removing the document. Try to remove a document from this collection with a right-click and also try adding any additional keys to the document.

5. You might choose to restore the documents by reimporting the data into the postalCodes collection.

To insert a document in the collection, perform the following steps. We will insert a document in the personTwo collection.

1. Right-click on the personTwo collection name and click on Insert/Import Documents…, as shown in the following screenshot:

2. Another pop-up window will come up where you can choose to enter a single JSON document or a valid text file with JSON documents to be imported. We will import the following document by importing a single document:

{

"_id":4,

"firstName":"Jack",

"lastName":"Jones",

"age":35,

"gender":"Male"

}

3. Query the collection once the document is imported successfully; we will see the newly imported document along with the old ones.

Let’s see how to update the document:

1. You can either right-click on the collection name on the left-hand side and click on Update or select the Update option at the top. In either case, we will have the following window. Here, we will update the age of the person we inserted in the previous step as follows:

Some things to note in this GUI are the query text box on the left-hand side to find the document to be updated, and the update JSON on the right-hand side, which will be applied to the selected document(s).

2. Before you update, you might choose to hit the Count button to see the number of documents that can be updated (in this case, one). On clicking on Find, we can see the documents in the tree form. On the right-hand side, below the update JSON text, we have the option to update one document and multiple documents by clicking on Update 1 or Update All, respectively.

You might choose whether or not the Upsert operation is to be done if the document(s) for the given find condition are not found.

The radio buttons in the bottom-right corner of the screen, as shown in the preceding screenshot, show either the output of the getLastError operation or the result after the update; in the latter case, a query will be executed to find the document(s) updated.

However, the find query is not foolproof and might return results different from the documents truly updated. It is run as a separate query, the same as the one in the {Find} text box, and the update and find operations are not atomic.

We have queried on small collections so far. As the size of the collection increases, queries that perform full collection scans are not acceptable, and we need to create indexes as follows:

1. To create an index by lastName in the ascending order and age in the descending order, for instance, we will invoke db.personTwo.ensureIndex({'lastName':1, 'age':-1}).

2. Using MongoVUE, there is a way to visually create an index by right-clicking on the collection name on the left-hand side of the screen and selecting Add Index….

3. In the new pop-up window, enter the name of the index and select the Visual tab, as shown in the following screenshot. Select the lastName and age fields with the Ascending (1) and Descending (-1) values, respectively.

4. Once the preceding details are filled in, click on Create. This will create the index for us by firing the ensureIndex command, as we can see in the Learn Shell below.

5. You can opt for the index to be unique and drop duplicates (which will be enabled when Unique is selected) or even create big, long-running indexes in the background. Note the Json tab next to the Visual tab. That is the place where you can type in the ensureIndex command as you do from the shell to create the index.
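To get a feel for what the key pattern {'lastName':1, 'age':-1} means, the ordering it describes can be reproduced on a small in-memory array. This is only a sketch of the sort order, not of how MongoDB stores an index; the compareByIndexKey function and the sample documents are hypothetical.

```javascript
// Orders documents the way a {lastName: 1, age: -1} key pattern does:
// ascending by lastName and, for equal last names, descending by age.
function compareByIndexKey(a, b) {
  if (a.lastName !== b.lastName) {
    return a.lastName < b.lastName ? -1 : 1;
  }
  return b.age - a.age;
}

// Hypothetical sample documents.
const sample = [
  { lastName: "Jones", age: 35 },
  { lastName: "Johnson", age: 30 },
  { lastName: "Jones", age: 50 },
];

const ordered = [...sample].sort(compareByIndexKey);
console.log(ordered.map((d) => `${d.lastName}/${d.age}`).join(", "));
```

A query that filters on lastName and sorts on age in descending order can walk entries in exactly this order, which is why the direction of each key matters in a compound index.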

Now, we will see how to drop an index:

1. Simply expand the tree on the left-hand side.

2. On expanding the collection, we will see all the indexes created on it.

3. Except for the default index on the _id field, all other indexes can be dropped.

4. Simply right-click on the name and select Drop index to drop it, or click on Properties to view its properties.

After seeing how to do the basic CRUD operations and create an index, let’s look at how to execute aggregation operations.

1. Unlike index creation, there are no visual tools for aggregation, but simply a text area where we can enter our aggregation pipeline. In the following sample, we will perform aggregation on the postalCodes collection to find the top five states by the number of times they appear in the collection:

{'$project':{'state':1,'_id':0}},

{'$group':{'_id':'$state','count':{'$sum':1}}},

{'$sort':{'count':-1}},

{'$limit':5}

2. We will have the following aggregation pipeline entered:

3. Once the pipeline is entered, hit the Aggregate button to get the aggregation results.
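What the four pipeline stages compute can be sketched on an in-memory array. This is an illustration of the logic only and does not talk to MongoDB; the topStates function and the sample documents are hypothetical.

```javascript
// Hypothetical sample of postalCodes documents; only state matters here.
const postalCodes = [
  { state: "Maharashtra" }, { state: "Kerala" }, { state: "Maharashtra" },
  { state: "Gujarat" }, { state: "Kerala" }, { state: "Maharashtra" },
];

function topStates(docs, limit) {
  const counts = new Map();
  for (const { state } of docs) { // $project: keep only the state field
    counts.set(state, (counts.get(state) || 0) + 1); // $group with $sum: 1
  }
  return [...counts]
    .map(([state, count]) => ({ _id: state, count }))
    .sort((a, b) => b.count - a.count) // $sort: {count: -1}
    .slice(0, limit); // $limit
}

console.log(topStates(postalCodes, 5));
```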

Executing MapReduce is even cooler. The use case that we will execute is similar to the one we used earlier, but we will see how to implement the MapReduce operation using MongoVUE:

1. To execute a MapReduce job, right-click on the collection name in the left-hand-side menu and click on MapReduce.

2. This option is right above the Aggregate option, as seen in the previous screenshot. This gives us a pretty neat GUI to enter the Map, Reduce, Finalize, and In & Out options, as shown in the following screenshot:

3. The Map function is simply as follows:

function Map() {
  emit(this.state, 1)
}

4. The Reduce function is as follows:

function Reduce(key, values) {
  return Array.sum(values)
}

5. Leave the Finalize method unimplemented and, in the In & Out section, fill in the following details:

6. Click on Go to start executing the MapReduce job.

7. We will have the output in the mongoVue_mr collection. Query the mongoVue_mr collection using the following query:

> db.mongoVue_mr.find().sort({value:-1}).limit(5)

8. Verify the results against those we got using aggregation. The output mode of the MapReduce job was chosen as Reduce. For more options and their behavior, visit http://docs.mongodb.org/manual/reference/command/mapReduce/#mapreduce-out-cmd.
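The flow of the Map and Reduce functions above can be imitated outside the server. The following is a rough sketch, assuming a hypothetical mapReduce helper that passes emit to the map function as an argument (in the Mongo shell, emit is provided by the server) and uses a plain reduce in place of the shell's Array.sum.

```javascript
// Runs a map function and a reduce function over an in-memory array.
function mapReduce(docs, mapFn, reduceFn) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) {
    mapFn.call(doc, emit); // the document is available as `this`
  }
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: reduceFn(key, values),
  }));
}

// Hypothetical sample documents.
const docs = [
  { state: "Maharashtra" }, { state: "Kerala" }, { state: "Maharashtra" },
  { state: "Gujarat" }, { state: "Kerala" }, { state: "Maharashtra" },
];

const result = mapReduce(
  docs,
  function (emit) { emit(this.state, 1); },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
)
  .sort((a, b) => b.value - a.value) // same as sort({value: -1})
  .slice(0, 5); // same as limit(5)

console.log(result);
```

The counts in the value field should match the count field produced by the aggregation pipeline for the same data.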

Monitoring the server instances is also easily possible using MongoVUE. To do this, perform the following steps:

1. To monitor an instance, navigate to Tools | Monitoring in the top menu.

2. By default, no server will be added, and we will have to click on + Add Server to add a server instance.

3. Select the local instance added or any server you want to monitor and click on Connect.

4. We will see quite a lot of monitoring details. MongoVUE uses the db.serverStatus command to serve these stats. Thus, to limit the frequency at which we execute this command on busy server instances, we can choose Refresh Interval at the top of the screen, as shown in the following screenshot:

How it works…

What we covered in the previous sections was pretty straightforward; using this information, we can perform the majority of our activities as a developer and administrator.

See also

Chapter 4, Administration

Chapter 6, Monitoring and Backups

http://www.mongovue.com/tutorials/ for various tutorials on MongoVUE

Appendix A. Concepts for Reference

This appendix contains some additional information that will help you understand the recipes better. We will discuss write concern and read preference in as much detail as possible.

Write concern and its significance

Write concern is the minimum guarantee that the MongoDB server provides with respect to the write operation done by the client. There are various levels of write concern that are set by the client application to get a guarantee from the server that a certain stage will be reached in the write process on the server side.

The stronger the requirement for a guarantee, the greater the time taken (potentially) to get a response from the server. With write concern, we don’t always need to get an acknowledgement from the server about the write operation being completely successful. For some less crucial data, such as logs, we might be more interested in sending more writes per second over a connection. On the other hand, when we are looking to update sensitive information, such as customer details, we want to be sure of the write being successful (consistent and durable); data integrity is crucial and takes precedence over the speed of the writes.

An extremely useful feature of write concern is the ability to trade off two factors, the speed of write operations and the consistency of the data written, on a case-by-case basis. However, it needs a deep understanding of the implications of setting up a particular write concern. The following diagram runs from left to right and shows the increasing level of write guarantees:

As we move from I to IV, the guarantee for the performed write gets stronger and stronger, but the time taken to execute the write operation is longer from a client’s perspective. All write concerns are expressed here as JSON objects, using three different keys, namely, w, j, and fsync. Additionally, another key called wtimeout is used to provide timeout values for the write operation. Let’s see these keys in detail:

w: This is used to indicate whether or not to wait for the server’s acknowledgement, whether or not to report write errors due to data issues, and whether the data must be replicated to secondary nodes. Its value is usually a number, with a special case where the value can be majority, which we will see later.

j: This is related to journaling and its value can be a Boolean (true/false or 1/0).

fsync: This is a Boolean value and is related to whether or not the write should wait till the data is flushed to disk before responding.

wtimeout: This specifies the timeout for write operations, whereby the driver throws an exception to the client if the server doesn’t respond within the provided time, which is given in milliseconds. We will see the option in some detail soon.

In part I, which we have demarcated till the driver, we have two write concerns, namely, {w:-1} and {w:0}. Both these write concerns are common, in the sense that they neither wait for the server’s acknowledgement upon receiving the write operation, nor do they report any exception on the server side caused by a unique index violation. The client will get an ok response and will discover the write failure only when they query the database at some later point of time and find the data missing. The difference is in the way both these respond to a network error. When we set {w:-1}, the operation doesn’t fail and a write response is received by the user. However, it will contain a response stating that a network error prevented the write operation from succeeding and that no retries for the write must be attempted. On the other hand, with {w:0}, if a network error occurs, the driver might choose to retry the operation and throw an exception to the client if the write fails due to the network error. Both these write concerns give a quick response back to the invoking client at the cost of data consistency. These write concerns are ok for use cases such as logging, where occasional log write misses are fine. In older versions of MongoDB, {w:0} was the default write concern if none was mentioned by the invoking client. At the time of writing this book, this has changed to {w:1} by default and the option {w:0} is deprecated.

In part II of the diagram, which falls between the driver and the server, the write concern we are talking about is {w:1}. The driver waits for an acknowledgement from the server for the write operation to complete. Note that the server responding doesn’t mean that the write operation was made durable. It means that the change just got updated in memory, all the constraints were checked, and any exception will be reported to the client, unlike the previous two write concerns we saw. This is a relatively safe write concern mode, which will be fast, but there is still a slim chance of the data being lost if the server crashes in those few milliseconds before the data is written from memory to the journal. For most use cases, this is a good option to set. Hence, this is the default write concern mode.

Moving on, we come to part III of the diagram, which is from the entry point into the server as far as the journal. The write concern we are looking at here is {j:1} or {j:true}. This write concern ensures a response to the invoking client only when the write operation is written to the journal. What is a journal though? This is something that we saw in depth in Chapter 4, Administration, but for now, we will just treat it as a mechanism that ensures that the writes are made durable and the data on the disk doesn’t get corrupted in the event of server crashes.

Finally, let’s come to part IV of the diagram; the write concern we are talking about is {fsync:true}. This requires that the data be flushed to disk before sending the response back to the client. In my opinion, when journaling is enabled, this operation doesn’t really add any value, as journaling ensures data persistence even on server crash. Only when journaling is disabled does this option ensure that the write operation is successful when the client receives a success response. If the data is really important, journaling should never be disabled in the first place, as it also ensures that the data on the disk doesn’t get corrupted.

We have seen some basic write concerns for a single-node server or those relevant to the primary node only in a replica set.

Note

An interesting thing to discuss is, what if we have a write concern such as {w:0, j:true}? With w set to 0, we ask not to wait for the server’s acknowledgement, yet with j set to true, we also ask to ensure that the write has been made to the journal. In this case, the journaling flag takes precedence and the client waits for the acknowledgement of the write operation. One should avoid setting such ambiguous write concerns to avoid unpleasant surprises.

We will now talk about write concern when it involves the secondary nodes of a replica set as well. Let’s take a look at the following diagram:

Any write concern with a w value greater than one indicates that secondary nodes too need to acknowledge before sending a response back. As seen in the preceding diagram, when a primary node gets a write operation, it propagates that operation to all secondary nodes. As soon as it gets a response from a predetermined number of secondary nodes, it acknowledges to the client that the write has been successful. For example, when we have a write concern {w:3}, it means that the client should be sent a response only when three nodes in the cluster acknowledge the write. These three nodes include the primary node. Hence, it is now down to two secondary nodes to respond back for a successful write operation.

However, there is a problem with providing a number for the write concern. We need to know the number of nodes in the cluster and set the value of w accordingly. A low value will send an acknowledgement after only a few nodes have replicated the data. A value that is too high may unnecessarily slow the response back to the client or, in some cases, might not send a response at all. Suppose we have a three-node replica set and {w:4} as the write concern; the server will not send an acknowledgement till the data is replicated to three secondary nodes, which do not exist, as we have just two secondary nodes. Thus, the client waits for a very long time to hear from the server about the write operation. There are a couple of ways to address this problem:

Use the wtimeout key and specify the timeout for the write concern. This will ensure that a write operation will not block for longer than the time specified (in milliseconds) for the wtimeout field of the write concern. For example, {w:3, wtimeout:10000} ensures that the write operation will not block for more than 10 seconds (10,000 ms), after which an exception will be thrown to the client. In the case of Java, a WriteConcernException will be thrown with the root cause message stating the reason as timeout. Note that this exception does not roll back the write operation. It just informs the client that the operation did not get completed in the specified amount of time. It might later be completed on the server side, some time after the client receives the timeout exception. It is up to the application program to deal with the exception and programmatically take the corrective steps. The message for the timeout exception does convey some interesting details, which we will see on executing the test program for the write concern.

A better way to specify the value of w, in the case of replica sets, is by specifying the value as majority. This write concern automatically identifies the number of nodes in a replica set and sends an acknowledgement back to the client when the data is replicated to a majority of nodes. For example, if the write concern is {w:"majority"} and the number of nodes in a replica set is three, then the majority will be 2. Whereas, at a later point in time, when we change the number of nodes to five, the majority will be 3 nodes. The number of nodes to form a majority automatically gets computed when the write concern’s value is given as majority.

Now, let us put the concepts we discussed into use and execute a test program that will demonstrate some of the concepts we just saw.
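The arithmetic behind numeric and majority values of w can be written down in a few lines. This is a small sketch of the counting described above and not driver code; the function names majority, secondaryAcksNeeded, and canBeSatisfied are hypothetical.

```javascript
// Number of nodes that form a majority in a replica set of the given size.
function majority(nodeCount) {
  return Math.floor(nodeCount / 2) + 1;
}

// Secondary acknowledgements needed for a numeric w; the primary counts as one.
function secondaryAcksNeeded(w) {
  return Math.max(0, w - 1);
}

// A numeric w can only be satisfied if the set has at least w nodes.
function canBeSatisfied(w, nodeCount) {
  return w <= nodeCount;
}

console.log(majority(3)); // 2
console.log(majority(5)); // 3
console.log(secondaryAcksNeeded(3)); // 2
console.log(canBeSatisfied(4, 3)); // false
```

When canBeSatisfied is false, as with {w:4} on three nodes, only a wtimeout value keeps the client from waiting indefinitely.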

Setting up a replica set

To set up a replica set, you should know how to start the basic replica set with three nodes. Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server. This section builds on that recipe because it needs an additional configuration while starting the replica set, which we will discuss in the next section. Note that the replica set used here has a slight change in configuration from the one you have used earlier.

Here, we will use a Java program to demonstrate various write concerns and their behavior. The Connecting to a single node from a Java client recipe in Chapter 1, Installing and Starting the MongoDB Server, should be followed up to the point where Maven is set up. This can be a bit inconvenient if you are coming from a non-Java background.

Note

The Java project named Mongo Java is available for download at the book’s website. If the setup is complete, you can test the project just by executing the following command:

mvn compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.FirstMongoClient

The code for this project is available for download at the book’s website. Download the project named WriteConcernTest and keep it on a local drive ready for execution.

So, let’s get started:

1. Prepare the following configuration file for the replica set. This is identical to the config file that we saw in the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, where we set up the replica set, with just one difference, slaveDelay:5, priority:0:

cfg={

_id:'repSetTest',

members:[

{_id:0,host:'localhost:27000'},

{_id:1,host:'localhost:27001'},

{_id:2,host:'localhost:27002',slaveDelay:5,priority:0}

]

}

2. Use this config to start a three-node replica set, with one node listening to port 27000. The others can be any ports of your choice, but stick to 27001 and 27002 if possible (we need to update the config accordingly if we decide to use a different port number). Also, remember to set the name of the replica set to repSetTest, matching the _id in the config, for the replSet command-line option while starting the replica set. Give the replica set some time to come up before going ahead with the next step.

3. At this point, the replica set with the earlier mentioned specifications should be up and running. We will now execute the test code provided in Java to observe some interesting facts and behaviors of different write concerns. Note that this program also tries to connect to a port where no Mongo process is listening for connections. The port chosen is 20000; ensure that before running the code, no server is up and running and listening to port 20000.

4. Go to the root directory of the WriteConcernTest project and execute the following command:

mvn compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.WriteConcernTests

It should take some time to execute completely, depending on your hardware configuration. It took roughly 35 to 40 seconds on my machine, which has a 7200 RPM spinning disk drive.

Before we continue analyzing the logs, let us see what those two additional fields added to the config file to set up the replica set were. The slaveDelay field indicates that the particular slave (the one listening on port 27002 in this case) will lag behind the primary by 5 seconds. That is, the data being replicated currently on this replica node will be the data that was added to the primary 5 seconds ago. Secondly, this node can never be a primary and hence, the priority field has to be added with the value 0. We have already seen this in detail in Chapter 4, Administration.

Let us now analyze the output from the preceding command’s execution. The Java class provided need not be looked at here; the output on the console is sufficient. Some of the relevant portions of the output console are as follows:

[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ mongo-cookbook-wctest ---
Trying to connect to server running on port 20000
Trying to write data in the collection with write concern {w:-1}
Error returned in the WriteResult is NETWORK ERROR
Trying to write data in the collection with write concern {w:0}
Caught MongoException.Network trying to write to collection, message is Write operation to server localhost/127.0.0.1:20000 failed on database test
Connected to replica set with one node listening on port 27000 locally
Inserting duplicate keys with {w:0}
No exception caught while inserting data with duplicate _id
Now inserting the same data with {w:1}
Caught Duplicate Exception, exception message is { "serverUsed" : "localhost/127.0.0.1:27000" , "err" : "E11000 duplicate key error index: test.writeConcernTest.$_id_ dup key: { : \"a\" }" , "code" : 11000 , "n" : 0 , "lastOp" : { "$ts" : 1386009990 , "$inc" : 2} , "connectionId" : 157 , "ok" : 1.0}
Average running time with WriteConcern {w:1, fsync:false, j:false} is 0 ms
Average running time with WriteConcern {w:2, fsync:false, j:false} is 12 ms
Average running time with WriteConcern {w:1, fsync:false, j:true} is 40 ms
Average running time with WriteConcern {w:1, fsync:true, j:false} is 44 ms
Average running time with WriteConcern {w:3, fsync:false, j:false} is 5128 ms
Caught WriteConcern exception for {w:5}, with following message { "serverUsed" : "localhost/127.0.0.1:27000" , "n" : 0 , "lastOp" : { "$ts" : 1386009991 , "$inc" : 18} , "connectionId" : 157 , "wtimeout" : true , "waited" : 1004 , "writtenTo" : [ { "_id" : 0 , "host" : "localhost:27000"} , { "_id" : 1 , "host" : "localhost:27001"}] , "err" : "timeout" , "ok" : 1.0}
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36.671s
[INFO] Finished at: Tue Dec 03 00:16:57 IST 2013
[INFO] Final Memory: 13M/33M
[INFO] ------------------------------------------------------------------------

The first statement in the log states that we try to connect to a Mongo process listening on port 20000. As there should not be a Mongo server running and listening to this port for client connections, all our write operations to this server should not succeed, and this will now give us a chance to see what happens when we use the write concerns {w:-1} and {w:0} and write to this nonexistent server.

The next two lines in the output show that when we have the write concern {w:-1}, we do get a write result back, but it contains the error flag set to indicate a network error. However, no exception is thrown. In the case of the write concern {w:0}, we do get an exception in the client application for any network errors. Of course, all other write concerns ensuring a strict guarantee will throw an exception in this case too.

Now we come to the portion of the code that connects to the replica set where one of the nodes is listening to port 27000 (if not, the code will show the error on the console and terminate). Now, we attempt to insert a document with a duplicate _id field ({'_id':'a'}) into a collection, once with the write concern {w:0} and once with {w:1}. As we see in the console, the former ({w:0}) didn’t throw an exception and the insert went through successfully from the client’s perspective, whereas the latter ({w:1}) threw an exception to the client, indicating a duplicate key. The exception contains a lot of information: the server’s hostname and port at the time when the exception occurred, the index for which the unique constraint failed, the client connection ID, the error code, and the value that was not unique and caused the exception. The fact is that, even when the insert was performed using {w:0} as the write concern, it failed. However, as the driver didn’t wait for the server’s acknowledgement, it was never informed of the failure.

Moving on, we now try to compute the time taken for the write operation to complete. The time shown here is the average of the time taken to execute the same operation with a given write concern five times. Note that these times will vary on different instances of execution of the program, and this method is just meant to give some rough estimates for our study. We can conclude from the output that the time taken for the write concern {w:1} is less than that of {w:2} (asking for an acknowledgement from one secondary node) and the time taken for {w:2} is less than {j:true}, which in turn is less than {fsync:true}. The next line of the output shows us that the average time taken for the write operation to complete is roughly 5 seconds when the write concern is {w:3}. Any guesses on why that is the case? Why does it take so long? The reason is, when w is 3, we send an acknowledgement to the client only when two secondary nodes acknowledge the write operation. In our case, one of the nodes is delayed from the primary by about 5 seconds, and thus, it can acknowledge the write only after 5 seconds, and hence, the client receives a response from the server in roughly 5 seconds.
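The reasoning about the delayed node can be modelled with a tiny calculation: a write with a numeric w returns once the w-th fastest node has acknowledged. The following is a simplified sketch with a hypothetical timeToAcknowledge function; the acknowledgement times are illustrative values rounded from the output above, not measurements.

```javascript
// Time until `w` nodes have acknowledged, given the time (in ms)
// at which each node of the replica set acknowledges the write.
function timeToAcknowledge(ackTimesMs, w) {
  const sorted = [...ackTimesMs].sort((a, b) => a - b);
  return sorted[w - 1]; // undefined when w exceeds the number of nodes
}

// Illustrative values: primary, regular secondary, delayed secondary.
const ackTimesMs = [0, 12, 5000];

console.log(timeToAcknowledge(ackTimesMs, 1)); // 0
console.log(timeToAcknowledge(ackTimesMs, 2)); // 12
console.log(timeToAcknowledge(ackTimesMs, 3)); // 5000
```

An undefined result corresponds to a write concern that the replica set can never satisfy, which is the situation the wtimeout key guards against.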

Let us do a quick exercise here. What do you think would be the approximate response time when we have the write concern as {w:'majority'}? The hint here is, for a replica set of three nodes, two is the majority.

Finally, we see a timeout exception. The timeout is set using the wtimeout field of the write concern document and is specified in milliseconds. In our case, we gave a timeout of 1000 ms, that is, 1 second, and the number of nodes in the replica set to get an acknowledgement from before sending the response back to the client is 5 (the primary and four secondary instances). Thus, we have the write concern as {w:5, wtimeout:1000}. As our replica set has only three nodes, the operation with the value of w set to 5 would wait for a very long time, until two more secondary instances are added to the cluster. With the timeout set, the driver returns and throws an error to the client, conveying some interesting details. The following is the JSON sent as an exception message:

{"serverUsed":"localhost/127.0.0.1:27000","n":0,"lastOp":{"$ts"

:1386015030,"$inc":1},"connectionId":507,"wtimeout":true,

"waited":1000,"writtenTo":[{"_id":0,"host":"localhost:27000"}

,{"_id":1,"host":"localhost:27001"}],"err":"timeout","ok":

1.0}

Let us look at the interesting fields. We start with the n field. This indicates the number of documents updated. As in this case it is an insert and not an update, it stays 0. The wtimeout and waited fields tell us whether the operation did time out and the amount of time for which the client waited for a response; in this case, 1000 ms. The most interesting field is writtenTo. In this case, the insert was successful on these two nodes of the replica set when the operation timed out, and hence, they are seen in the array. The third node has a slaveDelay value of 5 seconds and, hence, the data is still not written to it. This proves that the timeout doesn’t roll back the insert and that it does go through successfully. In fact, the node with slaveDelay will also have the data after 5 seconds, even if the operation times out, and this makes perfect sense as it keeps the primary and secondary instances in sync. It is the responsibility of the application to detect such timeouts and handle them.
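An application that wants to act on such a timeout could inspect the message programmatically. The following sketch parses the exception message shown above and lists the members that had not yet acknowledged the write; the pendingHosts function and the members array are hypothetical, and the member list is taken from the replica set config used in this section.

```javascript
// The exception message from the run above, as a JSON string.
const message =
  '{"serverUsed":"localhost/127.0.0.1:27000","n":0,' +
  '"lastOp":{"$ts":1386015030,"$inc":1},"connectionId":507,' +
  '"wtimeout":true,"waited":1000,' +
  '"writtenTo":[{"_id":0,"host":"localhost:27000"},' +
  '{"_id":1,"host":"localhost:27001"}],"err":"timeout","ok":1.0}';

// Hosts of the replica set members, as in the config file.
const members = ["localhost:27000", "localhost:27001", "localhost:27002"];

// Returns the hosts that are not listed in the writtenTo array.
function pendingHosts(exceptionJson, allHosts) {
  const details = JSON.parse(exceptionJson);
  const written = new Set((details.writtenTo || []).map((m) => m.host));
  return allHosts.filter((host) => !written.has(host));
}

console.log(pendingHosts(message, members)); // [ 'localhost:27002' ]
```

Here the only pending host is the delayed member on port 27002, which matches the explanation above.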

Read preference for querying

In the previous section, we saw what a write concern is and how it affects the write operations (insert, update, and delete). In this section, we will see what a read preference is and how it affects query operations. We discuss how to use a read preference in separate recipes that use specific programming language drivers.

When connected to an individual node, query operations are allowed by default if the node is a primary. If it is a secondary node, we need to explicitly state that it is ok to query from secondary instances by executing rs.slaveOk() from the shell.

However, consider connecting to a Mongo replica set from an application. The application will connect to the replica set and not to a single instance. Depending on the nature of the application, it might always want to connect to a primary; always to a secondary; prefer connecting to a primary node but be ok with connecting to a secondary node in some scenarios, or vice versa; and finally, it might connect to the instance geographically closest to it (well, most of the time).

Thus, the read preference plays an important role when connected to a replica set and not to a single instance. In the following table, we will see the various read preferences that are available and what their behavior is in terms of querying a replica set. There are five of them and the names are self-explanatory:

Read preference    Description

primary

This is the default mode, and it allows queries to be executed only on the primary instance. It is the only mode that guarantees the most recent data, as all writes have to go through the primary. Read operations, however, will fail if no primary is available, which happens for a few moments after a primary goes down and continues until a new primary is elected.

primaryPreferred

This is identical to the preceding primary read preference, except that during a failover, when no primary is available, it reads data from a secondary. Those are the times when it possibly doesn't read the most recent data.

secondary

This is the exact opposite of the default primary read preference. This mode ensures that read operations never go to a primary and a secondary is always chosen. The chances of reading stale data that does not reflect the latest write operation are highest in this mode. It is, however, OK (in fact, preferred) for applications that do not face end users, for instance, hourly statistics and analytics jobs used for in-house monitoring, where up-to-the-moment accuracy of the data is least important but not adding load to the primary instance is key. If no secondary instance is available or reachable, and only a primary instance is, the read operation will fail.

secondaryPreferred

This is similar to the preceding secondary read preference in all aspects, except that if no secondary is available, the read operations will go to the primary instance.

nearest

This, unlike all the preceding read preferences, can connect either to a primary or to a secondary. The primary objective of this read preference is minimum latency between the client and an instance of the replica set. In the majority of cases, owing to the network latency and with a similar network between the client and all instances, the instance chosen will be one that is geographically close.
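The behavior described in the table can be summarized as a small decision function. The following is a simplified model written for illustration, assuming only two facts are known: whether a primary is reachable and whether at least one secondary is reachable. The ReadRouting class and its enums are hypothetical and do not reproduce the driver's implementation; in particular, the model ignores latency and tags.

```java
// Simplified, hypothetical model of the five read preference modes.
class ReadRouting {

    enum Mode { PRIMARY, PRIMARY_PREFERRED, SECONDARY, SECONDARY_PREFERRED, NEAREST }

    // Where a read may be served from; NONE means the read fails.
    enum Target { PRIMARY, SECONDARY, EITHER, NONE }

    static Target route(Mode mode, boolean primaryUp, boolean secondaryUp) {
        switch (mode) {
            case PRIMARY:
                // Only the primary is acceptable.
                return primaryUp ? Target.PRIMARY : Target.NONE;
            case PRIMARY_PREFERRED:
                // Fall back to a secondary only when no primary is available.
                if (primaryUp) {
                    return Target.PRIMARY;
                }
                return secondaryUp ? Target.SECONDARY : Target.NONE;
            case SECONDARY:
                // Never read from the primary.
                return secondaryUp ? Target.SECONDARY : Target.NONE;
            case SECONDARY_PREFERRED:
                // Fall back to the primary only when no secondary is available.
                if (secondaryUp) {
                    return Target.SECONDARY;
                }
                return primaryUp ? Target.PRIMARY : Target.NONE;
            case NEAREST:
                // Any reachable member qualifies; latency decides in practice.
                if (primaryUp && secondaryUp) {
                    return Target.EITHER;
                }
                if (primaryUp) {
                    return Target.PRIMARY;
                }
                return secondaryUp ? Target.SECONDARY : Target.NONE;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // During a failover there is no primary, but secondaries are up.
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + route(mode, false, true));
        }
    }
}
```

Running the main method prints, for each mode, where a read would go during a failover window in which only secondaries are reachable.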

Similar to how write concerns can be coupled with replica set tags, read preferences can also be used along with replica set tags. As the concept of tags has already been introduced in Chapter 4, Administration, you can refer to it for more details.

We just saw what the different types of read preferences are (except for those using tags), but the question is, how do we use them? We have covered Python and Java clients in this book and will see how to use read preferences in their respective recipes. We can set read preferences at various levels: at the client level, collection level, and query level, with the one specified at the query level overriding any read preference set previously.

Let us see what the nearest read preference means. Conceptually, it can be visualized as something like the following diagram:

A Mongo replica set is set up with two instances (a primary and a secondary) in one data center, Data Center I, and one secondary, which can never become a primary, in a separate data center, Data Center II. An identical application deployed in both data centers, with a primary read preference, will always connect to the primary instance in Data Center I. This means that, for the application in Data Center II, the traffic goes over the public network, which will have high latency. However, if the application is OK with slightly stale data, it can set the read preference to nearest, which will automatically let the application in Data Center I connect to an instance in Data Center I and will allow the application in Data Center II to connect to the secondary instance in Data Center II.

But then the next question is, how does the driver know which one is the nearest? The term "geographically close" is misleading; the instance chosen is actually one with low network latency. The instance we query might be geographically further away than another instance in the replica set, but it can be chosen just because it has an acceptable response time. Generally, a better response time means geographically closer.

The following section is for those interested in the internal details of how the driver chooses the nearest node. If you are happy with just the concepts and not the internal details, you can safely skip the rest of the contents.

Knowing the internals

Let us see some pieces of code from a Java client (driver 2.11.3 is used for this purpose) and make some sense out of it. If we look at the com.mongodb.TaggableReadPreference.NearestReadPreference.getNode method, we see the following implementation:

@Override
ReplicaSetStatus.ReplicaSetNode getNode(ReplicaSetStatus.ReplicaSet set) {
    if (_tags.isEmpty())
        return set.getAMember();

    for (DBObject curTagSet : _tags) {
        List<ReplicaSetStatus.Tag> tagList = getTagListFromDBObject(curTagSet);
        ReplicaSetStatus.ReplicaSetNode node = set.getAMember(tagList);
        if (node != null) {
            return node;
        }
    }
    return null;
}

For now, if we ignore the parts where tags are specified, all it does is execute set.getAMember().

The name of this method tells us that there is a set of replica set members and that one of them is returned at random. Then what decides whether the set contains a member or not? If we dig a bit further into this method, we see the following lines of code in the com.mongodb.ReplicaSetStatus.ReplicaSet class:

public ReplicaSetNode getAMember() {
    checkStatus();
    if (acceptableMembers.isEmpty()) {
        return null;
    }
    return acceptableMembers.get(random.nextInt(acceptableMembers.size()));
}

OK, so all it does is pick one node from a list of replica set nodes maintained internally. The random pick can be a secondary even if a primary is present in the list and could have been chosen. Thus, we can now say that when nearest is chosen as the read preference, a primary will not necessarily be chosen even if it is in the list of contenders, because the pick is random.

The question now is, how is the acceptableMembers list initialized? We see that it is done in the constructor of the com.mongodb.ReplicaSetStatus.ReplicaSet class as follows:

this.acceptableMembers =
    Collections.unmodifiableList(calculateGoodMembers(all,
        calculateBestPingTime(all, true), acceptableLatencyMS, true));

The calculateBestPingTime call just finds the best ping time among all the members (we will see what this ping time is later).

Another parameter worth mentioning is acceptableLatencyMS. This gets initialized in com.mongodb.ReplicaSetStatus.Updater (this is actually a background thread that updates the status of the replica set continuously), and the value for acceptableLatencyMS is initialized as follows:

slaveAcceptableLatencyMS =
    Integer.parseInt(System.getProperty("com.mongodb.slaveAcceptableLatencyMS", "15"));

As we can see, this code looks up the Java system property called com.mongodb.slaveAcceptableLatencyMS, and if none is found, it initializes the value to 15, which is 15 ms.

This com.mongodb.ReplicaSetStatus.Updater class also has a run method that periodically updates the replica set stats. Without getting too much into it, we can see that it calls updateAll, which eventually reaches the update method in com.mongodb.ConnectionStatus.UpdatableNode:

long start = System.nanoTime();
CommandResult res = _port.runCommand(_mongo.getDB("admin"), isMasterCmd);
long end = System.nanoTime();

All it does is execute the {isMaster: 1} command and record the response time in nanoseconds. This response time is converted to milliseconds and stored as the ping time. Coming back to the com.mongodb.ReplicaSetStatus.ReplicaSet class, all calculateGoodMembers does is find the members of the replica set whose ping time is no more than acceptableLatencyMS milliseconds above the best ping time found in the replica set, and add them to the list.

For example, in a replica set with three nodes, suppose the ping times from the client to the three nodes (node1, node2, and node3) are 2 ms, 5 ms, and 150 ms respectively. The best time is 2 ms and hence, node1 goes into the set of good members. From the remaining nodes, all those with a latency that is no more than acceptableLatencyMS above the best are also considered; with the default of 15 ms, the cutoff is 2 + 15 ms = 17 ms. Thus, node2 is also a contender, leaving out node3. We now have two nodes in the list of good members (good in terms of latency).
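The arithmetic of this example can be checked with a small standalone sketch. The NearestSelection class below is hypothetical and written only for illustration; it is not the driver's calculateGoodMembers code. It assumes an inclusive cutoff (a ping time exactly acceptableLatencyMs above the best still qualifies), which matches the "no more than" wording.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration of the latency window; not driver code.
class NearestSelection {

    // Returns the names of the nodes whose ping time is within
    // acceptableLatencyMs of the best (lowest) ping time.
    static List<String> goodMembers(Map<String, Double> pingTimesMs,
                                    double acceptableLatencyMs) {
        double best = Double.MAX_VALUE;
        for (double ping : pingTimesMs.values()) {
            best = Math.min(best, ping);
        }
        List<String> good = new ArrayList<String>();
        for (Map.Entry<String, Double> entry : pingTimesMs.entrySet()) {
            // Assumed inclusive comparison against best + acceptable latency.
            if (entry.getValue() - acceptableLatencyMs <= best) {
                good.add(entry.getKey());
            }
        }
        return good;
    }

    // Picks one acceptable member at random, or null when there is none.
    static String pickOne(List<String> members, Random random) {
        if (members.isEmpty()) {
            return null;
        }
        return members.get(random.nextInt(members.size()));
    }

    public static void main(String[] args) {
        // Ping times from the example: 2 ms, 5 ms, and 150 ms.
        Map<String, Double> pings = new LinkedHashMap<String, Double>();
        pings.put("node1", 2.0);
        pings.put("node2", 5.0);
        pings.put("node3", 150.0);

        List<String> good = goodMembers(pings, 15.0);
        System.out.println("Acceptable members: " + good);
        System.out.println("Chosen for this read: " + pickOne(good, new Random()));
    }
}
```

With the default window of 15 ms, node1 and node2 fall within the 17 ms cutoff and node3 does not, so each read goes to node1 or node2 at random.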

Now, let us put together all that we saw and apply it to the scenario in the preceding diagram. From the perspective of the driver in each of the two data centers, the lowest response time will come from an instance in the same data center, as instances in the other data center are unlikely to respond within 15 ms (the default acceptable value) of the best response time, owing to public network latency. Thus, the acceptable nodes for the application in Data Center I will be the two replica set nodes in that data center, and one of them will be chosen at random. In Data Center II, only one instance is present, so it is the only option and will be chosen by the application running in that data center.

IndexA

advancedpackagingtool(apt)/Howtodoit…aggregationoperations,inMongo

executing,withPyMongo/AggregationinMongousingPyMongo,Howitworks…executing,withJavaclient/AggregationinMongousingaJavaclient,Gettingready,Howitworks…

alertssettingup,onMMS/MonitoringMongoDBinstancesonMMS,Howtodoit…,Howitworks…,There’smore…URL/Seealso

AmazonURL/Gettingready

AmazonEC2MongoDB,settingupwithMongoDBAMI/SettingupMongoDBonAmazonEC2usingtheMongoDBAMI,Gettingready,Howtodoit…,Howitworks…MongoDB,settingupwithoutMongoDBAMI/SettingupMongoDBonAmazonEC2withoutusingtheMongoDBAMI,Gettingready,Howtodoit…,Howitworks…

AmazonEMRMapReducejob,runningon/RunningaMapReducejobonAmazonEMR,Gettingready,Howtodoit…,Howitworks…URL/Seealso

AmazonmarketplaceURL/Howtodoit…

AmazonS3URL/RunningaMapReducejobonAmazonEMR

AmazonWebService(AWS)/IntroductionAMI/SettingupMongoDBonAmazonEC2usingtheMongoDBAMI

URL/SettingupMongoDBonAmazonEC2usingtheMongoDBAMIApacheHadoop

URL/Gettingreadyar|awcolumn/Howitworks…atomiccounters

implementing,inMongoDB/ImplementingatomiccountersinMongoDB,Howtodoit…,Howitworks…

avgObjectSizefield,databasestats/Howitworks…avgObjSizefield,collectionstats/Howitworks…AWSconsole

URL/Howtodoit…

Bbackups

managing,inMMSbackupservice/ManagingbackupsintheMMSbackupservice,Howtodoit…,Howitworks…

binarydatastoring,inMongoDB/StoringbinarydatainMongoDB,Howitworks…

buildIndexesoption/Hidden,votes,slavedelayed,andbuildindexconfigurationsbuilt-in-roles

URL/Seealsobulkinserts

URL/Howitworks…

Ccappedcollection

about/CreatingandtailingcappedcollectioncursorsinMongoDBnormalcollection,convertingto/Convertinganormalcollectiontoacappedcollection,Howitworks…

cappedcollectioncursorstailing,inMongoDB/CreatingandtailingcappedcollectioncursorsinMongoDB,Gettingready,Howtodoit…,Howitworks…creating,inMongoDB/CreatingandtailingcappedcollectioncursorsinMongoDB,Howtodoit…,Howitworks…

chunkssplitting/Manuallysplittingandmigratingchunks,Howtodoit…,Howitworks…manualmigration/Manuallysplittingandmigratingchunks,Howitworks…

clientfield,db.currentOp()operation/Howitworks…clientfield,operations/Howitworks…cloudcomputing

URL/Introductioncloudformation

URL/Seealsocollection

renaming/Renamingacollection,Howtodoit…,Howitworks…collectionbehavior

modifying,withcollModcommand/ModifyingcollectionbehaviorusingthecollModcommand,Howtodoit…,Howitworks…

collectionsfield,databasestats/Howitworks…collectionstats

viewing/Viewingcollectionstats,Howitworks…collModcommand

used,formodifyingcollectionbehavior/ModifyingcollectionbehaviorusingthecollModcommand,Howtodoit…,Howitworks…

command-lineoptionsused,forstartingsinglenodeinstance/Startingasinglenodeinstanceusingcommand-lineoptions,Howitworks…,Seealso—help/-h/Howitworks…—config/-f/Howitworks…—verbose/-v/Howitworks…—quiet/Howitworks…—port/Howitworks…—logpath/Howitworks…—logappend/Howitworks…—dbpath/Howitworks…—smallfiles/Howitworks…

—replSet/Howitworks…—configsvr/Howitworks…—shardsvr/Howitworks…—oplogSize/Howitworks…

commandcolumn/Howitworks…configdatabase

exploring,inshardedsetup/Exploringtheconfigdatabaseinashardedsetup,Howtodoit…,Howitworks…

configfileoptionsused,forsinglenodeinstallationofMongoDB/SinglenodeinstallationofMongoDBwithoptionsfromtheconfigfile,Howitworks…

conncolumn/Howitworks…connectionIdfield,operations/Howitworks…countfield,collectionstats/Howitworks…coveredindexes

about/Improvementusingcoveredindexesusing/Improvementusingcoveredindexes

customuserrolesURL/Seealso

D-doption/Howitworks…data

deleting,fromMongoshell/Gettingready,Howtodoit…updating,fromMongoshell/Gettingready,Howtodoit…storing,toGridFSfromJavaclient/StoringdatatoGridFSfromaJavaclient,Howtodoit…,Howitworks…storing,toGridFSfromPythonclient/StoringdatatoGridFSfromaPythonclient,Howtodoit…,Howitworks…backingup,without-of-theboxtoolsinMongoDB/BackingupandrestoringdatainMongousingout-of-theboxtools,Howitworks…restoring,without-of-theboxtoolsinMongoDB/BackingupandrestoringdatainMongousingout-of-theboxtools,Howitworks…

databasestatsviewing/Viewingdatabasestats,Howtodoit…,Howitworks…

DataCenterI/Readpreferenceforquerying,KnowingtheinternalsDataCenterII/Readpreferenceforquerying,Knowingtheinternalsdatafilepreallocation

disabling/Disablingthepreallocationofdatafiles,Howtodoit…datanucleus

URL/SeealsodataSizefield,databasestats/Howitworks…dbaddresscommand-lineoption,values

mydb/There’smore…mongo.server.host/mydb/There’smore…mongo.server.host*27000/mydb/There’smore…mongo.server.host*27000/There’smore…

dbfield,databasestats/Howitworks…deletecolumn/Howitworks…deleteoperations

executing,withPyMongo/Gettingready,Howtodoit…,Howitworks…executing,withJavaclient/Howtodoit…,Howitworks…,Seealso

descfield,operations/Howitworks…document

manualpadding/Manuallypaddingadocument,Howtodoit…,Howitworks…

domain-drivenshardingperforming,withtags/Performingdomain-drivenshardingusingtags,Gettingready,Howitworks…

driver/Introductionduplicatedata

deleting/Creatinguniqueindexesoncollectionanddeletingtheexistingduplicatedataautomatically,Gettingready,Howitworks…

EEC2

URL/GettingreadyElasticBlockStore(EBS)/SettingupMongoDBonAmazonEC2usingtheMongoDBAMIElasticCloudCompute(E2C)/SettingupMongoDBonAmazonEC2usingtheMongoDBAMIElasticsearch

integrating,withMongoDBforfulltextsearch/IntegratingMongoDBwithElasticsearchforafull-textsearch,Gettingready,Howtodoit…,Howitworks…,There’smore…URL/Gettingready,Seealso

EMRconsoleURL/Howtodoit…

evalfunction/Howitworks…

F—fieldsoption/Howitworks…faultscolumn/Howitworks…fields,collectionstats

ns/Howitworks…count/Howitworks…size/Howitworks…avgObjSize/Howitworks…storageSize/Howitworks…numExtents/Howitworks…nindexes/Howitworks…lastExtentSize/Howitworks…paddingFactor/Howitworks…totalIndexSize/Howitworks…indexSizes/Howitworks…

fields,databasestatsdb/Howitworks…collections/Howitworks…objects/Howitworks…avgObjectSize/Howitworks…dataSize/Howitworks…storageSize/Howitworks…numExtents/Howitworks…indexes/Howitworks…indexSize/Howitworks…fileSize/Howitworks…nsSizeMB/Howitworks…

fields,db.currentOp()operationop/Howitworks…ns/Howitworks…query/Howitworks…nscanned/Howitworks…numYields/Howitworks…lockStats/Howitworks…nreturned/Howitworks…responseLength/Howitworks…millis/Howitworks…ts/Howitworks…client/Howitworks…

fields,operationsopid/Howitworks…active/Howitworks…secs_running/Howitworks…

op/Howitworks…ns/Howitworks…insert/Howitworks…query/Howitworks…client/Howitworks…desc/Howitworks…connectionId/Howitworks…locks/Howitworks…waitingForLock/Howitworks…msg/Howitworks…progress/Howitworks…numYields/Howitworks…lockStats/Howitworks…

fieldsparameter/Howitworks…fileSizefield,databasestats/Howitworks…findAndModifymethod/Howitworks…findAndModifymethod,MongoTemplateclass/Howitworks…findAndRemove/findAllAndRemovemethod,MongoTemplateclass/Howitworks…findByAgeBetweenmethod/Howitworks…findByAgeGreaterThanEqualmethod/Howitworks…findByAgeGreaterThanmethod/Howitworks…findByFirstNameAndCountrymethod/Howitworks…findByResidentialAddressCountrymethod/Howitworks…findoperation

about/Atomicfindandmodifyoperations,Howtodoit…working/Howitworks…

findPeopleByLastNameLikemethod/Howitworks…FirstIn,FirstOut(FIFO)pattern/Howitworks…firstinfirstout(FIFO)/Howitworks…flatplane(2D)geospatialqueries

executing,inMongoDBwithgeospatialindexes/Executingflatplane(2D)geospatialqueriesinMongousinggeospatialindexes,Gettingready,Howtodoit…,Howitworks…

flushescolumn/Howitworks…fulltextsearch

implementing,inMongoDB/Implementingafull-textsearchinMongoDB,Howtodoit…,Howitworks…URL/Howitworks…MongoDB,integratingwithElasticsearch/IntegratingMongoDBwithElasticsearchforafull-textsearch,Gettingready,Howtodoit…,Howitworks…,There’smore…

GGeoJSON

URL/SphericalindexesandGeoJSON-compliantdatainMongoDBGeoJSON-compliantdata,MongoDB

about/SphericalindexesandGeoJSON-compliantdatainMongoDB,Howtodoit…working/Howitworks…

geospatialindexesused,forexecutingflatplane(2D)geospatialqueriesinMongoDB/Executingflatplane(2D)geospatialqueriesinMongousinggeospatialindexes,Gettingready,Howtodoit…,Howitworks…

geospatialoperatorsURL/Howitworks…

getmorecolumn/Howitworks…Git

URL/GettingreadyGridFS

used,forstoringlargedatainMongoDB/StoringlargedatainMongoDBusingGridFS,Howtodoit…,Howitworks…,There’smore…URL/There’smore…

GridFS,fromJavaclientdata,storingto/StoringdatatoGridFSfromaJavaclient,Howtodoit…,Howitworks…

GridFS,fromPythonclientdata,storingto/StoringdatatoGridFSfromaPythonclient,Howtodoit…,Howitworks…

groupsmanaging,onMMSconsole/Gettingready,Howtodoit…,Howitworks…

GUI-basedclientinstalling,forMongoDB/InstallingtheGUI-basedclient,MongoVUE,forMongoDB,Howtodoit…

HHadoop

about/IntroductionURL/Howtodoit…streaming,usedforrunningMapReducejobson/RunningMapReducejobsonHadoopusingstreaming,Howitworks…,Howtodoit…

HadoopMapReducejobwriting/WritingourfirstHadoopMapReducejob,Howtodoit…,Howitworks…

HadoopstreamingURL/RunningMapReducejobsonHadoopusingstreaming

hostinstancemonitoring/MonitoringMongoDBinstancesonMMS,Howtodoit…,Howitworks…,There’smore…

II/Ooperationspersecond(IOPS)/SettingupMongoDBonAmazonEC2usingtheMongoDBAMIidxmiss%column/Howitworks…in-builtprofiler

used,toprofileoperations/Usingprofilertoprofileoperations,Howitworks…index

creating/Gettingready,Howitworks…creating,gotchas/Somegotchasofindexcreation

indexcreation,fromMongoshellpitfalls,avoiding/Backgroundandforegroundindexcreationfromtheshell,Gettingready,Howitworks…

indexesfield,databasestats/Howitworks…indexSizefield,databasestats/Howitworks…indexSizesfield,collectionstats/Howitworks…insertcolumn/Howitworks…insertfield,operations/Howitworks…insertmethod,MongoTemplateclass/Howitworks…insertoperations

executing,withPyMongo/ExecutingqueryandinsertoperationsusingPyMongo,Howtodoit…,Howitworks…executing,withJavaclient/ExecutingqueryandinsertoperationsusingaJavaclient,Howitworks…

interprocesssecurityinMongoDB/UnderstandinginterprocesssecurityinMongoDB,Howtodoit…,There’smore…

IntrastructureasaService(IaaS)/There’smore…

JJavaclient

singlenodeconnection,establishing/ConnectingtoasinglenodefromaJavaclient,Howtodoit…,Howitworks…replicasetconnection,forqueryingdata/Gettingready,Howtodoit…,Howitworks…replicasetconnection,forinsertingdata/Gettingready,Howtodoit…,Howitworks…used,forexecutingqueryoperations/ExecutingqueryandinsertoperationsusingaJavaclient,Howitworks…used,forexecutinginsertoperations/ExecutingqueryandinsertoperationsusingaJavaclient,Howitworks…used,forexecutingupdateoperations/ExecutingupdateanddeleteoperationsusingaJavaclient,Howitworks…used,forexecutingdeleteoperations/ExecutingupdateanddeleteoperationsusingaJavaclient,Howitworks…used,forexecutingaggregationoperationsinMongo/AggregationinMongousingaJavaclient,Gettingready,Howitworks…used,forexecutingMapReduceoperationsinMongo/MapReduceinMongousingaJavaclient,Howitworks…data,storingtoGridFSfrom/StoringdatatoGridFSfromaJavaclient,Howtodoit…,Howitworks…

JavaDatabaseConnectivity(JDBC)/Howitworks…Javadoc

URL/SeealsoJavadocuments

URL/Howitworks…JavaPersistenceAPI

used,foraccessingMongoDB/AccessingMongoDBusingJavaPersistenceAPI,Howtodoit…,Howitworks…URL/Seealso

JavaPersistenceAPI(JPA)about/Introduction

JIRAURL/Howitworks…,Howitworks…

LlastExtentSizefield,collectionstats/Howitworks…leastrecentlyused(LRU)/Howitworks…localdatabase,replicaset

exploring/Exploringthelocaldatabaseofareplicaset,Howtodoit…,Howitworks…

lockedcolumn/Howitworks…locksfield,operations/Howitworks…lockStatsfield,db.currentOp()operation/Howitworks…lockStatsfield,operations/Howitworks…

M-moption/Howitworks…mappedcolumn/Howitworks…MapReduce

URL/Gettingready,Howtodoit…MapReducejob

executing,withmongo-hadoopconnector/ExecutingourfirstsampleMapReducejobusingthemongo-hadoopconnector,Gettingready,Howtodoit…,Howitworks…,There’smore…running,onHadoopwithstreaming/RunningMapReducejobsonHadoopusingstreaming,Howitworks…,Howtodoit…running,onAmazonEMR/RunningaMapReducejobonAmazonEMR,Gettingready,Howtodoit…,Howitworks…

MapReduceoperationimplementing,MongoVUEused/Howtodoit…

MapReduceoperations,inMongoexecuting,withPyMongo/MapReduceinMongousingPyMongo,Howtodoit…,Howitworks…,Seealsoexecuting,withJavaclient/MapReduceinMongousingaJavaclient,Howitworks…

MavenURL,fordownloading/Howtodoit…URL,fordocumentation/Howitworks…

millisfield,db.currentOp()operation/Howitworks…MMS

about/Introductionsigningup/SigningupforMMSandsettinguptheMMSmonitoringagent,Howtodoit…,Howitworks…monitoringagent,settingup/Gettingready,Howtodoit…,Howitworks…URL/Howtodoit…,Seealsoalerts,settingup/MonitoringMongoDBinstancesonMMS,Howtodoit…,Howitworks…,There’smore…monitoringalerts,settingup/SettingupmonitoringalertsonMMS,Howitworks…

MMSbackupserviceconfiguring/ConfiguringtheMMSbackupservice,Howtodoit…,Howitworks…backups,managing/ManagingbackupsintheMMSbackupservice,Howtodoit…,Howitworks…

MMSconsoleusers,managing/ManagingusersandgroupsontheMMSconsole,Howtodoit…,Howitworks…groups,managing/ManagingusersandgroupsontheMMSconsole,Howtodo

it…,Howitworks…modifyoperation

about/Gettingready,Howtodoit…working/Howitworks…

MongoURL/Introductionaggregationoperations,executingwithPyMongo/AggregationinMongousingPyMongo,Howitworks…MapReduceoperations,executingwithPyMongo/MapReduceinMongousingPyMongo,Gettingready,Howitworks…aggregationoperations,executingwithJavaclient/AggregationinMongousingaJavaclient,Howtodoit…,Howitworks…MapReduceoperations,executingwithJavaclient/MapReduceinMongousingaJavaclient,Howitworks…

mongo-connectorURL/Howitworks…

mongo-hadoopconnectorused,forexecutingMapReducejob/ExecutingourfirstsampleMapReducejobusingthemongo-hadoopconnector,Gettingready,Howtodoit…,Howitworks…,There’smore…URL/Howitworks…,Seealso

Mongoclient,options—help/-h/There’smore…—shell/There’smore…—port/There’smore…—host/There’smore…—username/-u/There’smore…—password/-p/There’smore…

MongoconnectorURL/IntegratingMongoDBwithElasticsearchforafull-textsearch

MongoDBtriggers,implementingwithoplog/ImplementingtriggersinMongoDBusingoplog,Gettingready,Howitworks…

MongoDBsinglenodeinstallation/SinglenodeinstallationofMongoDBURL,fordownloading/Gettingreadysinglenodeinstallation,withconfigfileoptions/SinglenodeinstallationofMongoDBwithoptionsfromtheconfigfile,Howitworks…users,settingup/SettingupusersinMongoDB,Gettingready,Howtodoit…,Howitworks…interprocesssecurity/UnderstandinginterprocesssecurityinMongoDB,Gettingready,Howtodoit…,There’smore…settingup,asWindowsService/SettingupMongoDBasaWindowsService,Howtodoit…

atomiccounters,implementing/ImplementingatomiccountersinMongoDB,Howitworks…cappedcollectioncursors,creating/CreatingandtailingcappedcollectioncursorsinMongoDB,Howtodoit…,Howitworks…cappedcollectioncursors,tailing/CreatingandtailingcappedcollectioncursorsinMongoDB,Howtodoit…,Howitworks…binarydata,storingin/StoringbinarydatainMongoDB,Howitworks…largedata,storingwithGridFS/StoringlargedatainMongoDBusingGridFS,Howtodoit…,Howitworks…,There’smore…geospatialindexes,usedforexecutingflatplane(2D)geospatialqueries/Executingflatplane(2D)geospatialqueriesinMongousinggeospatialindexes,Gettingready,Howtodoit…,Howitworks…fulltextsearch,implementingin/Implementingafull-textsearchinMongoDB,Howtodoit…,Howitworks…,There’smore…integrating,withElasticsearchforfulltextsearch/IntegratingMongoDBwithElasticsearchforafull-textsearch,Gettingready,Howtodoit…,Howitworks…URL/There’smore…,ConfiguringtheMMSbackupservice,Howitworks…,Gettingreadydata,backingupwithout-of-theboxtools/BackingupandrestoringdatainMongousingout-of-theboxtools,Howitworks…data,restoringwithout-of-theboxtools/BackingupandrestoringdatainMongousingout-of-theboxtools,Howitworks…operations,performingfromMongoLabGUI/PerformingoperationsonMongoDBfromMongoLabGUI,Howtodoit…,Howitworks…settingup,onAmazonEC2withMongoDBAMI/SettingupMongoDBonAmazonEC2usingtheMongoDBAMI,Gettingready,Howtodoit…,Howitworks…settingup,onAmazonEC2withoutMongoDBAMI/SettingupMongoDBonAmazonEC2withoutusingtheMongoDBAMI,Howtodoit…,Howitworks…accessing,JavaPersistenceAPIused/AccessingMongoDBusingJavaPersistenceAPI,Howitworks…accessing,overREST/AccessingMongoDBoverREST,Howtodoit…,Howitworks…GUI-basedclient,installingfor/InstallingtheGUI-basedclient,MongoVUE,forMongoDB,Howtodoit…MongoVUE,installingfor/InstallingtheGUI-basedclient,MongoVUE,forMongoDB,Howtodoit…queries,writing/Howtodoit…document,insertingincollection/Howtodoit…document,updating/Howtodoit…indexes,creating/Howtodoit…index,dropping/Howtodoit…

aggregationoperations,executing/Howtodoit…MongoDBAMI

used,forsettingupMongoDBonAmazonEC2/SettingupMongoDBonAmazonEC2usingtheMongoDBAMI,Gettingready,Howtodoit…,Howitworks…

MongoDBAPIURL/Howitworks…

MongoDBdriverURL/Seealso

MongoLabURL/SettingupandmanagingtheMongoLabaccountsandboxMongoDBinstance,settingup/SettingupasandboxMongoDBinstanceonMongoLab,Howtodoit…,Howitworks…

MongoLabaccountsettingup/SettingupandmanagingtheMongoLabaccount,Howtodoit…managing/SettingupandmanagingtheMongoLabaccount,Howtodoit…,Howitworks…

MongoLabGUIoperations,performingonMongoDBfrom/PerformingoperationsonMongoDBfromMongoLabGUI,Howtodoit…,Howitworks…

MongoMonitoringService/IntroductionMongoshell

singlenodeconnection,withpreloadedJavaScript/ConnectingtoasinglenodefromtheMongoshellwithapreloadedJavaScript,Howtodoit…,There’smore…shardconnection,creatingfrom/ConnectingtoashardfromtheMongoshellandperformingoperations,Howitworks…pagination,performing/Performingsimplequerying,projections,andpaginationfromtheMongoshell,Howtodoit…,Howitworks…querying/Performingsimplequerying,projections,andpaginationfromtheMongoshell,Howtodoit…,Howitworks…projections,performing/Gettingready,Howitworks…data,deleting/Updatinganddeletingdatafromtheshell,Howtodoit…data,updating/Gettingready,Howtodoit…

/Introductionmongostatutility

about/Understandingthemongostatandmongotoputilities,Howtodoit…working/Howitworks…

MongoTemplateclass,methodssave/Howitworks…remove/Howitworks…updateMulti/Howitworks…updateFirst/Howitworks…insert/Howitworks…

findAndRemove/findAllAndRemove/Howitworks…findAndModify/Howitworks…

mongotoputilityabout/Understandingthemongostatandmongotoputilities,Gettingreadyworking/Howitworks…

MongoVUEinstalling,forMongoDB/InstallingtheGUI-basedclient,MongoVUE,forMongoDB,Howtodoit…URL/Howtodoit…,Seealsoused,forimplementingMapReduceoperation/Howtodoit…used,formonitoringserverinstances/Howtodoit…

monitoringagentURL/Seealso

monitoringalertssettingup,onMMS/SettingupmonitoringalertsonMMS,Howitworks…

msgfield,operations/Howitworks…

N-noption/Howitworks…nearest,readpreference/ReadpreferenceforqueryingnetIncolumn/Howitworks…netOutcolumn/Howitworks…nindexesfield,collectionstats/Howitworks…nonshardedcollections

shard,configuring/Configuringthedefaultshardfornonshardedcollections,Howtodoit…,Howitworks…

normalcollectionconverting,tocappedcollection/Convertinganormalcollectiontoacappedcollection,Howitworks…

nreturnedfield,db.currentOp()operation/Howitworks…nscannedfield,db.currentOp()operation/Howitworks…nsfield,collectionstats/Howitworks…nsfield,db.currentOp()operation/Howitworks…nsfield,operations/Howitworks…nsSizeMB,databasestats/Howitworks…nsSizeMBfield,databasestats/Howitworks…numExtentsfield,collectionstats/Howitworks…numExtentsfield,databasestats/Howitworks…numYieldsfield,db.currentOp()operation/Howitworks…numYieldsfield,operations/Howitworks…

Oobjectrelationalmapping(ORM)

about/Introductionobjectsfield,databasestats/Howitworks…operations

killing/Gettingready,Howtodoit…,Howitworks…viewing/Gettingready,Howtodoit…,Howitworks…profiling,within-builtprofiler/Usingprofilertoprofileoperations,Howitworks…

opfield,db.currentOp()operation/Howitworks…opfield,operations/Howitworks…opidfield,operations/Howitworks…oplog

about/Understandingandanalyzingoplogs,Gettingreadyanalyzing/Howtodoit…working/Howitworks…used,forimplementingtriggersinMongoDB/ImplementingtriggersinMongoDBusingoplog,Gettingready,Howitworks…

options,mongodumputility—help/Howitworks…—host(-h)/Howitworks…—port/Howitworks…—username(-u)/Howitworks…—password(-p)/Howitworks…—authenticationDatabase/Howitworks…—db(-d)/Howitworks…—collection(-c)/Howitworks…—out(-o)/Howitworks…—dbpath/Howitworks…—oplog/Howitworks…

options,Mongoimportutility—type/Howitworks…-d/Howitworks…-c/Howitworks…—headerline/Howitworks…—drop/Howitworks…

options,mongorestoreutility—dbpath/Howitworks…—drop/Howitworks…—oplogReplay/Howitworks…—oplogLimit/Howitworks…

out-of-theboxtoolsused,forbackingupdatainMongoDB/Gettingready,Howitworks…

used,forrestoringdatainMongoDB/Gettingready,Howitworks…

PpaddingFactorfield,collectionstats/Howitworks…pagination

performing,fromMongoshell/Performingsimplequerying,projections,andpaginationfromtheMongoshell,Howitworks…

pipforWindowsURL/Gettingready

postalcodedataURL/Howitworks…

primary,readpreference/ReadpreferenceforqueryingprimaryPreferred,readpreference/Readpreferenceforqueryingprogressfield,operations/Howitworks…projections

performing,fromMongoshell/Performingsimplequerying,projections,andpaginationfromtheMongoshell,Howitworks…

proofofconcept(POC)/There’smore…PuTTY

URL/GettingreadyPyMongo

about/InstallingPyMongoinstalling/Gettingready,Howtodoit…,There’smore…URL/Howtodoit…used,forinsertingoperations/ExecutingqueryandinsertoperationsusingPyMongo,Howtodoit…,Howitworks…used,forexecutingquery/Gettingready,Howtodoit…,Howitworks…used,forexecutingupdateoperations/Gettingready,Howtodoit…,Howitworks…used,forexecutingdeleteoperations/Howtodoit…,Howitworks…used,forexecutingaggregationoperationsinMongo/AggregationinMongousingPyMongo,Howitworks…used,forexecutingMapReduceoperationsinMongo/MapReduceinMongousingPyMongo,Howtodoit…,Howitworks…,Seealso

PythonURL/Gettingready

Pythonclientdata,storingtoGridFS/Gettingready,Howtodoit…,Howitworks…

PythonPackageIndex(PyPI)tool/Howtodoit…

Qqr|qwcolumn/Howitworks…querycolumn/Howitworks…queryexecutiontime

improving/Improvingthequeryexecutiontimequeryfield,db.currentOp()operation/Howitworks…queryfield,operations/Howitworks…querying

performing,fromMongoshell/Performingsimplequerying,projections,andpaginationfromtheMongoshell,Howtodoit…,Howitworks…

queryoperationsexecuting,withPyMongo/Gettingready,Howtodoit…,Howitworks…executing,withJavaclient/ExecutingqueryandinsertoperationsusingaJavaclient,Howitworks…

queryparameter/Howitworks…queryplan

viewing/Gettingready,Howitworks…analysis/Analyzingtheplanexecutiontime,improving/Improvingthequeryexecutiontimeimproving,withindexesusage/Improvementusingindexesimproving,withcoveredindexesusage/Improvementusingcoveredindexes

R
RAID
    URL / See also
read preference
    about / Read preference for querying
    for querying / Read preference for querying
    primary / Read preference for querying
    primaryPreferred / Read preference for querying
    secondary / Read preference for querying
    secondaryPreferred / Read preference for querying
    nearest / Read preference for querying
    internals / Knowing the internals
remove method, MongoTemplate class / How it works…
remove parameter / How it works…
replica set
    creating / Starting multiple instances as part of a replica set
    about / Starting multiple instances as part of a replica set
    configuring / Getting ready, How to do it…, How it works…, Configuring a replica set, Getting ready
    standalone instance, converting to / There’s more…
    URL, for standalone instance conversion / There’s more…
    elections / Elections in a replica set
    configuration / Basic configuration for a replica set
    configuration, steps / How to do it…
    member, as arbiter / A replica set member as an arbiter
    hidden term / Hidden, votes, slave delayed, and build index configurations
    votes option / Hidden, votes, slave delayed, and build index configurations
    index configuration, building / Hidden, votes, slave delayed, and build index configurations
    stepping down, as primary instance / Stepping down as a primary instance from the replica set, How to do it…
    local database, exploring / Exploring the local database of a replica set, How to do it…, How it works…
    index creation, URL / How it works…
    tagged replica sets, building / Building tagged replica sets
    WriteConcern / WriteConcern in tagged replica sets
    ReadPreference / ReadPreference in tagged replica sets
replica set, write concern
    setting up / Setting up a replica set
replica set connection
    establishing, from shell for querying data / Connecting to the replica set from the shell to query and insert data, How to do it…, How it works…
    establishing, from shell for inserting data / Connecting to the replica set from the shell to query and insert data, How to do it…, How it works…
    establishing, for inserting data from Java client / Getting ready, How to do it…, How it works…
    establishing, for querying data from Java client / Getting ready, How to do it…, How it works…
replica set member
    as arbiter / A replica set member as an arbiter
    priority / Priority of replica set members
res column / How it works…
responseLength field, db.currentOp() operation / How it works…
REST
    MongoDB, accessing over / Accessing MongoDB over REST, How to do it…, How it works…
returnNew parameter / How it works…
rs.stepDown() method / How it works…

S
sandbox MongoDB instance
    setting up, on MongoLab / Setting up a sandbox MongoDB instance on MongoLab, How to do it…, How it works…
save function / How it works…
save method, MongoTemplate class / How it works…
secondary, read preference / Read preference for querying
secondaryPreferred, read preference / Read preference for querying
secs_running field, operations / How it works…
server-side scripts
    implementing / Implementing server-side scripts, How to do it…, How it works…
server instances
    monitoring, MongoVUE used / How to do it…
sh.addShardTag method / How it works…
sh.removeShardTag method / How it works…
sh.splitAt function / How it works…
shard
    configuring, for nonsharded collections / Configuring the default shard for nonsharded collections, How to do it…, How it works…
shard connection
    creating, from Mongo shell / Connecting to a shard from the Mongo shell and performing operations, How it works…
    creating, for data operations / Connecting to a shard from the Mongo shell and performing operations, How it works…
sharded setup
    config database, exploring / Exploring the config database in a sharded setup, How to do it…, How it works…
simple sharded environment
    starting, of two shards / Starting a simple sharded environment of two shards, How to do it…, How it works
Simple Storage Service (S3) / How to do it…, Running a MapReduce job on Amazon EMR
single node connection
    establishing, from Mongo shell with preloaded JavaScript / Connecting to a single node from the Mongo shell with a preloaded JavaScript, How to do it…, There’s more…
    establishing, from Java client / Connecting to a single node from a Java client, How to do it…, How it works…
    prerequisites, from Java client / Getting ready
single node installation, MongoDB
    about / Single node installation of MongoDB
single node instance
    starting, command-line options used / Starting a single node instance using command-line options, How it works…, See also
size field, collection stats / How it works…
slaveDelay option / How to do it…, Hidden, votes, slave delayed, and build index configurations
social security number
    about / How it works…
sort parameter / How it works…
sparse indexes
    about / Creating and understanding sparse indexes
    creating / How to do it…, How it works…
spherical indexes, MongoDB
    about / Spherical indexes and GeoJSON-compliant data in MongoDB, How to do it…
    working / How it works…
spring-data-mongodb
    used, for development / Developing using spring-data-mongodb, How to do it…, How it works…
    project, URL / See also
spring-data-rest
    URL / See also
Spring Javadoc
    URL / How it works…
standard error (stderr) / How to do it…
Stemming
    URL / How it works…
storageSize field, collection stats / How it works…
storageSize field, database stats / How it works…
streaming
    used, for running MapReduce jobs on Hadoop / Running MapReduce jobs on Hadoop using streaming, How it works…, How to do it…

T
$text operator
    URL / See also
-t option / How it works…
tagged replica sets
    building / Building tagged replica sets, Getting ready, How to do it…
    building, use cases / Building tagged replica sets
tags
    used, for domain-driven sharding performance / Performing domain-driven sharding using tags, How to do it…, How it works…
test data
    creating / Creating test data, How to do it…, How it works…
time column / How it works…
totalIndexSize field, collection stats / How it works…
triggers
    implementing, in MongoDB with oplog / Implementing triggers in MongoDB using oplog, Getting ready, How it works…
ts field, db.currentOp() operation / How it works…
TTL index
    about / Expiring documents after a fixed interval using the TTL index, How to do it…
    used, for document expiring after fixed interval / Getting ready, How it works…, There’s more…
    used, for document expiring at given time / Expiring documents at a given time using the TTL index, How it works…

U
unique indexes, on collection
    creating / Creating unique indexes on collection and deleting the existing duplicate data automatically, How to do it…, How it works…
update column / How it works…
updateFirst method, MongoTemplate class / How it works…
updateMulti method, MongoTemplate class / How it works…
update operations
    executing, with PyMongo / Getting ready, How to do it…, How it works…
    executing, with Java client / How to do it…, How it works…
update parameter / How it works…
upsert parameter / How it works…
users
    setting up, in MongoDB / Setting up users in MongoDB, How to do it…, How it works…
    managing, on MMS console / Managing users and groups on the MMS console, How to do it…, How it works…

V
VirtualBox
    URL / Getting ready
vsize column / How it works…

W
waitingForLock field, operations / How it works…
Windows Service
    MongoDB, setting up as / Setting up MongoDB as a Windows Service, How to do it…
working set
    estimating / Estimating the working set, How it works…
Writable interface
    URL / See also
write concern
    about / Write concern and its significance
    significance / Write concern and its significance
    w key / Write concern and its significance
    j key / Write concern and its significance
    Fsync key / Write concern and its significance
    wtimeout option / Write concern and its significance
    replica set, setting up / Setting up a replica set
wtimeout key / Write concern and its significance