Post on 20-Jan-2023
Table of Contents
MongoDB Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
How it works…
There’s more…
See also
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Installing and Starting the MongoDB Server
Introduction
Single node installation of MongoDB
Getting ready
How to do it…
There’s more…
See also
Starting a single node instance using command-line options
Getting ready
How to do it…
How it works…
There’s more…
See also
Single node installation of MongoDB with options from the config file
Getting ready
How to do it…
How it works…
Connecting to a single node from the Mongo shell with a preloaded JavaScript
Getting ready
How to do it…
How it works…
There’s more…
Connecting to a single node from a Java client
Getting ready
How to do it…
How it works…
Starting multiple instances as part of a replica set
Getting ready
How to do it…
How it works…
There’s more…
See also
Connecting to the replica set from the shell to query and insert data
Getting ready
How to do it…
How it works…
See also
Connecting to the replica set to query and insert data from a Java client
Getting ready
How to do it…
How it works…
Starting a simple sharded environment of two shards
Getting ready
How to do it…
How it works
There’s more…
Connecting to a shard from the Mongo shell and performing operations
Getting ready
How to do it…
How it works…
There’s more…
2. Command-line Operations and Indexes
Creating test data
Getting ready
How to do it…
How it works…
See also
Performing simple querying, projections, and pagination from the Mongo shell
Getting ready
How to do it…
How it works…
Updating and deleting data from the shell
Getting ready
How to do it…
How it works…
Creating an index and viewing plans of queries
Getting ready
How to do it…
How it works…
Analyzing the plan
Improving the query execution time
Improvement using indexes
Improvement using covered indexes
Some gotchas of index creation
Background and foreground index creation from the shell
Getting ready
How to do it…
How it works…
Creating unique indexes on collection and deleting the existing duplicate data automatically
Getting ready
How to do it…
How it works…
Creating and understanding sparse indexes
Getting ready
How to do it…
How it works…
Expiring documents after a fixed interval using the TTL index
Getting ready
How to do it…
How it works…
There’s more…
Expiring documents at a given time using the TTL index
Getting ready
How to do it…
How it works…
There’s more…
3. Programming Language Drivers
Introduction
Installing PyMongo
Getting ready
How to do it…
There’s more…
Executing query and insert operations using PyMongo
Getting ready
How to do it…
How it works…
See also
Executing update and delete operations using PyMongo
Getting ready
How to do it…
How it works…
Aggregation in Mongo using PyMongo
Getting ready
How to do it…
How it works…
MapReduce in Mongo using PyMongo
Getting ready
How to do it…
How it works…
See also
Executing query and insert operations using a Java client
Getting ready
How to do it…
How it works…
Executing update and delete operations using a Java client
Getting ready
How to do it…
How it works…
See also
Aggregation in Mongo using a Java client
Getting ready
How to do it…
How it works…
MapReduce in Mongo using a Java client
Getting ready
How to do it…
How it works…
See also
4. Administration
Renaming a collection
Getting ready
How to do it…
How it works…
Viewing collection stats
Getting ready
How to do it…
How it works…
See also
Viewing database stats
Getting ready
How to do it…
How it works…
See also
Disabling the preallocation of data files
How to do it…
Manually padding a document
Getting ready
How to do it…
How it works…
Understanding the mongostat and mongotop utilities
Getting ready
How to do it…
How it works…
See also
Estimating the working set
Getting ready
How to do it…
How it works…
Viewing and killing the currently executing operations
Getting ready
How to do it…
How it works…
Using profiler to profile operations
Getting ready
How to do it…
How it works…
Setting up users in MongoDB
Getting ready
How to do it…
How it works…
There’s more…
See also
Understanding interprocess security in MongoDB
Getting ready
How to do it…
There’s more…
Modifying collection behavior using the collMod command
Getting ready
How to do it…
How it works…
Setting up MongoDB as a Windows Service
Getting ready
How to do it…
Configuring a replica set
Getting ready
Elections in a replica set
Basic configuration for a replica set
How to do it…
How it works…
A replica set member as an arbiter
Priority of replica set members
Hidden, votes, slave delayed, and build index configurations
There’s more…
Stepping down as a primary instance from the replica set
Getting ready
How to do it…
How it works…
Exploring the local database of a replica set
Getting ready
How to do it…
How it works…
See also
Understanding and analyzing oplogs
Getting ready
How to do it…
How it works…
Building tagged replica sets
Getting ready
How to do it…
How it works…
WriteConcern in tagged replica sets
ReadPreference in tagged replica sets
Configuring the default shard for nonsharded collections
Getting ready
How to do it…
How it works…
Manually splitting and migrating chunks
Getting ready
How to do it…
How it works…
Performing domain-driven sharding using tags
Getting ready
How to do it…
How it works…
Exploring the config database in a sharded setup
Getting ready
How to do it…
How it works…
5. Advanced Operations
Introduction
Atomic find and modify operations
Getting ready
How to do it…
How it works…
See also
Implementing atomic counters in MongoDB
Getting ready
How to do it…
How it works…
See also
Implementing server-side scripts
Getting ready
How to do it…
How it works…
Creating and tailing capped collection cursors in MongoDB
Getting ready
How to do it…
How it works…
See also
Converting a normal collection to a capped collection
Getting ready
How to do it…
How it works…
There’s more…
Storing binary data in MongoDB
Getting ready
How to do it…
How it works…
Storing large data in MongoDB using GridFS
Getting ready
How to do it…
How it works…
There’s more…
See also
Storing data to GridFS from a Java client
Getting ready
How to do it…
How it works…
See also
Storing data to GridFS from a Python client
Getting ready
How to do it…
How it works…
See also
Implementing triggers in MongoDB using oplog
Getting ready
How to do it…
How it works…
Executing flat plane (2D) geospatial queries in Mongo using geospatial indexes
Getting ready
How to do it…
How it works…
Spherical indexes and GeoJSON-compliant data in MongoDB
Getting ready
How to do it…
How it works…
Implementing a full-text search in MongoDB
Getting ready
How to do it…
How it works…
There’s more…
See also
Integrating MongoDB with Elasticsearch for a full-text search
Getting ready
How to do it…
How it works…
There’s more…
See also
6. Monitoring and Backups
Introduction
Signing up for MMS and setting up the MMS monitoring agent
Getting ready
How to do it…
How it works…
There’s more…
Managing users and groups on the MMS console
Getting ready
How to do it…
How it works…
Monitoring MongoDB instances on MMS
Getting ready
How to do it…
How it works…
There’s more…
See also
Setting up monitoring alerts on MMS
Getting ready
How to do it…
How it works…
See also
Backing up and restoring data in Mongo using out-of-the-box tools
Getting ready
How to do it…
How it works…
Configuring the MMS backup service
Getting ready
How to do it…
How it works…
Managing backups in the MMS backup service
Getting ready
How to do it…
How it works…
See also
7. Cloud Deployment on MongoDB
Introduction
Setting up and managing the MongoLab account
How to do it…
How it works…
Setting up a sandbox MongoDB instance on MongoLab
Getting ready
How to do it…
How it works…
Performing operations on MongoDB from MongoLab GUI
Getting ready
How to do it…
How it works…
Setting up MongoDB on Amazon EC2 using the MongoDB AMI
Getting ready
How to do it…
How it works…
Setting up MongoDB on Amazon EC2 without using the MongoDB AMI
Getting ready
How to do it…
How it works…
See also
8. Integration with Hadoop
Introduction
Executing our first sample MapReduce job using the mongo-hadoop connector
Getting ready
How to do it…
How it works…
There’s more…
See also
Writing our first Hadoop MapReduce job
Getting ready
How to do it…
How it works…
There’s more…
Running MapReduce jobs on Hadoop using streaming
Getting ready
How it works…
How to do it…
Running a MapReduce job on Amazon EMR
Getting ready
How to do it…
How it works…
See also
9. Open Source and Proprietary Tools
Introduction
Developing using spring-data-mongodb
Getting ready
How to do it…
How it works…
See also
Accessing MongoDB using Java Persistence API
Getting ready
How to do it…
How it works…
See also
Accessing MongoDB over REST
Getting ready
How to do it…
How it works…
See also
Installing the GUI-based client, MongoVUE, for MongoDB
Getting ready
How to do it…
How it works…
See also
A. Concepts for Reference
Write concern and its significance
Setting up a replica set
Read preference for querying
Knowing the internals
Index
MongoDB Cookbook
Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2014
Production reference: 1221114
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78216-194-3
www.packtpub.com
Cover image by Pratyush Mohanta (<tysoncinematics@gmail.com>)
Credits
Author
Amol Nayak
Reviewers
Jan Borgelin
Doug Duncan
Laurence Putra
Liran Tal
Khaled Tannir
Acquisition Editor
Neha Nagwekar
Content Development Editor
Priyanka Shah
Technical Editors
Veronica Fernandes
Ankita Thakur
Copy Editors
Karuna Narayanan
Shambhavi Pai
Project Coordinators
Mary Alex
Neha Thakur
Proofreaders
Stephen Copestake
Paul Hindle
Kelly Hutchinson
Clyde Jenkins
Indexers
Mariammal Chettiyar
Monica Ajmera Mehta
Rekha Nair
Graphics
Sheetal Aute
Abhinash Sahu
Production Coordinators
Kyle Albuquerque
Conidon Miranda
Nitesh Thakur
Cover Work
Kyle Albuquerque
About the Author
Amol Nayak is a certified MongoDB developer and has been working as a developer for over 8 years. He is currently employed with a leading financial data provider, working on cutting-edge technologies. He has used MongoDB as a database for various systems at his current and previous workplaces to support enormous data volumes. He is an open source enthusiast and supports open source by contributing to open source frameworks and promoting them. He has made contributions to the Spring Integration project; his contributions are adapters for JPA, XQuery, MongoDB, push notifications for mobile devices, and Amazon Web Services (AWS). He has also made some contributions to the Spring Data MongoDB project. Apart from technology, he is passionate about motorsports and is a race official at Buddh International Circuit, India, for various motor-sports events. Earlier, he was the author of Instant MongoDB, Packt Publishing.
I would like to thank everyone at Packt Publishing who has been involved with this book. It started when Luke Presland from Packt Publishing approached me to author a book on MongoDB. I was skeptical about taking up the opportunity due to other commitments and tight deadlines, but if it wasn’t for my mom, friends, and office colleagues, who convinced me to take it up, I would not have written this book. There was a lot of content to cover across the chapters, and I was having a tough time keeping up with the timelines. Special thanks to Priyanka Shah, Rebecca Pedley, Mary Alex, and Joel Goveya, with whom I interacted the most; they were very flexible about my changes to delivery timelines. A big thanks to Doug Duncan and the other reviewers for reviewing the book closely and helping improve the quality of the content drastically. Finally, I would like to thank the other staff at Packt Publishing who were involved in the book’s publishing process but with whom I did not interact directly.
About the Reviewers
Jan Borgelin is a technical geek with over 15 years of professional software-development experience. He is currently the CTO of BA Group Ltd., a consultancy based in Finland. BA Group was one of the early adopters of MongoDB and the first official MongoDB partner in Scandinavia.
Doug Duncan has been working with RDBMSes for the past 15 years and has been shifting gears toward the newer data stores over the past 3 years. He has focused mainly on MongoDB since he came across the 0.8 release. In addition to his day job as a MongoDB database administrator, he works as an online teaching assistant for the MongoDB education team on several of their online courses (https://university.mongodb.com/), where he helps students understand how MongoDB works. When not working, he likes to read about new technologies and try to figure out how they can integrate and work in conjunction with the more established systems already in place.
Laurence Putra is a software engineer working in Singapore and runs the Singapore MongoDB User Group. In his free time, he hacks away on random stuff and picks up new technologies. His key interests lie in security and distributed systems. For more information, view his profile at geeksphere.net.
Liran Tal is a certified MongoDB developer and top contributor to the open source MEAN.IO and MEAN.JS full-stack JavaScript frameworks. Being an avid supporter of and contributor to the open source movement, in 2007, he redefined network RADIUS management by establishing daloRADIUS, a world-recognized and industry-leading open source project.
Liran is currently working at HP Software as an R&D team leader on a combined technology stack featuring a Drupal-based collaboration platform, Java, Node.js, and MongoDB.
At HP Live Network, Liran plays a key role in system-architecture design, shaping the technology strategy from planning and development to deployment and maintenance in HP’s IaaS cloud. Acting as the technological focal point, he loves mentoring teammates, driving for better code methodology, and seeking out innovative solutions to support business strategies.
He graduated cum laude (with honors) with a Bachelor’s degree in Business and Information Systems Analysis and enjoys spending his time with his beloved wife, Tal, and his newborn son, Ori. Among other things, his hobbies include playing the guitar, hacking all things on Linux, and continuously experimenting with and contributing to open source projects.
Khaled Tannir is a visionary solution architect with more than 20 years of technical experience, focusing on Big Data technologies and data mining since 2010.
He is widely recognized as an expert in these fields and holds a Master of Research degree in Big Data and Cloud Computing, a Master’s degree in System Information Architectures, and, earlier, a Bachelor of Technology degree in Electronics.
Khaled is a Microsoft Certified Solutions Developer (MCSD) and an avid technologist. He has worked for many companies in France (and recently in Canada), leading the development and implementation of software solutions and giving technical presentations.
He is the author of RavenDB 2.x Beginner’s Guide and Optimizing Hadoop for MapReduce and is the technical reviewer for Pentaho Analytics for MongoDB and MongoDB High Availability, all available from Packt Publishing.
He enjoys taking landscape and night photos; traveling; playing video games; creating funny electronic gadgets with Arduino, Raspberry PI, and .NET Gadgeteer; and of course, spending time with his wife and family.
You can reach him at <contact@khaledtannir.net>.
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <service@packtpub.com> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital book library. Here, you can search, access, and read Packt’s entire library of books.
Why Subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free Access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface
MongoDB is a leading document-oriented NoSQL database that offers linear scalability, which makes it a good contender for high-volume, high-performance systems across all business domains. It has an edge over the majority of NoSQL solutions for its ease of use, high performance, and rich features.
This book provides detailed recipes that describe how to use the different features of MongoDB. The recipes cover topics ranging from setting up MongoDB, knowing its programming-language API, and monitoring and administration, to some advanced topics such as cloud deployment, integration with Hadoop, and some open source and proprietary tools for MongoDB. The recipe format presents the information in a concise, actionable form; this lets you go straight to the recipe for the use case at hand and learn its details without going through the entire book.
What this book covers
Chapter 1, Installing and Starting the MongoDB Server, is all about starting MongoDB. It will demonstrate how to start the server in the standalone mode, as a replica set, and as a shard, with the provided start-up options from the command line or config file.
Chapter 2, Command-line Operations and Indexes, has simple recipes to perform CRUD operations from the Mongo shell and create various types of indexes from the shell.
Chapter 3, Programming Language Drivers, is about programming language APIs. Though Mongo supports a vast array of languages, we will look at how to use the drivers to connect to the MongoDB server from Java and Python programs only. This chapter also explores the MongoDB wire protocol used for communication between the server and the programming-language clients.
Chapter 4, Administration, contains many recipes around the administration of your MongoDB deployment. This chapter covers a lot of frequently used administrative tasks, such as viewing the stats of collections and databases, viewing and killing long-running operations, and other replica set and sharding-related administration.
Chapter 5, Advanced Operations, is an extension of Chapter 2, Command-line Operations and Indexes. We will look at some of the slightly advanced features such as implementing server-side scripts, geospatial search, GridFS, full-text search, and how to integrate MongoDB with an external full-text search engine.
Chapter 6, Monitoring and Backups, is all about administration and some basic monitoring. In addition, MongoDB provides a state-of-the-art monitoring and real-time backup service, MongoDB Monitoring Service (MMS). In this chapter, we will look at some recipes around monitoring and backup using MMS.
Chapter 7, Cloud Deployment on MongoDB, covers recipes that use MongoDB service providers for cloud deployment, and we will set up our own MongoDB server on the AWS cloud.
Chapter 8, Integration with Hadoop, covers recipes that integrate MongoDB with Hadoop so that the Hadoop MapReduce API can run MapReduce jobs on data residing in MongoDB or MongoDB data files and write the results back to them. We will also see how to use AWS EMR, Amazon’s managed Hadoop cluster, to run our MapReduce jobs on the cloud with the mongo-hadoop connector.
Chapter 9, Open Source and Proprietary Tools, is about using frameworks and products built around MongoDB to improve a developer’s productivity or to make some of the day-to-day jobs of using Mongo easier. Unless explicitly mentioned, the products/frameworks we will be looking at in this chapter are open source.
Appendix, Concepts for Reference, gives you a bit of additional information on write concern and read preference for reference.
What you need for this book
The version of MongoDB used to try out the recipes is 2.4.6. The recipes hold good for version 2.6.x as well. Any feature specific to version 2.6.x is explicitly mentioned in the recipe.
The samples where Java programming was involved were tested and run on Java Version 1.7.40. Python Version 2.7 is used wherever Python is used. For MongoDB drivers, you may choose to use the latest available version.
These are pretty common types of software, and their minimum versions are used across different recipes. Each recipe in this book mentions the software required to complete it and the respective versions. Some recipes need to be tested on Windows systems and some on Linux.
Who this book is for
This book is designed for administrators and developers who are interested in learning MongoDB and using it as a high-performance and scalable data store. It is also for those who know the basics of MongoDB and would like to expand their knowledge further. Readers are expected to have at least some basic knowledge of MongoDB.
Sections
In this book, you will find several headings that appear frequently (Getting ready, How to do it, How it works, There’s more, and See also).
To give clear instructions on how to complete a recipe, we use these sections as follows:
Getting ready
This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.
There’s more…
This section consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: “Create the /data/mongo/db directory (or any of your choice).”
A block of code is set as follows:
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
Any command-line input or output is written as follows:
$ sudo apt-get install default-jdk
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: “Without editing any default settings, click on Launch.”
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book’s title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.
Chapter 1. Installing and Starting the MongoDB Server
In this chapter, we will cover the following recipes:
Single node installation of MongoDB
Starting a single node instance using command-line options
Single node installation of MongoDB with options from the config file
Connecting to a single node from the Mongo shell with a preloaded JavaScript
Connecting to a single node from a Java client
Starting multiple instances as part of a replica set
Connecting to the replica set from the shell to query and insert data
Connecting to the replica set to query and insert data from a Java client
Starting a simple sharded environment of two shards
Connecting to a shard from the Mongo shell and performing operations
Introduction
In this chapter, we will look at starting up the MongoDB server. Though it is a cakewalk to start the server for development purposes with the default settings, there are numerous options that let us tune the startup behavior. We will start the server as a single node; then, we’ll introduce various configurations before we conclude by starting up a simple replica set and a sharded setup. So, let’s get started by installing and setting up the MongoDB server in the easiest way possible, for simple development purposes.
Single node installation of MongoDB
In this recipe, we will look at the process of installing MongoDB in the standalone mode. This is the simplest and quickest way to start a MongoDB server but is seldom used for production use cases. However, it is the most common way to start the server for the purpose of development. In this recipe, we will start the server without looking at a lot of other startup options.
Getting ready
We assume that you have downloaded the MongoDB binaries from the download site, extracted them, and added the bin directory of MongoDB to the operating system’s path variable (this is not mandatory, but it is really convenient). The binaries can be downloaded from http://www.mongodb.org/downloads after selecting your host operating system.
How to do it…
Perform the following steps to start with the single node installation of MongoDB:
1. Create the /data/mongo/db directory (or any of your choice). This will be our database directory, and the mongod process (the Mongo server process) needs permission to write to it.
2. We will start the server from the console with the /data/mongo/db data directory as follows:
$ mongod --dbpath /data/mongo/db
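If you prefer to script these two steps, the following Python sketch shows one way to do it. It is an illustration rather than part of the recipe. The helper names `prepare_dbpath`, `build_mongod_command`, and `start_mongod` are hypothetical, and `start_mongod` assumes that the `mongod` binary is on the `PATH` (see Getting ready).

```python
import os
import shutil
import subprocess


def prepare_dbpath(path):
    """Step 1: create the database directory if needed and check it is writable."""
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError("mongod needs write access to " + path)
    return path


def build_mongod_command(dbpath, mongod="mongod"):
    """Return the argument list for: mongod --dbpath <dbpath>"""
    return [mongod, "--dbpath", dbpath]


def start_mongod(dbpath):
    """Step 2: start mongod in the background and return the Popen handle.

    Assumes the mongod binary can be found on the PATH. With no --logpath,
    the server logs to standard output, which the child process inherits.
    """
    binary = shutil.which("mongod")
    if binary is None:
        raise FileNotFoundError("mongod was not found on the PATH")
    prepare_dbpath(dbpath)
    return subprocess.Popen(build_mongod_command(dbpath, binary))
```

For example, `proc = start_mongod("/data/mongo/db")` starts the server in the background, and `proc.terminate()` stops it again. Because no `--logpath` is given, the server's log lines appear on the console of the calling process.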
There’s more…
If you see the following message on the console, you have successfully started the server:
[initandlisten] waiting for connections on port 27017
Starting a server can’t get easier than this. Despite the simplicity of starting the server, there are a lot of configuration options that can be used to tune the behavior of the server on startup. Most of the default options are sensible and need not be changed. With the default values, the server should be listening on port 27017 for new connections, and the logs will be printed to the standard output.
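When the server runs in the background, or its output is redirected, you may want to check programmatically that it is accepting connections on the default port 27017. The following is a minimal Python sketch that uses only the standard library. The function name `is_listening` is our own, and the check only proves that some process accepts TCP connections on the port, not that the process is mongod.

```python
import socket


def is_listening(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only shows that some process accepts connections on the port;
    it does not verify that the process is actually mongod.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:  # connection refused, timed out, unresolvable host, ...
        return False
    conn.close()
    return True
```

Once the `waiting for connections on port 27017` message has appeared, `is_listening()` should return True. Calling `is_listening(port=27000)` would check the instance started in the next recipe.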
Starting a single node instance using command-line options
In this recipe, we will see how to start a standalone single node server with some command-line options. We will see an example where we will perform the following tasks:
Starting the server that listens on port 27000
Writing logs to /logs/mongo.log
Setting the database directory to /data/mongo/db
Since the server is started for development purposes, we don’t want to preallocate full-size database files (we will soon see what this means).
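To see what this saving amounts to, the short Python sketch below lists the sizes of successive data files for one database. It is a simplified model that encodes only the figures given for the --smallfiles option in this recipe's How it works… section: a 64 MB first file that doubles up to 2 GB by default, versus a 16 MB first file that doubles up to 512 MB with --smallfiles. The function name is hypothetical, and other files, such as the journal, are ignored.

```python
def preallocated_sizes_mb(num_files, smallfiles=False):
    """Sizes, in MB, of successive data files (db.0, db.1, ...) for one database.

    Simplified model: the first file is 64 MB (16 MB with --smallfiles) and
    each new file doubles in size, capped at 2048 MB (512 MB with --smallfiles).
    """
    size, cap = (16, 512) if smallfiles else (64, 2048)
    sizes = []
    for _ in range(num_files):
        sizes.append(size)
        size = min(size * 2, cap)
    return sizes
```

For example, the first three files of a database take 448 MB by default (64 + 128 + 256) but only 112 MB with --smallfiles (16 + 32 + 64).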
Getting ready
If you have already seen and executed the steps mentioned in the Single node installation of MongoDB recipe, you need not do anything different. If all the prerequisites are met, we are good for this recipe too.
How to do it…
You can start a single node instance using command-line options with the following steps:
1. The /data/mongo/db directory for the database and /logs/ for the logs should be created and present on your filesystem with appropriate write permissions.
2. Execute the following command:
> mongod --port 27000 --dbpath /data/mongo/db --logpath /logs/mongo.log --smallfiles
How it works…
OK, this wasn’t too difficult and is similar to the previous recipe, but we have some additional command-line options this time around. MongoDB actually supports quite a few options at startup, and we will see a list of the ones that are most common and important in my opinion:
Option Description
--help or -h
This is used to print information about the various startup options available.
--config or -f
This specifies the location of the configuration file that contains all the configuration options. We will learn more about this option in the Single node installation of MongoDB with options from the config file recipe. It is just a convenient way of specifying the configurations in a file rather than on the command prompt, especially when a large number of options is specified. Using a separate configuration file shared across different mongod instances will also ensure that all the instances are running with identical configurations.
--verbose or -v
This makes the logs more verbose. We can put more v’s to make the output even more verbose, for example, -vvvvv.
--quiet
This produces quieter output. It is the opposite of the --verbose or -v option and will keep the logs less chatty and clean.
--port
This option starts the server listening on a port other than the default 27017. We will use it frequently whenever we start multiple Mongo servers on the same machine; for example, --port 27018 starts a server that listens on port 27018 for new connections.
--logpath
This provides a path to a log file where the logs will be written. The value defaults to STDOUT. For example, --logpath /logs/server.out will use /logs/server.out as the log file for the server. Remember that the value provided should be a file and not a directory where the logs will be written.
--logappend
This option will append to the existing log file, if any. The default behavior is to rename the existing log file and then create a new file for the logs of the currently started Mongo instance. Let’s assume that we used server.out as the name of the log file and that the file exists on startup. Then, by default, this file will be renamed as server.out.<timestamp>, where <timestamp> is the current time. The time is GMT as opposed to the local time. Suppose the current date is October 28, 2013 and the time is 12:02:15; then the file generated will have 2013-10-28T12-02-15 as the timestamp.
--dbpath
This provides the directory where a new database will be created or an existing database is present. The value defaults to /data/db. We will start the server using /data/mongo/db as the database directory. Note that the value should be a directory rather than the name of a file.
--smallfiles
This is used frequently for development purposes when we plan to start more than one Mongo instance on our local machine. On startup, Mongo creates a database file of size 64 MB (on 64-bit machines). This preallocation happens for performance reasons, and the file is created with zeros written to it to fill out the space on the disk. Adding this option on startup creates a preallocated file of only 16 MB (again, on a 64-bit machine). This option also reduces the maximum size of the database and journal files. Avoid using this option for production deployments. Also, by default, each newly allocated database file doubles in size, up to a maximum of 2 GB. If the --smallfiles option is chosen, it goes up to a maximum of 512 MB.
--replSet
This option is used to start the server as a member of the replica set. The value of this argument is the name of the replica set, for example, --replSet repl1. More information on this option is covered in the Starting multiple instances as part of a replica set recipe, where we will start a simple Mongo replica set.
--configsvr
This option is used to start the server as a config server. The role of the config server will be made clearer when we set up a simple sharded environment in the Starting a simple sharded environment of two shards recipe in this chapter. By default, a server started with this option listens on port 27019 and uses the /data/configdb data directory. These can, of course, be overridden using the --port and --dbpath options.
--shardsvr
This informs the started mongod process that this server is being started as a shard server. With this option, the server also listens on port 27018 instead of the default 27017. We will learn more about this option when we start a simple sharded server.
--oplogSize
Oplog is the backbone of replication. It is a capped collection where the data being written to the primary is stored to be replicated to the secondary instances. This collection resides in a database named local. On initialization of a replica set, the disk space for the oplog is preallocated, and the database file (for the local database) is filled with zeros as placeholders. The value of this option is given in megabytes. The default is 5 percent of the free disk space, which should be good enough in most cases. The size of the oplog is crucial, because capped collections are of a fixed size and discard the oldest documents in them upon exceeding their size, making space for new documents. If the oplog size is too small, data can be discarded before it is replicated to secondary nodes. A large oplog size can result in unnecessary disk-space utilization and a longer time for the replica set initialization. For development purposes, when we start multiple server processes on the same host, we might want to keep the oplog size to a minimum value so that the replica set initiates quickly and uses the minimum disk space possible.
There’s more…
For an exhaustive list of the options available, use the --help or -h option. The preceding list of options is not exhaustive, and we will see some more in the upcoming recipes as and when we need them. In the next recipe, we will see how to use a config file instead of the command-line arguments.
See also
The Single node installation of MongoDB with options from the config file recipe, to use config files to provide startup options
To start a replica set, refer to the Starting multiple instances as part of a replica set recipe
To set up a sharded environment, refer to the Starting a simple sharded environment of two shards recipe
SinglenodeinstallationofMongoDBwithoptionsfromtheconfigfileAswecansee,providingoptionsfromthecommandlinedoesthework,butitstartsgettingawkwardassoonasthenumberofoptionsweprovideincreases.Wehaveaniceandcleanalternativetoprovidingthestartupoptionsfromaconfigurationfileratherthanascommand-linearguments.
Getting ready
If you have already seen and executed the steps mentioned in the Single node installation of MongoDB recipe, you need not do anything different; all the prerequisites of this recipe are the same.
How to do it…
The /data/mongo/db directory for the database and /logs/ for the logs should be created and present on your filesystem, with the appropriate write permissions. Let’s take a look at the steps in detail:
1. Create a config file; it can have any name. In our case, let’s say we create the file at /conf/mongo.conf. We will then edit the file and add the following lines to it:
port=27000
dbpath=/data/mongo/db
logpath=/logs/mongo.log
smallfiles=true
2. Start the Mongo server using the following command:
> mongod --config /conf/mongo.conf
How it works…
All the command-line options we discussed in the previous recipe, Starting a single node instance using command-line options, hold true. We are just providing these options in a configuration file instead. If you have not visited the previous recipe, I recommend that you do so, as that is where we discussed some of the common command-line options. The properties are specified as <property name>=<value>. For properties that don’t take a value, such as the smallfiles option, the value given is the Boolean value true. If you need verbose output, add v=true (or multiple v’s to make it more verbose) to the config file. If you already know the command-line option, it is easy to guess the property name in the file: it is the same as the command-line option, with the leading hyphens removed.
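The mapping between the two forms is mechanical enough to express in a few lines. Below is a minimal sketch, assuming the simple name=value format used in this recipe. ConfigToArgs is a hypothetical helper, not part of MongoDB. It assumes that single-letter keys such as v map to short options (-v) and that a value of true marks a flag that takes no argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of MongoDB) mirroring the mapping described
// above: "name=value" in the config file corresponds to "--name value" on the
// command line, and "name=true" corresponds to a bare "--name" flag.
// Assumption: single-letter keys such as "v" map to short options like "-v".
class ConfigToArgs {
    static List<String> toArgs(List<String> configLines) {
        List<String> args = new ArrayList<>();
        for (String raw : configLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("expected name=value: " + line);
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            args.add((name.length() == 1 ? "-" : "--") + name);
            if (!value.equals("true")) {
                args.add(value); // Boolean flags take no value
            }
        }
        return args;
    }

    public static void main(String[] args) {
        List<String> conf = List.of(
                "port=27000",
                "dbpath=/data/mongo/db",
                "logpath=/logs/mongo.log",
                "smallfiles=true");
        System.out.println("mongod " + String.join(" ", toArgs(conf)));
    }
}
```

For the config file in step 1, this prints the equivalent command line with --port, --dbpath, --logpath, and --smallfiles.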
Connecting to a single node from the Mongo shell with a preloaded JavaScript
This recipe is about starting the Mongo shell and connecting to a MongoDB server. Here, we’ll also demonstrate how to load JavaScript code into the shell. Though this is not always required, it is handy when we have a large block of JavaScript code (variables and functions containing some business logic) that must be executed from the shell frequently and that we want to be available in the shell at all times.
Getting ready
The MongoDB server does not strictly need to be running to start a shell, but we will rarely start a shell without connecting it to a running MongoDB server. To start a server on the localhost without much hassle, take a look at the first recipe, Single node installation of MongoDB, and start the server.
How to do it…
Let’s take a look at the steps in detail:
1. First, we will create a simple JavaScript file; let’s call it hello.js. Type the following lines in the hello.js file:
function sayHello(name) {
    print('Hello ' + name + ', how are you?')
}
2. Save this file at /mongo/scripts (it can be saved at any other location too).
3. In the command prompt, execute the following command:
> mongo --shell /mongo/scripts/hello.js
4. On executing this, we should see the following message on our console:
MongoDB shell version: 2.4.6
connecting to: test
>
5. Check the database that the shell is connected to by typing the following command:
> db
This should print out test on the console.
6. Now, type the following command in the shell:
> sayHello('Fred')
Hello Fred, how are you?
How it works…
The JavaScript function we executed here is of no practical use; it only demonstrates how a function can be preloaded when the shell starts up. The .js file can contain multiple functions of valid JavaScript code, possibly with some complex business logic.
Because we executed the mongo command without a database address, it connected to the MongoDB server that runs on the localhost and listens for new connections on the default port, 27017. The format of the command is as follows:
mongo <options> <db address> <.js files>
If no db address is passed to the mongo executable, it is equivalent to passing localhost:27017/test as the db address.
There’s more…
Let’s look at some example values of the db address command-line option and their interpretation:
- mydb: This will connect to the server that runs on the localhost and listens for connections on port 27017. The database connected will be mydb.
- mongo.server.host/mydb: This will connect to the server that runs on mongo.server.host and the default port 27017. The database connected will be mydb.
- mongo.server.host:27000/mydb: This will connect to the server that runs on mongo.server.host and port 27000. The database connected will be mydb.
- mongo.server.host:27000: This will connect to the server that runs on mongo.server.host and port 27000. The database connected will be the default database, test.
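The four forms above follow a small set of rules with localhost, 27017, and test as defaults. Here is a minimal sketch of those rules. DbAddress is a hypothetical class that handles only the forms listed. It is not the shell's own parsing code, which may treat other inputs differently.

```java
// Hypothetical parser covering only the db address forms listed above; it is
// not the mongo shell's implementation. Defaults: localhost, 27017, test.
final class DbAddress {
    final String host;
    final int port;
    final String database;

    private DbAddress(String host, int port, String database) {
        this.host = host;
        this.port = port;
        this.database = database;
    }

    static DbAddress parse(String address) {
        String host = "localhost";
        int port = 27017;
        String database = "test";
        String hostPart;
        int slash = address.indexOf('/');
        if (slash >= 0) {
            hostPart = address.substring(0, slash);   // "host[:port]/db"
            database = address.substring(slash + 1);
        } else if (address.contains(":")) {
            hostPart = address;                       // "host:port", no database
        } else {
            hostPart = "";                            // a bare word is a database name
            database = address;
        }
        if (!hostPart.isEmpty()) {
            int colon = hostPart.indexOf(':');
            if (colon >= 0) {
                host = hostPart.substring(0, colon);
                port = Integer.parseInt(hostPart.substring(colon + 1));
            } else {
                host = hostPart;
            }
        }
        return new DbAddress(host, port, database);
    }

    @Override
    public String toString() {
        return host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        for (String a : new String[] {"mydb", "mongo.server.host/mydb",
                "mongo.server.host:27000/mydb", "mongo.server.host:27000"}) {
            System.out.println(a + " -> " + parse(a));
        }
    }
}
```

Each line of output shows the host, port, and database that the corresponding example resolves to.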
There are also quite a few options available on the Mongo client. We will see a few of them in the following table:
Option | Description
--help or -h | This offers help regarding the usage of the various command-line options.
--shell | When .js files are given as arguments, these scripts get executed, and then the Mongo client exits. This option ensures that the shell keeps running after the JavaScript files execute. All the functions and variables defined in these .js files are available in the shell upon startup. As in the preceding case, the sayHello function defined in the JavaScript file is available in the shell for invocation.
--port | This specifies the port of the Mongo server to which the client needs to connect.
--host | This specifies the hostname of the Mongo server to which the client needs to connect. If the db address is provided with the hostname, port, and database, neither the --host nor the --port option needs to be specified.
--username or -u | This is relevant when security is enabled for Mongo. It is used to provide the username of the user to be logged in.
--password or -p | This is relevant when security is enabled for Mongo. It is used to provide the password of the user to be logged in.
Connecting to a single node from a Java client
This recipe is about setting up the Java client for MongoDB. You will repeatedly refer to this recipe while working on others, so read it very carefully.
Getting ready
The following are the prerequisites for this recipe:
- Version 1.6 or above of the Java SDK (JDK) is recommended.
- Use the latest available version of Maven. Version 3.1.1 was the latest at the time of writing this book.
- Use the MongoDB Java driver. Version 2.11.3 was the latest at the time of writing this book.
- Connectivity to the Internet is needed to access the online Maven repository. Alternatively, you might choose an appropriate local repository that is accessible from your computer.
- The Mongo server is up and running on the localhost and on port 27017. Take a look at the first recipe, Single node installation of MongoDB, and start the server.
How to do it…
Let’s take a look at the steps in detail:
1. Install the latest version of the JDK if you don’t already have it on your machine. We will not go through the steps to install the JDK in this recipe, but the JDK must be present before you move on to the next step. Type javac -version in the shell to check the installed version.
2. Once the JDK is set up, the next step is to set up Maven. Skip the next three steps if Maven is already installed on your machine.
3. Download Maven from http://maven.apache.org/download.cgi. Choose the binaries in the .tar.gz or .zip format and download them. This recipe is executed on a machine that runs the Windows platform; thus, these steps are for installation on Windows. The following screenshot shows the download page of Maven:
4. Once the archive is downloaded, extract it and add the absolute path of the bin folder in the extracted archive to the operating system’s path variable. Maven also needs the path of the JDK to be set as the JAVA_HOME environment variable. Remember to set the root of your JDK as the value of this variable.
5. All we need to do now is type mvn -version in the command prompt. If you see the version of Maven on the command prompt, we have successfully set up Maven:
> mvn -version
6. At this stage, we have Maven installed, and we are now ready to create a simple project for our first Mongo client in Java. We will start by creating a project folder. Let’s assume that we create a folder called MongoJava. Then, we will create the folder structure src/main/java in this project folder. The root of the project folder also contains a file called pom.xml. Once the folders are created, the structure should look as follows:
MongoJava
+-- src
|   +-- main
|       +-- java
+-- pom.xml
7. We now have just the project skeleton. We will now add some content to the pom.xml file. Not much is needed for this. Add the following code snippet to the pom.xml file and save it:
<project>
    <modelVersion>4.0.0</modelVersion>
    <name>MongoJava</name>
    <groupId>com.packtpub</groupId>
    <artifactId>mongo-cookbook-java</artifactId>
    <version>1.0</version>
    <packaging>jar</packaging>
    <dependencies>
        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongo-java-driver</artifactId>
            <version>2.11.3</version>
        </dependency>
    </dependencies>
</project>
8. Finally, we will write the Java client that connects to the Mongo server and executes some very basic operations. The following class, FirstMongoClient, belongs to the com.packtpub.mongo.cookbook package, so save it under src/main/java as com/packtpub/mongo/cookbook/FirstMongoClient.java:
package com.packtpub.mongo.cookbook;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import java.net.UnknownHostException;
import java.util.List;

/**
 * Simple Mongo Java client
 *
 */
public class FirstMongoClient {

    /**
     * Main method for the FirstMongoClient. Here we connect to a mongo
     * instance running on localhost and port 27017.
     *
     * @param args
     */
    public static final void main(String[] args) throws UnknownHostException {
        MongoClient client = new MongoClient("localhost", 27017);
        DB testDB = client.getDB("test");
        System.out.println("Dropping person collection in test database");
        DBCollection collection = testDB.getCollection("person");
        collection.drop();
        System.out.println("Adding a person document in the person collection of test database");
        DBObject person = new BasicDBObject("name", "Fred").append("age", 30);
        collection.insert(person);
        System.out.println("Now finding a person using findOne");
        person = collection.findOne();
        if (person != null) {
            System.out.printf("Person found, name is %s and age is %d%n",
                    person.get("name"), person.get("age"));
        }
        List<String> databases = client.getDatabaseNames();
        System.out.println("Database names are");
        int i = 1;
        for (String database : databases) {
            System.out.println(i++ + ": " + database);
        }
        System.out.println("Closing client");
        client.close();
    }
}
9. It’s now time to execute the preceding Java code. We will execute it using Maven from the shell. You should be in the same directory as the pom.xml file of the project:
mvn compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.FirstMongoClient
How it works…
Those were quite a lot of steps to follow! Let’s look at some of them in more detail. Everything up to step 6 is straightforward and doesn’t need any explanation. Let’s look at the other steps.
The pom.xml file we have here is pretty simple. We defined a dependency on Mongo’s Java driver. It relies on the online repository (http://search.maven.org) for resolving the artifacts. For a local repository, all we need to do is define the repositories and pluginRepositories tags in pom.xml. For more information on Maven, refer to the Maven documentation at http://maven.apache.org/guides/index.html.
Now, for the Java class, the com.mongodb.MongoClient class is the backbone. We first instantiate it using one of its overloaded constructors, which takes the server’s host and port. In this case, the hostname and port were not really needed, as the values provided are the defaults anyway, and the no-argument constructor would have worked too. The following line of code instantiates this client:
MongoClient client = new MongoClient("localhost", 27017);
The next step is to get the database, in this case test, using the getDB method. It is returned as an object of type com.mongodb.DB. Note that this database might not exist, yet getDB will not throw any exception. Instead, the database gets created when we add the first document to a collection in this database. Similarly, getCollection on the DB object returns an object of type com.mongodb.DBCollection, representing the collection in the database. This collection might not exist in the database either; it is created automatically when the first document is inserted.
The following lines of code from our class show how to get an instance of DB and DBCollection:
DB testDB = client.getDB("test");
DBCollection collection = testDB.getCollection("person");
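The "nothing exists until the first insert" behavior can be modeled without a server. The sketch below is a toy in-memory stand-in for it, not the Java driver. LazyServer and CollectionHandle are hypothetical names: a collection handle is only a pair of names, and storage is materialized on the first insert.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model (not the Java driver) of lazy creation:
// obtaining a collection handle creates nothing; the database and collection
// appear only when the first document is inserted.
class LazyServer {
    private final Map<String, Map<String, List<Map<String, Object>>>> data =
            new LinkedHashMap<>();

    CollectionHandle getCollection(String db, String collection) {
        return new CollectionHandle(db, collection); // just a pair of names
    }

    Set<String> getDatabaseNames() {
        return data.keySet();
    }

    class CollectionHandle {
        private final String db;
        private final String name;

        CollectionHandle(String db, String name) {
            this.db = db;
            this.name = name;
        }

        void insert(Map<String, Object> doc) {
            // The database and collection are materialized on first insert.
            data.computeIfAbsent(db, k -> new LinkedHashMap<>())
                .computeIfAbsent(name, k -> new ArrayList<>())
                .add(doc);
        }
    }

    public static void main(String[] args) {
        LazyServer server = new LazyServer();
        LazyServer.CollectionHandle person = server.getCollection("test", "person");
        System.out.println("before insert: " + server.getDatabaseNames());
        Map<String, Object> fred = new LinkedHashMap<>();
        fred.put("name", "Fred");
        fred.put("age", 30);
        person.insert(fred);
        System.out.println("after insert:  " + server.getDatabaseNames());
    }
}
```

The first line prints an empty set, and test appears only after the insert.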
Before we insert a document, we drop the collection so that, even after multiple executions of the program, we have just one document in the person collection. The collection is dropped using the drop() method on the DBCollection instance. Next, we create an instance of com.mongodb.DBObject, which represents the document to be inserted in the collection. The concrete class used here is BasicDBObject, which is a type of java.util.LinkedHashMap where the key is a string and the value is an object. The value can be another DBObject too, in which case it is a document nested within another document. In our case, we have two keys: name and age. These are the field names in the document to be inserted, and the values are of type string and integer, respectively. The append method of BasicDBObject adds a new key-value pair to the BasicDBObject instance and returns the same instance, which allows us to chain append calls to add multiple key-value pairs. The DBObject is then inserted into the collection using the insert method. This is how we instantiated a DBObject for the person and inserted it in the collection:
DBObject person = new BasicDBObject("name", "Fred").append("age", 30);
collection.insert(person);
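The design described above (a LinkedHashMap whose append returns this) can be reproduced in a few lines. Doc is a hypothetical stand-in written only to illustrate the pattern. It is not the driver's BasicDBObject, which has many more methods.

```java
import java.util.LinkedHashMap;

// Hypothetical stand-in for BasicDBObject, illustrating only the design the
// text describes: a LinkedHashMap<String, Object> whose append() returns the
// same instance (so calls chain) and whose values may themselves be documents.
class Doc extends LinkedHashMap<String, Object> {
    Doc() {
    }

    Doc(String key, Object value) {
        put(key, value);
    }

    Doc append(String key, Object value) {
        put(key, value);
        return this; // returning this is what enables chaining
    }

    public static void main(String[] args) {
        Doc person = new Doc("name", "Fred")
                .append("age", 30)
                .append("address", new Doc("city", "London")); // nested document
        // LinkedHashMap keeps insertion order, so fields print as added.
        System.out.println(person);
    }
}
```

Because LinkedHashMap preserves insertion order, the fields print in the order in which they were appended.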
The findOne method on DBCollection is straightforward and returns one document from the collection. This version of findOne doesn’t accept a DBObject as a parameter (such a parameter would otherwise act as a query used to select the document that is returned). It is equivalent to executing db.person.findOne() from the Mongo shell.
Finally, we simply invoke getDatabaseNames to get a list of the database names on the server. At this point, the returned result should contain at least the test and local databases. Once all the operations are completed, we close the client. The MongoClient class is thread-safe; generally, one instance is used per application. To execute the program, we use Maven’s exec plugin. On executing step 9, we will see the following output on the console:
[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ mongo-cookbook-java ---
Dropping person collection in test database
Adding a person document in the person collection of test database
Now finding a person using findOne
Person found, name is Fred and age is 30
Database names are
1: local
2: test
Closing client
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.183s
[INFO] Finished at: Wed Oct 30 00:42:29 IST 2013
[INFO] Final Memory: 7M/19M
[INFO] ------------------------------------------------------------------------
Starting multiple instances as part of a replica set
In this recipe, we will look at starting multiple servers on the same host as a cluster. A single Mongo server is enough for development purposes or for applications that are not mission-critical. For crucial production deployments, we need high availability: if one server instance fails, another instance takes over, and the data remains available for querying, inserting, and updating. Clustering is an advanced concept, and we wouldn’t do it justice by covering all of it in one recipe. Here, we will only scratch the surface; we will go into more detail in other recipes in Chapter 4, Administration. In this recipe, we will start multiple Mongo server processes on the same machine for testing purposes. In a production environment, they would run on different machines (or virtual machines) in the same or different data centers.
Let’s briefly look at exactly what a replica set is. As the name suggests, it is a set of servers that are replicas of each other in terms of data. How they are kept in sync with each other, along with other internals, is something we will defer to later recipes in Chapter 4, Administration. One thing to remember is that write operations happen on only one node, the primary. All querying also happens on the primary node by default, though we can explicitly permit read operations on secondary instances. An important fact to remember is that replica sets are not meant to achieve scalability by distributing read operations across the nodes of a replica set. Their sole objective is to ensure high availability.
Getting ready
Though it is not a prerequisite, taking a look at the Starting a single node instance using command-line options recipe will definitely make things easier, in case you are not familiar with the various command-line options and their significance when starting a Mongo server. Also, the necessary binaries and setup described in the Single node installation of MongoDB recipe must be in place before we continue with this recipe. Let’s sum up what we need to do.
We will start three mongod processes (Mongo server instances) on our localhost. We will create three data directories, /data/n1, /data/n2, and /data/n3, for node1, node2, and node3, respectively. Similarly, we will redirect the logs to /logs/n1.log, /logs/n2.log, and /logs/n3.log. The following diagram will give you an idea of what the cluster will look like:
How to do it…
Let’s take a look at the steps in detail:
1. Create the /data/n1, /data/n2, and /data/n3 directories for the data and the /logs directory for the logs of the three nodes. On the Windows platform, you can choose the c:\data\n1, c:\data\n2, and c:\data\n3 directories for the data and c:\logs\ for the logs (or any other directories of your choice). Ensure that these directories have the appropriate write permissions for the Mongo server to write the data and logs.
2. Start the three servers as follows (note that users on the Windows platform need to skip the --fork option, as it is not supported):
$ mongod --replSet repSetTest --dbpath /data/n1 --logpath /logs/n1.log --port 27000 --smallfiles --oplogSize 128 --fork
$ mongod --replSet repSetTest --dbpath /data/n2 --logpath /logs/n2.log --port 27001 --smallfiles --oplogSize 128 --fork
$ mongod --replSet repSetTest --dbpath /data/n3 --logpath /logs/n3.log --port 27002 --smallfiles --oplogSize 128 --fork
3. Start the Mongo shell and connect to any of the running Mongo servers. In this case, we will connect to the first one (the one listening on port 27000). Execute the following command:
$ mongo localhost:27000
4. After connecting, try to execute an insert operation from the Mongo shell as follows:
> db.person.insert({name:'Fred', age:35})
This operation should fail, as the replica set is not initialized yet. More information can be found in the How it works… section of this recipe.
5. The next step is to start configuring the replica set. We will start by preparing a JSON configuration in the shell:
cfg = {
    '_id': 'repSetTest',
    'members': [
        {'_id': 0, 'host': 'localhost:27000'},
        {'_id': 1, 'host': 'localhost:27001'},
        {'_id': 2, 'host': 'localhost:27002'}
    ]
}
6. The last step is to initiate the replica set with the preceding configuration as follows:
> rs.initiate(cfg)
Execute rs.status() in the shell after a few seconds to see the status. Within a few seconds, one of the members should become primary, and the remaining two should become secondary.
How it works…
We described all these command-line options in detail in the Starting a single node instance using command-line options recipe.
As we are starting three independent mongod services, we have three dedicated database paths on the filesystem. Similarly, we have three separate log file locations, one for each process. We then started three mongod processes with the database and log file paths specified. As this setup is for test purposes and runs on a single machine, we used the --smallfiles and --oplogSize options. Avoid using these options in a production environment. Because the processes run on the same host, we also chose the ports explicitly to avoid port conflicts. The ports we chose here are 27000, 27001, and 27002. When we start the servers on different hosts, we may or may not choose a separate port; we can very well use the default one wherever possible.
The --fork option demands some explanation. With this option, the server starts as a background process, and control returns to our operating system’s shell, where we can start more mongod processes or perform other operations. Without the --fork option, we cannot start more than one process per shell and would need to start the three mongod processes in three separate shells. This option doesn’t work on the Windows platform, where we need to start one process per shell. We can, however, execute the following command to spawn a new shell and start the new Mongo service in it:
start mongod --replSet repSetTest --dbpath c:\data\n1 --logpath c:\logs\n1.log --port 27000 --smallfiles --oplogSize 128
The preceding command allows us to write a batch file (a .bat file) that contains all the logic to create the relevant directories and then spawn three mongod processes in three shells.
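As a sketch of such a batch file, the following hypothetical generator prints one mkdir per directory and one start command per node. It reuses the set name, ports, and c:\data\nN and c:\logs paths from this recipe. Adjust them for your machine. The generated file has not been tested on every Windows setup, so treat it as a starting point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator for the Windows batch file described above. The set
// name, the ports starting at basePort, and the c:\data\nN and c:\logs paths
// follow this recipe; adjust them as needed.
class ReplSetBatchFile {
    static List<String> lines(String setName, int nodes, int basePort) {
        List<String> out = new ArrayList<>();
        out.add("mkdir c:\\logs");
        for (int i = 1; i <= nodes; i++) {
            out.add("mkdir c:\\data\\n" + i);
        }
        for (int i = 1; i <= nodes; i++) {
            // "start" opens a new console window; it stands in for --fork.
            out.add(String.format(
                    "start mongod --replSet %s --dbpath c:\\data\\n%d"
                    + " --logpath c:\\logs\\n%d.log --port %d"
                    + " --smallfiles --oplogSize 128",
                    setName, i, i, basePort + i - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Redirect this output to a file such as startReplSet.bat (a hypothetical name).
        lines("repSetTest", 3, 27000).forEach(System.out::println);
    }
}
```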
Let’s get back to creating the replica set; we are not yet done setting it up. If we take a look at the logs generated in the log directory, we will see the following lines in them:
[rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
[rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done
Though we started three mongod processes with the --replSet option, we still haven’t configured them to work with each other as a replica set. This command-line option only tells the server at startup that the process will run as part of a replica set. The name of the replica set is the value of this option passed on the command prompt. This also explains why the insert operation executed on one of the nodes failed before the replica set was initialized. In Mongo replica sets, only one node is the primary node, where all the inserts and (by default) all the querying happen. In the preceding diagram, node n1 is shown as the primary node, and it listens on port 27000 for client connections. All the other nodes are slave/secondary instances that sync themselves with the primary node and might lag behind it; hence, querying is disabled on them by default. Only when the primary node goes down does one of the secondaries take over and become the primary node. It is, however, possible to query the secondary instances for data, as shown in the preceding diagram. We will see how to query a secondary instance in the next recipe.
Well, all that is left now is to configure the replica set by grouping the three processes we started. This is done by first defining a JSON object as follows:
cfg = {
    '_id': 'repSetTest',
    'members': [
        {'_id': 0, 'host': 'localhost:27000'},
        {'_id': 1, 'host': 'localhost:27001'},
        {'_id': 2, 'host': 'localhost:27002'}
    ]
}
There are two fields: _id, the unique ID of the replica set, and members, an array of the hostnames and port numbers of the mongod server processes that are part of this replica set. Using localhost to refer to the host is usually discouraged. In this case, however, we started all the processes on the same machine, so it is acceptable. Even when the servers run on the local machine, referring to them by hostname is preferred. Note that you cannot mix localhost and hostnames in the same config; use either hostnames or localhost throughout. To configure the replica set, we connect to any one of the three running mongod processes (in this case, the first one) and then execute the following command from the shell:
> rs.initiate(cfg)
The _id in the cfg object must have the same value as the one we gave to the --replSet option on the command prompt when we started the server processes. A different value produces the following error:
{
    "ok" : 0,
    "errmsg" : "couldn't initiate : set name does not match the set name host Amol-PC:27000 expects"
}
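Both rules from this section (matching _id and no mixing of localhost with hostnames) can be checked before you run rs.initiate(). ReplSetConfig is a hypothetical client-side helper that only builds the cfg string. It does not talk to MongoDB. As a simplification, it detects localhost only in the host:port form used in this recipe.

```java
import java.util.List;

// Hypothetical helper that builds the cfg document for rs.initiate() and
// checks two rules from this section first: _id must equal the --replSet name,
// and members must not mix "localhost" with real hostnames. Simplification:
// localhost is detected only as "localhost:<port>".
class ReplSetConfig {
    static String build(String setName, String replSetOption, List<String> hosts) {
        if (!setName.equals(replSetOption)) {
            throw new IllegalArgumentException("set name " + setName
                    + " does not match --replSet " + replSetOption);
        }
        long local = hosts.stream().filter(h -> h.startsWith("localhost:")).count();
        if (local != 0 && local != hosts.size()) {
            throw new IllegalArgumentException("cannot mix localhost and hostnames");
        }
        StringBuilder sb = new StringBuilder("{ '_id': '" + setName + "', 'members': [");
        for (int i = 0; i < hosts.size(); i++) {
            sb.append(i == 0 ? " " : ", ")
              .append("{ '_id': ").append(i)          // member ids 0, 1, 2, ...
              .append(", 'host': '").append(hosts.get(i)).append("' }");
        }
        return sb.append(" ] }").toString();
    }

    public static void main(String[] args) {
        System.out.println(build("repSetTest", "repSetTest",
                List.of("localhost:27000", "localhost:27001", "localhost:27002")));
    }
}
```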
If all goes well and the initiate call is successful, you will see something like the following JSON response in the shell:
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}
Within a few seconds, the prompt of the shell from which we executed this command should change to show that the node has become a primary or a secondary. The following is an example of the prompt of a shell connected to the primary member of the replica set:
repSetTest:PRIMARY>
Executing rs.status() gives us statistics on the status of the replica set. The stateStr field is important here; it contains text such as PRIMARY, SECONDARY, and so on.
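Picking the primary out of the stateStr values is a simple search. The sketch below assumes that you have already collected each member's name and stateStr from the members array of rs.status(). How you collect them is left open. PrimaryFinder is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: given each member's name and stateStr (as reported in
// the members array of rs.status()), return the member that is PRIMARY.
class PrimaryFinder {
    static Optional<String> findPrimary(Map<String, String> stateByMember) {
        return stateByMember.entrySet().stream()
                .filter(e -> "PRIMARY".equals(e.getValue()))
                .map(Map.Entry::getKey)
                .findFirst(); // empty while no primary has been elected yet
    }

    public static void main(String[] args) {
        Map<String, String> members = new LinkedHashMap<>();
        members.put("localhost:27000", "SECONDARY");
        members.put("localhost:27001", "PRIMARY");
        members.put("localhost:27002", "SECONDARY");
        System.out.println(findPrimary(members).orElse("no primary yet"));
    }
}
```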
There’s more…
If you are looking to convert a standalone instance to a replica set, the instance with the data needs to become the primary instance first; empty secondary instances are then added, and the data is synchronized to them. For more information on how to perform this operation, visit http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/.
See also
- The Connecting to the replica set from the shell to query and insert data recipe, to perform more operations from the shell after connecting to a replica set
- Chapter 4, Administration, for more advanced recipes on replication
Connecting to the replica set from the shell to query and insert data
In the previous recipe, we started a replica set of three mongod processes. In this recipe, we will build on it: we will connect to it from the Mongo shell, query and insert data, and take a look at some of the interesting aspects of the replica set from a client’s perspective.
Getting ready
The prerequisite for this recipe is that the replica set is set up and running. For details on how to start the replica set, refer to the Starting multiple instances as part of a replica set recipe.
How to do it…
Let’s take a look at the steps in detail:
1. Create the /data/n1, /data/n2, and /data/n3 directories for the data and the /logs directory for the logs of the three nodes, if they do not already exist.
2. We will start two shells here: one for the primary and one for a secondary. Execute the following command in the command prompt:
mongo localhost:27000
3. The prompt of the shell tells us whether the server we connected to is a primary or a secondary. It shows the replica set’s name followed by a colon (:) and then the server’s state. In this case, if the replica set is initialized and up and running, we will see either repSetTest:PRIMARY> or repSetTest:SECONDARY>.
4. Suppose the first server we connected to is a secondary server; then we need to find the primary server as follows:
   1. Execute the rs.status() command in the shell and look for the member whose stateStr field is PRIMARY. Use the Mongo shell to connect to this server. At this point, we should have two shells running: one connected to a primary node and the other connected to a secondary node.
5. In the shell connected to the primary node, execute the following insert command:
repSetTest:PRIMARY> db.replTest.insert({_id:1, value:'abc'})
There is nothing special about it. We have just inserted a small document into a collection that we use for the replication test.
6. By executing the following query on the primary node, we should get one result:
repSetTest:PRIMARY> db.replTest.findOne()
{ "_id" : 1, "value" : "abc" }
7. So far, so good. Now, go to the shell that is connected to the secondary node and execute the following command:
repSetTest:SECONDARY> db.replTest.findOne()
8. On doing this, we will see the following error on the console:
{ "$err" : "not master and slaveOk=false", "code" : 13435 }
9. Now, execute the following command on the console:
repSetTest:SECONDARY> rs.slaveOk(true)
10. Execute the query from step 7 again in the shell. This time, we get the following result:
repSetTest:SECONDARY> db.replTest.findOne()
{ "_id" : 1, "value" : "abc" }
11. Execute the following insert command on the secondary node. It should fail with the following message:
repSetTest:SECONDARY> db.replTest.insert({_id:1, value:'abc'})
not master
How it works…
We have done a lot of things in this recipe, so let’s shed some light on the important concepts to remember.
We connected to a primary and a secondary node from the shell and performed (or, rather, tried to perform) select and insert operations. A Mongo replica set is made up of one primary (just one; no more, no less) and multiple secondary nodes. All writes happen on the primary node only. Note that replication is not a mechanism for distributing read-request load to scale the system. Its primary intent is to ensure high availability of data. By default, we are not permitted to read data from the secondary nodes. In steps 5 and 6, we simply inserted data from the primary node and then executed a query to get back the document we inserted. This is straightforward, and nothing here is related to clustering. Just note that we inserted the document on the primary node and then queried it back.
In the next step, we executed the same query, but this time from the secondary node’s shell. By default, querying is not enabled on a secondary node. Replicating the data might involve a small lag, caused by factors such as heavy data volumes, network latency, and hardware capacity; thus, a query on a secondary node might not reflect the latest inserts or updates made on the primary node. If, however, we can live with this slight replication lag, all we need to do is enable querying on the secondary node explicitly with a single command, rs.slaveOk() or rs.slaveOk(true). Once this is done, we are free to execute queries on the secondary nodes too.
Finally, we tried to insert data into a collection on the slave node. This is never permitted, regardless of whether we have executed rs.slaveOk(). Invoking rs.slaveOk() only permits data to be queried from the secondary node. All write operations still have to go to the primary node and then flow down to the secondary nodes. The internals of replication are covered in the Understanding and analyzing oplogs recipe in Chapter 4, Administration.
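The read and write rules exercised in steps 7 to 11 can be summarized as a small state model. ReplicaMember is a hypothetical class, not server or driver code. Its error strings echo the messages seen in the shell above.

```java
// Hypothetical model of the rules exercised in this recipe (not server or
// driver code): only the primary accepts writes, and a secondary refuses
// reads unless slaveOk has been enabled, mirroring rs.slaveOk() in the shell.
class ReplicaMember {
    enum State { PRIMARY, SECONDARY }

    private final State state;
    private boolean slaveOk = false;

    ReplicaMember(State state) {
        this.state = state;
    }

    void setSlaveOk(boolean ok) {
        this.slaveOk = ok;
    }

    // Reads: always allowed on the primary; on a secondary only with slaveOk.
    void checkRead() {
        if (state == State.SECONDARY && !slaveOk) {
            throw new IllegalStateException("not master and slaveOk=false");
        }
    }

    // Writes: only the primary accepts them; slaveOk makes no difference.
    void checkWrite() {
        if (state != State.PRIMARY) {
            throw new IllegalStateException("not master");
        }
    }

    // Runs a check and reports "ok" or the refusal message.
    static String tryOp(Runnable op) {
        try {
            op.run();
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        ReplicaMember secondary = new ReplicaMember(State.SECONDARY);
        System.out.println("read:                " + tryOp(secondary::checkRead));
        secondary.setSlaveOk(true);
        System.out.println("read after slaveOk:  " + tryOp(secondary::checkRead));
        System.out.println("write after slaveOk: " + tryOp(secondary::checkWrite));
    }
}
```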
See also
- The Connecting to the replica set to query and insert data from a Java client recipe, for details on how to connect to a replica set from a Java client
Connecting to the replica set to query and insert data from a Java client
In this recipe, we will demonstrate how to connect to a replica set using a Java client and how to query and insert data with the Java client for MongoDB. We will also see how the client automatically fails over to another member of the replica set if the primary member goes down.
Getting ready
First, take a look at the Connecting to a single node from a Java client recipe, as it contains all the prerequisites and steps to set up Maven and the other dependencies. As we are dealing with a Java client for replica sets, a replica set must be up and running. Refer to the Starting multiple instances as part of a replica set recipe for details on how to start the replica set.
How to do it…
Let’s take a look at the steps in detail:
1. First, write or copy the following piece of code (this Java class is also available for download from the book’s site):
package com.packtpub.mongo.cookbook;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import java.util.Arrays;

/**
 * Test client that connects to a replica set and keeps querying it
 * while the primary is stopped.
 */
public class ReplicaSetMongoClient {

    /**
     * Main method for the test client connecting to the replica set.
     * @param args
     */
    public static final void main(String[] args) throws Exception {
        // Seed list: the three members of the replica set
        MongoClient client = new MongoClient(
            Arrays.asList(
                new ServerAddress("localhost", 27000),
                new ServerAddress("localhost", 27001),
                new ServerAddress("localhost", 27002)
            )
        );
        DB testDB = client.getDB("test");
        System.out.println("Dropping replTest collection");
        DBCollection collection = testDB.getCollection("replTest");
        collection.drop();
        DBObject object = new BasicDBObject("_id", 1).append("value", "abc");
        System.out.println("Adding a test document to replica set");
        collection.insert(object);
        System.out.println("Retrieving document from the collection, this one comes from primary node");
        DBObject doc = collection.findOne();
        showDocumentDetails(doc);
        System.out.println("Now retrieving documents in a loop from the collection.");
        System.out.println("Stop the primary instance manually after a few iterations");
        for (int i = 0; i < 20; i++) {
            try {
                doc = collection.findOne();
                showDocumentDetails(doc);
            } catch (Exception e) {
                // Ignore (or log) the failure; the next iteration tries again
            }
            Thread.sleep(5000);
        }
        client.close();
    }

    /**
     * Prints the _id and value fields of the given document.
     * @param obj the document to print
     */
    private static void showDocumentDetails(DBObject obj) {
        System.out.printf("_id: %d, value is %s%n", obj.get("_id"), obj.get("value"));
    }
}
2. Connect to any of the nodes in the replica set, say localhost:27000, and execute rs.status() from the shell. Take note of the primary instance in the replica set, and connect to it from the shell if localhost:27000 is not the primary node. Now, switch to the admin database as follows:
repSetTest:PRIMARY> use admin
3. Now, execute the preceding program from the operating system shell as follows:
$ mvn compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.ReplicaSetMongoClient
4. Shut down the primary instance by executing the following command in the Mongo shell connected to the primary node:
repSetTest:PRIMARY> db.shutdownServer()
5. Watch the output on the console where the com.packtpub.mongo.cookbook.ReplicaSetMongoClient class is executed using Maven.
How it works…
An interesting thing to observe is how we instantiate the MongoClient instance. It is done as follows:
MongoClient client = new MongoClient(Arrays.asList(
        new ServerAddress("localhost", 27000),
        new ServerAddress("localhost", 27001),
        new ServerAddress("localhost", 27002)));
The constructor takes a list of com.mongodb.ServerAddress objects. This class has a lot of overloaded constructors, but we chose the one that takes the hostname and port number. Here, we provided the details of all the servers in the replica set as a list. We haven’t specified which node is the primary and which are the secondaries; the MongoClient class is intelligent enough to figure this out and connect to the appropriate instance. The list of servers provided is called the seed list. It need not contain every server in the replica set, though the aim is to provide as many as we can. On connecting to the provided servers, the client queries them for the replica set metadata and discovers the remaining members of the set. For example, if the replica set had five members and we instantiated the client with just three of them, as we did here, it would still work fine, and the remaining two instances would be discovered automatically.
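Seed-list discovery can be pictured as a graph search that starts from the seeds and follows every host each member reports. The sketch below is a toy simulation under stated assumptions. SeedDiscovery is hypothetical, the reportedHosts map stands in for the replica set metadata, and the five-member topology is invented for illustration. It does not reproduce the driver's actual algorithm, which also tracks member state and health.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of seed-list discovery. reportedHosts stands in for
// the replica set metadata each reachable member returns; hosts missing from
// the map report nothing. The topology in main() is an assumption.
class SeedDiscovery {
    static Set<String> discover(List<String> seeds, Map<String, List<String>> reportedHosts) {
        Set<String> known = new LinkedHashSet<>(seeds);
        Deque<String> toVisit = new ArrayDeque<>(seeds);
        while (!toVisit.isEmpty()) {
            String host = toVisit.poll();
            for (String member : reportedHosts.getOrDefault(host, List.of())) {
                if (known.add(member)) {
                    toVisit.add(member); // newly discovered: ask it as well
                }
            }
        }
        return known;
    }

    public static void main(String[] args) {
        List<String> all = List.of("localhost:27000", "localhost:27001",
                "localhost:27002", "localhost:27003", "localhost:27004");
        // In this toy setup, the three seeds each report the full member list.
        Map<String, List<String>> reported = Map.of(
                "localhost:27000", all, "localhost:27001", all, "localhost:27002", all);
        Set<String> found = discover(all.subList(0, 3), reported);
        System.out.println(found.size() + " members: " + found);
    }
}
```

Starting from three seeds, the search ends up knowing all five members, which mirrors the five-member example above.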
Next, we start the client from the command prompt using Maven. Once the client is running in the loop that finds one document, we bring down the primary instance. We will see something like the following output on the console:
_id: 1, value is abc
Now retrieving documents in a loop from the collection.
Stop the primary instance manually after a few iterations
_id: 1, value is abc
_id: 1, value is abc
Nov 03, 2013 5:21:57 PM com.mongodb.ConnectionStatus$UpdatableNode update
WARNING: Server seen down: Amol-PC/192.168.1.171:27002
java.net.SocketException: Software caused connection abort: recv failed
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
…
WARNING: Primary switching from Amol-PC/192.168.1.171:27002 to Amol-PC/192.168.1.171:27001
_id: 1, value is abc
As we can see, the query in the loop was interrupted when the primary node went down. The client, however, switched to the new primary node seamlessly, or rather nearly seamlessly, as the client might have to catch an exception and retry the operation after a predetermined interval has elapsed.
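The "catch and retry after an interval" approach can be factored into a small helper. Retry is a hypothetical utility, not part of the MongoDB driver, and the attempt count and interval are assumptions you would tune to the expected failover time. In the client above, the wrapped operation would be the collection.findOne() call.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper (not part of the MongoDB driver): run an operation,
// and if it throws (for example, while a new primary is being elected), wait a
// fixed interval and try again, up to maxAttempts times.
class Retry {
    static <T> T withRetry(Callable<T> op, int maxAttempts, long intervalMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // e.g. a connection error during failover
                if (attempt < maxAttempts) {
                    Thread.sleep(intervalMillis);
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation that fails twice, as if the primary were down.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("primary unavailable");
            }
            return "_id: 1, value is abc";
        }, 5, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A fixed interval keeps the sketch simple. An exponential backoff is a common alternative when failover times vary widely.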
Starting a simple sharded environment of two shards
In this recipe, we will set up a simple sharded environment made up of two data shards. To keep it simple, there will be no replication, as this is the most basic shard setup that demonstrates the concept. We won't get deep into the internals of sharding, which we will explore further in Chapter 4, Administration.
Here is a bit of theory before we proceed. Scalability and availability are two important cornerstones for building any mission-critical application. Availability is taken care of by replica sets, which we discussed in the previous recipes of this chapter. Let's look at scalability now. Simply put, scalability is the ease with which the system can cope with an increasing data and request load. Consider an e-commerce platform. On regular days, the number of hits to the site and the load are fairly modest, and the system response times and error rates are minimal (this is subjective).
Now, consider the days when the system load becomes two or three times an average day's load (or even more), for example, on Thanksgiving Day, Christmas, and so on. If the platform is able to deliver similar levels of service on these high-load days compared with any other day, the system is said to have scaled up well to the sudden increase in the number of requests.
Now, consider an archiving application that needs to store the details of all the requests that hit a particular website over the past decade. For each request that hits the website, we will create a new record in the underlying data store. Suppose each record is 250 bytes and the average load is 3 million requests per day. That is 750 MB per day, or roughly 274 GB per year, so we will cross the 1 TB data mark in roughly four years. This data will be used for various analytic purposes and might be frequently queried. The query performance should not be drastically affected when the data size increases. If the system is able to cope with this increasing data volume and still give decent performance, comparable to that on low data volumes, the system is said to have scaled up well against the increasing data volumes.
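The growth estimate is simple arithmetic. The few lines below redo it so that the assumption about what 1 TB means (10^12 or 2^40 bytes) is explicit. The record size and request rate are the ones given in the text. Either way, the mark is crossed in roughly four years.

```javascript
// Back-of-the-envelope growth estimate for the archiving example.
const bytesPerRecord = 250;
const requestsPerDay = 3e6;
const bytesPerDay = bytesPerRecord * requestsPerDay; // 750,000,000 bytes = 750 MB

// Years until the stored data reaches targetBytes at a constant daily rate.
function yearsToReach(targetBytes) {
  return targetBytes / bytesPerDay / 365;
}

console.log(yearsToReach(1e12).toFixed(2));    // 10^12 bytes: 3.65
console.log(yearsToReach(2 ** 40).toFixed(2)); // 2^40 bytes: 4.02
```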
Now that we have seen in brief what scalability is, let me tell you that sharding is a mechanism that lets a system scale to increasing demands. The crux lies in the fact that the entire data is partitioned into smaller segments and distributed across various nodes called shards. Let's assume that we have a total of 10 million documents in a Mongo collection. If we shard this collection across 10 shards, we will ideally have 10,000,000/10 = 1,000,000 documents on each shard. At a given point of time, one document will only reside on one shard (which, by itself, will be a replica set in a production system). There is, however, some magic involved that keeps this concept hidden from the developer querying the collection, who gets one unified view of the collection irrespective of the number of shards. Based on the query, it is Mongo that decides which shard to query for the data and returns the entire result set. With this background, let's set up a simple shard and take a closer look at it.
Getting ready
Apart from the MongoDB server already installed, there are no prerequisites from a software perspective. We will create a data directory for each of the two shards, another for the config server, and one directory for logs.
How to do it…
Let's take a look at the steps in detail:
1. We will start by creating directories for logs and data. Create the /data/s1/db, /data/s2/db, and /logs directories. On Windows, we can have c:\data\s1\db, and so on, for the data and log directories. There is also a config server that is used in a sharded environment to store some metadata. We will use /data/con1/db as the data directory for the config server.
2. Start the following mongod processes, one for each of the two shards and one for the config database, and one mongos process (we will see what this process does later). For the Windows platform, skip the --fork parameter, as it is not supported:
$ mongod --shardsvr --dbpath /data/s1/db --port 27000 --logpath /logs/s1.log --smallfiles --oplogSize 128 --fork
$ mongod --shardsvr --dbpath /data/s2/db --port 27001 --logpath /logs/s2.log --smallfiles --oplogSize 128 --fork
$ mongod --configsvr --dbpath /data/con1/db --port 25000 --logpath /logs/config.log --fork
$ mongos --configdb localhost:25000 --logpath /logs/mongos.log --fork
3. In the command prompt, execute the following command. This will show a mongos prompt:
$ mongo
MongoDB shell version: 2.4.6
connecting to: test
mongos>
4. Finally, we set up the shard. From the mongos shell, execute the following two commands:
mongos> sh.addShard("localhost:27000")
mongos> sh.addShard("localhost:27001")
5. On the addition of each shard, we will get an ok reply. Something like the following JSON message will be seen, giving the unique ID of each shard that is added:
{ "shardAdded" : "shard0000", "ok" : 1 }
Note
We have used localhost everywhere to refer to the locally running servers. This approach is not recommended and is discouraged. A better approach is to use hostnames, even if they are local processes.
How it works
Let's see what we did in the process. We created three directories for data (two for the shards and one for the config database) and one directory for logs. We can also use a shell script or a batch file to create the directories. In fact, in large production deployments, setting up shards manually is not only time-consuming but also error-prone.
Let's try to get a picture of what exactly we have done and what we are trying to achieve.
The following diagram shows the shard setup we just built:
If we look at the preceding diagram and the servers started in step 2, we will see that we have shard servers that will store the actual data in the collections. These were the first two of the four processes, listening on ports 27000 and 27001. Next, we started a config server, which is seen on the left-hand side of the preceding diagram. It is the third of the four servers started in step 2, and it listens on port 25000 for incoming connections. The sole purpose of this database is to maintain the metadata of the shard servers. Ideally, only the mongos process connects to this server for the shard details/metadata and the shard key information. We will see what a shard key is in the next recipe, where we will play around with a sharded collection and see the shards we created in action.
Finally, we have a mongos process. This is a lightweight process that doesn't persist any data and just accepts connections from clients. This is the layer that acts as a gatekeeper and abstracts the client from the concept of shards. For now, we can view it as a router that consults the config server and decides which shard server the client's query should be routed to for execution. It then aggregates the results from the various shards, if applicable, and returns the result to the client. It is safe to say that no client connects directly to the config or the shard servers. In fact, ideally, no one should connect to these processes directly, except for some administration operations. Clients simply connect to the mongos process and execute their queries, inserts, or updates.
Just starting the shard servers, the config server, and the mongos process doesn't create a sharded environment. On starting up the mongos process, we provided it with the details of the config server. But what about the two shards that will be storing the actual data? The two mongod processes started as shard servers are not yet declared anywhere as shard servers in the configuration. That is exactly what we do in the final step, by invoking sh.addShard() for both shard servers. Adding shards from the shell stores this metadata about the shards in the config database. The mongos processes then query this config database for the shards' information. On executing all the steps of this recipe, we will have an operational shard. Before we conclude, note that the shard we set up here is far from ideal and is not how it would be done in a production environment. The following diagram gives us an idea of what a typical shard looks like in a production environment:
The number of shards will not be two but many more. Also, each shard will be a replica set to ensure high availability. There will be three config servers to ensure the availability of the config servers too. Similarly, there can be any number of mongos processes listening for client connections. In some cases, a mongos process might even be started on a client application's server.
There's more…
What good is a shard unless we put it into action and see what happens from the shell when we insert and query data? In the next recipe, we will make use of the shard setup, add some data, and see it in action.
Connecting to a shard from the Mongo shell and performing operations
In this recipe, we will connect to a shard from a command prompt. We will also see how to shard a collection and observe the data splitting in action on some test data.
Getting ready
Obviously, we need a sharded Mongo server setup that is up and running. See the previous recipe for more details on how to set up a simple shard. As in the previous recipe, the mongos process should be listening on port number 27017. We have some names in a JavaScript file called names.js. This file needs to be downloaded from this book's site and kept on the local filesystem. The file contains a variable called names, whose value is an array of JSON documents, each one representing a person. The contents look as follows:
names = [
  {name:'James Smith', age:30},
  {name:'Robert Johnson', age:22},
  …
]
How to do it…
Let's take a look at the steps in detail:
1. Start the Mongo shell and connect to the default port on the localhost as follows (this will ensure that the names will be available in the current shell):
mongo --shell names.js
MongoDB shell version: 2.4.6
connecting to: test
mongos>
2. Switch to the database that will be used to test sharding as follows (we call it shardDB):
mongos> use shardDB
3. Enable sharding at the database level as follows:
mongos> sh.enableSharding("shardDB")
4. Shard a collection called person as follows:
mongos> sh.shardCollection("shardDB.person", {name: "hashed"}, false)
5. Add test data to the sharded collection as follows:
mongos> for (i = 1; i <= 300000; i++) {
... person = names[Math.round(Math.random() * 100) % 20]
... doc = {_id: i, name: person.name, age: person.age}
... db.person.insert(doc)
... }
6. Execute the following command to get a query plan and the number of documents on each shard:
mongos> db.person.find().explain()
How it works…
This recipe demands some explanation. We downloaded a JavaScript file that defines an array of 20 people. Each element of the array is a JSON object with a name and an age attribute. We started the shell, connected to the mongos process, with this JavaScript loaded. We then switched to shardDB, which we will use for the purpose of sharding.
For a collection to be sharded, the database in which it will be created needs to be enabled for sharding first. We do this using sh.enableSharding().
The next step is to enable the collection to be sharded. By default, all the data will be kept on one shard and will not be split across different shards. Think about how Mongo will be able to split the data meaningfully. The whole intention is to split it meaningfully and as evenly as possible, so that whenever we query based on a shard key, Mongo can easily determine which shard(s) to query. If a query doesn't contain the shard key, the query will be executed on all the shards, and the data will then be collated by the mongos process before it is returned to the client. Thus, choosing the right shard key is crucial.
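The routing decision described above (targeted when the shard key is in the query, scatter-gather otherwise) can be modeled in a few lines of plain JavaScript. This is a toy, not mongos itself. The chunk table is hypothetical config metadata split on the raw name for readability, and only equality queries are handled. With the recipe's hashed key, the comparison would be made on the hash of the value instead.

```javascript
// Hypothetical config metadata: each chunk owns the shard-key range [min, max);
// null stands for an unbounded end of the range.
const chunks = [
  { min: null, max: "M", shard: "shard0000" },
  { min: "M", max: null, shard: "shard0001" },
];

function inChunk(value, chunk) {
  return (chunk.min === null || value >= chunk.min) &&
         (chunk.max === null || value < chunk.max);
}

// Toy router: which shards must see an equality query?
function shardsForQuery(query, shardKey, chunks) {
  if (!(shardKey in query)) {
    // No shard key in the query: every shard is asked, and mongos merges the results.
    return [...new Set(chunks.map((c) => c.shard))];
  }
  const owner = chunks.find((c) => inChunk(query[shardKey], c));
  return [owner.shard]; // shard key present: exactly one shard owns the value
}

console.log(shardsForQuery({ name: "James Smith" }, "name", chunks)); // [ 'shard0000' ]
console.log(shardsForQuery({ age: 30 }, "name", chunks));             // [ 'shard0000', 'shard0001' ]
```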
Let's now see how to shard the collection. We will do this by invoking sh.shardCollection("shardDB.person", {name: "hashed"}, false). There are three parameters here.
The first parameter specifies the fully qualified name of the collection in the <dbname>.<collectionname> format. The second parameter specifies the field name to shard upon in the collection. This is the field that will be used to split the documents across the shards. One of the requirements of a good shard key is high cardinality (the number of possible values should be high). In our test data, the name value has a very low cardinality and thus is not a good choice for a shard key. We therefore hash this key when using it as a shard key, which we do by specifying the key as {name: "hashed"}. Bear in mind that hashing spreads the values evenly over the key range but does not add cardinality: 20 distinct names still produce only 20 distinct hash values. The last parameter specifies whether the value used as the shard key is unique. The name field is definitely not unique, so this parameter is false. If the field were, say, the person's social security number, it could have been set to true. SSN would also be a good choice for a shard key because of its high cardinality. Remember, though, that for a query to be efficient, the shard key has to be present in it.
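A quick way to convince yourself that hashing cannot raise cardinality is to hash a 20-value key and count the distinct results. This plain-JavaScript toy is a sketch with several stand-ins. It uses 32-bit FNV-1a, not MongoDB's own 64-bit hash. The names are hypothetical placeholders for those in names.js. Splitting the hash range at its midpoint is a simplification of how chunks are actually assigned to shards.

```javascript
// 32-bit FNV-1a hash (a stand-in; MongoDB uses its own 64-bit hash for hashed keys).
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hypothetical stand-ins for the 20 people in names.js; 300,000 documents,
// each name used exactly 15,000 times (round-robin instead of random).
const names = Array.from({ length: 20 }, (_, i) => `Person ${i}`);
const docs = Array.from({ length: 300000 }, (_, i) => ({ _id: i + 1, name: names[i % 20] }));

const distinctHashes = new Set(docs.map((d) => fnv1a32(d.name)));

// Two shards, each owning one half of the 32-bit hash range.
const perShard = { shard0000: 0, shard0001: 0 };
for (const d of docs) {
  perShard[fnv1a32(d.name) < 2 ** 31 ? "shard0000" : "shard0001"]++;
}

console.log(distinctHashes.size); // at most 20: one hash per distinct name
console.log(perShard);            // each count is a multiple of 15,000
```

Because every document with a given name lands on the same side, the split can only move in steps of one name's worth of documents. That is one likely reason the counts in the recipe are uneven rather than an exact half each.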
The last step is to look at the execution plan for finding all the data. The intent of this operation is to see how the data is split across the two shards. With 300,000 documents, we expect around 150,000 documents on each shard. In the explain plan's output, the shards attribute has an entry for each shard in the cluster, holding that shard's query plan. In our case, we have two shards and therefore two query plans. In each of them, the value of n is the one to look at: it gives the number of documents that reside on that shard. The following snippet is the relevant part of the JSON document we see on the console. The numbers of documents on shards one and two are 164938 and 135062, respectively:
"shards" : {
  "localhost:27000" : [
    {
      "cursor" : "BasicCursor",
      "isMultiKey" : false,
      "n" : 164938,
      "nscannedObjects" : 164938,
      "nscanned" : 164938,
      "nscannedObjectsAllPlans" : 164938,
      "nscannedAllPlans" : 164938,
      "scanAndOrder" : false,
      "indexOnly" : false,
      "nYields" : 1,
      "nChunkSkips" : 0,
      "millis" : 974,
      "indexBounds" : { },
      "server" : "Amol-PC:27000"
    }
  ],
  "localhost:27001" : [
    {
      "cursor" : "BasicCursor",
      "isMultiKey" : false,
      "n" : 135062,
      "nscannedObjects" : 135062,
      "nscanned" : 135062,
      "nscannedObjectsAllPlans" : 135062,
      "nscannedAllPlans" : 135062,
      "scanAndOrder" : false,
      "indexOnly" : false,
      "nYields" : 0,
      "nChunkSkips" : 0,
      "millis" : 863,
      "indexBounds" : { },
      "server" : "Amol-PC:27001"
    }
  ]
}
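Reading n off each shard by eye works for two shards but gets tedious with more. The sketch below walks the shards field, with the structure shown above, and sums n per shard. It runs in Node against a trimmed copy of this output. The `countsPerShard` name is hypothetical.

```javascript
// Sum the "n" of every plan listed under each shard in a 2.4-style
// sharded explain() result ({ shards: { "<shard>": [plan, ...], ... } }).
function countsPerShard(explainResult) {
  const counts = {};
  for (const [shard, plans] of Object.entries(explainResult.shards)) {
    counts[shard] = plans.reduce((sum, plan) => sum + plan.n, 0);
  }
  return counts;
}

// Trimmed copy of the output shown above.
const explainResult = {
  shards: {
    "localhost:27000": [{ cursor: "BasicCursor", n: 164938 }],
    "localhost:27001": [{ cursor: "BasicCursor", n: 135062 }],
  },
};

const counts = countsPerShard(explainResult);
const total = Object.values(counts).reduce((a, b) => a + b, 0);
for (const [shard, n] of Object.entries(counts)) {
  console.log(shard, n, ((100 * n) / total).toFixed(1) + "%");
}
// localhost:27000 164938 55.0%
// localhost:27001 135062 45.0%
```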
There are a couple of additional things that I recommend you do:
Connect to each individual shard from the Mongo shell and execute queries on the person collection. Check that the counts in these collections are similar to what we see in the preceding plan. You will also find that no document exists on both shards at the same time.
We discussed in brief how cardinality affects the way the data is split across shards. Let's do a simple exercise. We will first drop the person collection and execute the shardCollection operation again, but this time with the {name: 1} shard key instead of {name: "hashed"}. This ensures that the shard key is not hashed and is stored as is. Now, load the data using the JavaScript loop we used earlier in step 5 and then execute
There's more…
A lot of questions might now come up, such as what the best practices are, what some tips and tricks are, how MongoDB pulls off sharding behind the scenes in a way that is transparent to the end user, and so on.
This recipe only explained the basics. All these questions will be answered in Chapter 4, Administration.
Chapter 2. Command-line Operations and Indexes
In this chapter, we will cover the following recipes:
Creating test data
Performing simple querying, projections, and pagination from the Mongo shell
Updating and deleting data from the shell
Creating an index and viewing plans of queries
Background and foreground index creation from the shell
Creating unique indexes on collection and deleting the existing duplicate data automatically
Creating and understanding sparse indexes
Expiring documents after a fixed interval using the TTL index
Expiring documents at a given time using the TTL index
Creating test data
This recipe is about creating test data for some of the recipes in this chapter and also for the later chapters in this book. We will demonstrate how to load a CSV file into a Mongo database using the import utility. This is a basic recipe. Readers who are familiar with the data-import process might just download the CSV file (pincodes.csv) from the book's site, load it into the collection themselves, and skip the rest of the recipe. We will use the default database, test, and the collection will be named postalCodes.
Getting ready
The data used here is for postal codes in India. Download the pincodes.csv file from the book's website. The file is a CSV file with 39,732 records; it should create 39,732 documents upon successful import. We need to have the Mongo server up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for instructions on how to start the server. The server should begin listening for connections on the default port, 27017.
How to do it…
1. Execute the following command from the shell, with the file to be imported in the current directory:
$ mongoimport --type csv -d test -c postalCodes --headerline --drop pincodes.csv
2. Start the Mongo shell by typing mongo in the command prompt.
3. In the shell, execute the following command:
> db.postalCodes.count()
How it works…
Assuming that the server is up and running and the CSV file has been downloaded to a local directory, we can execute the import utility with the file in the current directory. Let's look at the options given to the mongoimport utility and their meanings:
Command-line option    Description
--type                 This specifies that the type of the input file is CSV. It defaults to JSON; the other possible value is TSV.
-d                     This is the target database into which the data will be loaded.
-c                     This is the collection in the preceding database into which the data will be loaded.
--headerline           This is relevant only for TSV or CSV files. It indicates that the first line of the file is the header. The same names will be used as the names of the fields in the documents.
--drop                 This indicates that the collection should be dropped before the data gets loaded into it.
After all the options, the final value on the command line is the name of the file, pincodes.csv.
If the import goes through successfully, you will see something like the following output on the console:
connected to: 127.0.0.1
Mon Dec  9 23:29:13.004 Progress: 1593394/2286080 69%
Mon Dec  9 23:29:13.014 28000 9333/second
Mon Dec  9 23:29:14.116 check 9 39733
Mon Dec  9 23:29:14.116 imported 39732 objects
Finally, we will start the Mongo shell and find the count of the documents in the collection. It should indeed be 39,732, as seen in the preceding import log.
Note
The postal code data is taken from https://github.com/kishorek/India-Codes/. This data is not taken from an official source and might not be accurate, as it is compiled manually for free public use. Thanks to Kishore for compiling the data and sharing it.
See also
The Performing simple querying, projections, and pagination from the Mongo shell recipe, to learn some basic queries on the imported data
Performing simple querying, projections, and pagination from the Mongo shell
In this recipe, we will get our hands dirty with a bit of querying to select documents from the test data we set up in the previous recipe. There is nothing extravagant in this recipe, and someone well versed in query language basics can comfortably skip it. Others who aren't too comfortable with basic querying, or who want a small refresher, can continue to read the next section of the recipe. Additionally, this recipe is intended to give a feel for the test data set up in the previous recipe.
Getting ready
To execute simple queries, we need to have a server up and running. A simple single node is all we need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will be operating needs to be imported into the database. The steps to import the data are given in the previous recipe. You also need to start the Mongo shell and connect to the server running on the localhost. Once we have these prerequisites, we are good to go.
How to do it…
1. Let's first find the count of documents in the collection:
> db.postalCodes.count()
2. Let's find just one document from the postalCodes collection:
> db.postalCodes.findOne()
3. Now, we need to find multiple documents in the collection:
> db.postalCodes.find().pretty()
4. The preceding query retrieved all the keys of the first 20 documents and displayed them on the shell. Let's do a couple of things now. We will display just the city, state, and pincode fields. Additionally, we want to display documents 91 to 100 in the collection. To do this, execute the following command:
> db.postalCodes.find({}, {_id:0, city:1, state:1, pincode:1}).skip(90).limit(10)
5. Let's move a step ahead and write a slightly more complex query, where we will find the first 10 postal codes in the state of Gujarat, sorted by the name of the city. Like the last query, we will select just the city, state, and pincode fields:
> db.postalCodes.find({state:'Gujarat'}, {_id:0, city:1, state:1, pincode:1}).sort({city:1}).limit(10)
How it works…
The recipe is pretty simple and allows us to get a feel for the test data we set up in the previous recipe. Nevertheless, as with the other recipes, I owe you all some explanation of what we did here.
We first found the count of the documents in the collection using db.postalCodes.count(), and it should give us 39,732 documents. This should be in sync with the logs we saw while importing the data into the postalCodes collection. Next, we queried for one document from the collection using findOne. This method returned the first document in the result set of the query. In the absence of a query or sort order, as in this case, it will be the first document in the collection in its natural order.
Next, we used find rather than findOne. The difference is that the find operation returns an iterator (a cursor) over the result set, which we can use to traverse the results, whereas findOne returns a single document. Adding a pretty method call to the find operation prints the results in a pretty, formatted way.
Note
The pretty method makes sense and works only with the find method, not with findOne. This is because the return value of findOne is a document, and there is no pretty operation on the returned document.
We will now execute the following query on the Mongo shell:
> db.postalCodes.find({}, {_id:0, city:1, state:1, pincode:1}).skip(90).limit(10)
Here, we pass two parameters to the find method:
The first one is {}, which is the query used to select the documents; in this case, we ask Mongo to select all the documents.
The second parameter is the set of fields that we want in the result documents. Remember that the _id field is present by default unless we explicitly say _id:0. For all the other fields, we need to say <field_name>:1 or <field_name>:true. The find method with projections is the same as saying select field1, field2 from table in the relational world, and not specifying the fields to be selected in the find method is like saying select * from table.
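The inclusion rules above can be captured in a tiny plain-JavaScript function. This is a sketch of the rules as described, not MongoDB's projection engine. It handles inclusion projections only, and the sample document's extra field and values are made up.

```javascript
// Toy inclusion projection: _id is kept unless the spec says _id: 0 (or false);
// every other field is kept only if the spec marks it with 1 or true.
function project(doc, spec) {
  const out = {};
  if (spec._id !== 0 && spec._id !== false && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && (flag === 1 || flag === true) && field in doc) {
      out[field] = doc[field];
    }
  }
  return out;
}

// Hypothetical postalCodes-like document (values and the extra field are made up).
const doc = { _id: 42, city: "Ahmedabad", state: "Gujarat", pincode: 380001, extra: "not requested" };

console.log(project(doc, { _id: 0, city: 1, state: 1, pincode: 1 }));
// { city: 'Ahmedabad', state: 'Gujarat', pincode: 380001 }
console.log(project(doc, { pincode: 1 }));
// { _id: 42, pincode: 380001 }
```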
Moving on, we just need to look at what skip and limit do. The skip function skips the given number of documents from the beginning of the result set. The limit function then limits the result to the given number of documents.
Let's see what all this means with an example. By executing .skip(90).limit(10), we say that we want to skip the first 90 documents of the result set and start returning from the ninety-first document. The limit says that we will return only 10 documents, starting from the ninety-first document.
Now, there are some border conditions that we need to know about. What if skip is given a value greater than the total number of documents in the result set? In this case, no documents will be returned. Also, if the number given to the limit function is greater than the number of documents that remain in the result set, the number of documents returned will be the number of remaining documents. No exception is thrown in either case.
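These semantics map directly onto array slicing. The plain-JavaScript model below is a sketch, not the shell's cursor. It covers the normal case and both border conditions, plus a hypothetical `pageOf` helper that turns a page number into a skip value for pagination. One case is not modeled: a limit of 0, which MongoDB treats as "no limit".

```javascript
// Toy model of cursor.skip(n).limit(m) over an in-memory result set.
// (MongoDB treats limit(0) as "no limit"; that case is not modeled here.)
function skipLimit(results, skip, limit) {
  return results.slice(skip, skip + limit);
}

// Hypothetical pagination helper: page numbers start at 1.
function pageOf(results, pageNumber, pageSize) {
  return skipLimit(results, (pageNumber - 1) * pageSize, pageSize);
}

// 39,732 fake results numbered 1..39732, like the postalCodes count.
const results = Array.from({ length: 39732 }, (_, i) => i + 1);

const page = skipLimit(results, 90, 10);
console.log(page[0], page[page.length - 1], page.length); // 91 100 10
console.log(pageOf(results, 10, 10)[0]);                   // 91: page 10 is the same slice
console.log(skipLimit(results, 50000, 10).length);         // 0: skip past the end
console.log(skipLimit(results, 39730, 10).length);         // 2: limit larger than what remains
```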
Updating and deleting data from the shell
This, again, will be a simple recipe that looks at executing deletes and updates on a test collection. We won't use the test data we imported, as we don't want to update or delete any of it; instead, we will work on a test collection created only for this recipe.
Getting ready
For this recipe, we will create a collection called updAndDelTest. We will need the server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the UpdAndDelTest.js script loaded. This script will be available on the book's website for download. To learn how to start the shell with a preloaded script, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.
How to do it…
1. With the shell started and the script loaded, execute the following command in the shell:
> prepareTestData()
If all goes well, you should see 20 documents inserted into updAndDelTest and printed on the console.
2. To get a feel for the collection, let's query it as follows:
> db.updAndDelTest.find({}, {_id:0})
3. We will see that, for each of the x values 1 and 2, y increments from 1 to 10.
4. We will first update some documents and observe the results. Execute the following update command:
> db.updAndDelTest.update({x:1}, {$set:{y:0}})
5. Execute the following find command and observe the results; we will get 10 documents (for each of them, note the value of y):
> db.updAndDelTest.find({x:1}, {_id:0})
6. We will now execute the following update command:
> db.updAndDelTest.update({x:1}, {$set:{y:0}}, false, true)
7. Executing the query given in step 5 again to view the updated documents will show the same documents we saw earlier. Note the values of y again and compare them with the results we saw the last time we executed this query, before executing the update command given in step 6.
8. We will now see how the delete operation works. We will again choose the documents where x is 1 for the deletion test. Let's delete all the documents where x is 1 from the collection:
> db.updAndDelTest.remove({x:1})
9. Execute the following find command and observe the results. We will not get any results; the remove operation has removed all the documents with x as 1:
> db.updAndDelTest.find({x:1}, {_id:0})
How it works…
First, we set up the data that we will be using for updates and deletion. We have already seen the data and know what it is. An interesting thing to observe is that when we execute an update such as db.updAndDelTest.update({x:1}, {$set:{y:0}}), it only updates the first document that matches the query provided as the first parameter. We will observe this when we query the collection after this update. The update function has the db.<collectionname>.update(query, updateObject, isUpsert, isMulti) format.
We will see what an upsert is in the Atomic find and modify operations recipe in Chapter 5, Advanced Operations. The fourth parameter (isMulti) is false by default, which means that the update call will not update multiple documents. So, by default, only the first matching document will be updated. However, when we execute db.updAndDelTest.update({x:1}, {$set:{y:0}}, false, true) with the fourth parameter set to true, all the documents in the collection that match the given query get updated. We can verify this by querying the collection.
Removals, on the other hand, behave differently. By default, the remove operation deletes all the documents that match the provided query. If we wish to delete only one document, we explicitly pass the second parameter as true.
Note
The default behaviors of update and remove are different. An update call by default updates only the first matching document, whereas remove deletes all the documents that match the query.
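The contrast in defaults is easy to see in a toy in-memory model. This plain-JavaScript sketch mirrors the shell signatures described above, update(query, update, upsert, multi) and remove(query, justOne), but it is not MongoDB code. It supports only equality queries and $set, and it accepts upsert without modeling it. The sample data has the same shape as the prepareTestData() data described in the steps.

```javascript
// Equality-only query matching, enough for {x: 1}-style filters.
function matches(doc, query) {
  return Object.entries(query).every(([field, value]) => doc[field] === value);
}

// update(query, {$set}, upsert = false, multi = false); upsert is not modeled.
function update(coll, query, { $set }, _upsert = false, multi = false) {
  let matched = 0;
  for (const doc of coll) {
    if (!matches(doc, query)) continue;
    Object.assign(doc, $set);
    matched++;
    if (!multi) break; // default: only the first matching document is changed
  }
  return matched;
}

// remove(query, justOne = false): by default every matching document is deleted.
function remove(coll, query, justOne = false) {
  let removed = 0;
  let i = 0;
  while (i < coll.length) {
    if (matches(coll[i], query)) {
      coll.splice(i, 1);
      removed++;
      if (justOne) break;
    } else {
      i++;
    }
  }
  return removed;
}

// Same shape as the test data: x is 1 or 2, and y runs from 1 to 10 for each.
const coll = [];
for (const x of [1, 2]) for (let y = 1; y <= 10; y++) coll.push({ x, y });

console.log(update(coll, { x: 1 }, { $set: { y: 0 } }));              // 1
console.log(update(coll, { x: 1 }, { $set: { y: 0 } }, false, true)); // 10
console.log(remove(coll, { x: 1 }), coll.length);                     // 10 10
```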
Creating an index and viewing plans of queries
In this recipe, we will look at querying data, analyzing the query's performance by explaining its plan, and then optimizing it by creating indexes.
Getting ready
For the creation of indexes, we need to have a server up and running. A simple single node is all we need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will be operating needs to be imported into the database. The steps to import the data are given in the Creating test data recipe. Once we have this prerequisite, we are good to go.
How to do it…
We will write a query that finds all the zip codes in a given state. To do this, perform the following steps:
1. Execute the following query to view the plan of a query:
> db.postalCodes.find({state:'Maharashtra'}).explain()
Take note of the cursor, n, nscannedObjects, and millis fields in the result of the explain plan operation.
2. Let's execute the same query again; this time, however, we will limit the results to only 100:
> db.postalCodes.find({state:'Maharashtra'}).limit(100).explain()
Again, take note of the cursor, n, nscannedObjects, and millis fields in the result.
3. We will now create an index on the state and pincode fields as follows:
> db.postalCodes.ensureIndex({state:1, pincode:1})
4. Execute the following query:
> db.postalCodes.find({state:'Maharashtra'}).explain()
Again, take note of the cursor, n, nscannedObjects, millis, and indexOnly fields in the result.
5. Since we want only the pincodes, we will modify the query as follows and view its plan:
> db.postalCodes.find({state:'Maharashtra'}, {pincode:1, _id:0}).explain()
Take note of the cursor, n, nscannedObjects, nscanned, millis, and indexOnly fields in the result.
How it works…
There is a lot to explain here. We will first discuss what we just did and how to analyze the stats. Next, we will discuss some points to keep in mind for index creation, and some gotchas.
Analyzing the plan
Let's look at the first step and analyze the output of the query we executed:
> db.postalCodes.find({state:'Maharashtra'}).explain()
The output on my machine is as follows (I am skipping the nonrelevant fields for now):
{
  "cursor" : "BasicCursor",
  …
  "n" : 6446,
  "nscannedObjects" : 39732,
  "nscanned" : 39732,
  …
  "millis" : 55,
  …
}
The value of the cursor field in the result is BasicCursor, which means that a full collection scan (all the documents are scanned one after another) was performed to find the matching documents. The value of n is 6446, which is the number of results that matched the query. The nscanned and nscannedObjects fields have values of 39,732, which is the number of documents in the collection that were scanned to retrieve the results. This is also the total number of documents in the collection: all of them were scanned for the result. Finally, millis is the number of milliseconds taken to retrieve the result.
Improving the query execution time
So far, the query doesn't look too good in terms of performance, and there is great scope for improvement. To demonstrate how a limit applied to the query affects the query plan, we can view the query plan again, still without the index but with the limit clause:
> db.postalCodes.find({state:'Maharashtra'}).limit(100).explain()
{
  "cursor" : "BasicCursor",
  …
  "n" : 100,
  "nscannedObjects" : 19951,
  "nscanned" : 19951,
  …
  "millis" : 30,
  …
}
The query plan this time around is interesting. Though we still haven't created an index, we see an improvement both in the time the query took to execute and in the number of objects scanned to retrieve the results. This is because Mongo does not scan the remaining documents once the number of documents specified in the limit function has been reached. We can thus conclude that you should use the limit function to limit the number of results whenever the maximum number of documents needed is known up front. This might give better query performance. The word "might" is important because, in the absence of an index, the collection might still be scanned completely if the required number of matches is not found earlier.
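The early stop, and why it only "might" help, can be reproduced with a toy collection scan in plain JavaScript. This is a sketch, not MongoDB's query engine. The two 39,732-document layouts are synthetic: one spreads the matches evenly, and the other puts 6,446 matches at the end of the collection.

```javascript
// Toy collection scan: examines documents in order, counts them like
// nscannedObjects, and stops as soon as `limit` matches have been collected.
function collectionScan(docs, predicate, limit = Infinity) {
  const results = [];
  let scanned = 0;
  for (const doc of docs) {
    if (results.length >= limit) break;
    scanned++;
    if (predicate(doc)) results.push(doc);
  }
  return { n: results.length, nscannedObjects: scanned };
}

const N = 39732;
// Synthetic layouts: matches every sixth document, or all bunched at the end.
const spread = Array.from({ length: N }, (_, i) => ({ state: i % 6 === 0 ? "Maharashtra" : "Other" }));
const atEnd = Array.from({ length: N }, (_, i) => ({ state: i >= N - 6446 ? "Maharashtra" : "Other" }));
const isMH = (d) => d.state === "Maharashtra";

console.log(collectionScan(spread, isMH));      // { n: 6622, nscannedObjects: 39732 }
console.log(collectionScan(spread, isMH, 100)); // { n: 100, nscannedObjects: 595 }
console.log(collectionScan(atEnd, isMH, 100));  // { n: 100, nscannedObjects: 33386 }
```

Without a limit, every document is examined. With a limit, the saving depends entirely on where the matches happen to sit in the collection.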
Improvement using indexes
Moving on, we will create a compound index on state and pincode. The order of the index is ascending in this case (as the value is 1), and it is not significant unless we plan to execute a sort on multiple keys. In that case, it is a deciding factor in whether the result can be sorted using only the index or whether Mongo needs to sort it in memory before returning the results. As far as the plan of the query is concerned, we can see that there is a significant improvement:
{
  "cursor" : "BtreeCursor state_1_pincode_1",
  …
  "n" : 6446,
  "nscannedObjects" : 6446,
  "nscanned" : 6446,
  …
  "indexOnly" : false,
  …
  "millis" : 16,
  …
}
The cursor field now has the value BtreeCursor state_1_pincode_1, which shows that the index is indeed being used. As expected, the number of results stays the same at 6446. The number of entries scanned in the index (nscanned) and the number of documents scanned in the collection (nscannedObjects) have now dropped to 6,446, the same as the number of results. This is because the index gave us the starting entry from which to scan, so only the required entries were scanned. This is similar to using a book's index to find a word instead of scanning the entire book for it. The time, millis, has come down too, as expected.
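The "starting entry" idea can be sketched with a sorted array standing in for the B-tree. This plain-JavaScript toy is not MongoDB's index implementation. It keeps [state, pincode, position] entries sorted like {state: 1, pincode: 1}, binary-searches to the first entry for the requested state, and scans forward only while the state still matches. The documents are synthetic, with three states in equal numbers.

```javascript
// Build a toy compound index on {state: 1, pincode: 1}:
// sorted entries of [state, pincode, position-of-document].
function buildIndex(docs) {
  return docs
    .map((d, pos) => [d.state, d.pincode, pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));
}

// Binary search for the first entry whose state is >= the requested state.
function lowerBound(index, state) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid][0] < state) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Seek to the first match, then scan only matching entries; each match
// follows the stored position back to the full document.
function indexScan(docs, index, state) {
  const results = [];
  let nscanned = 0;
  for (let i = lowerBound(index, state); i < index.length && index[i][0] === state; i++) {
    nscanned++;
    results.push(docs[index[i][2]]);
  }
  return { n: results.length, nscanned, nscannedObjects: results.length };
}

const states = ["Gujarat", "Maharashtra", "Kerala"];
const docs = Array.from({ length: 39732 }, (_, i) => ({ state: states[i % 3], pincode: 100000 + i }));
const index = buildIndex(docs);
console.log(indexScan(docs, index, "Maharashtra")); // { n: 13244, nscanned: 13244, nscannedObjects: 13244 }
```

As in the real plan, the counts of scanned entries and fetched documents equal the number of results, instead of the whole collection size.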
Improvement using covered indexes
This leaves us with one field, indexOnly. To understand what this value means, we need to look briefly at how indexes operate.
Indexes store a subset of the fields of the original documents in the collection: the fields on which the index is created. These fields are kept sorted in the index, in the order specified when the index was created. Apart from the fields, an additional value is stored in the index, which acts as a pointer to the original document in the collection. Thus, whenever the user executes a query that contains fields on which an index is present, the index is consulted to get a set of matches. The pointers stored with the matching index entries are then used to make another I/O operation to fetch the complete documents from the collection, and these documents are returned to the user.
The value of indexOnly, which is false, indicates that the data requested by the user in the query is not entirely present in the index. An additional I/O operation, following the pointer from the index, is needed to retrieve the entire document from the collection. Had all the requested data been present in the index itself, this additional operation would not be necessary, and the data would be returned from the index. This is called a covered index, and in this case the value of indexOnly will be true.
In our case, we just need the pincodes, so why not use a projection in our query to retrieve just what we need? This will also make the index covered: the index entry holds just the state's name and the pincode, so the requested data can be served entirely without retrieving the original document from the collection. The plan of the query in this case is interesting too. Let's look at the plan of the following query:
> db.postalCodes.find({state:'Maharashtra'}, {pincode:1, _id:0}).explain()
{
  "cursor" : "BtreeCursor state_1_pincode_1",
  …
  "n" : 6446,
  "nscannedObjects" : 0,
  "nscanned" : 6446,
  …
  "indexOnly" : true,
  …
  "millis" : 15,
  …
}
The values of the nscannedObjects and indexOnly fields are worth observing. As expected, since the data requested by the projection in the find query is just the pincode, which can be served from the index alone, the value of indexOnly is true. In this case, we scanned 6,446 entries in the index; thus, the nscanned value is 6446. We did not, however, read any document of the collection from disk, as this query was covered by the index alone and no additional I/O was needed to retrieve the entire document. Hence, the value of nscannedObjects is 0.
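The covered-query rule (every requested field is in the index and _id is excluded) can be modeled with the same kind of toy index. This plain-JavaScript sketch is not MongoDB's planner. For brevity it filters the sorted entries linearly and counts only the matching ones, as a stand-in for a B-tree seek. The data is synthetic.

```javascript
const indexFields = ["state", "pincode"];

// Fields a projection asks for, ignoring _id.
const wantedFields = (projection) =>
  Object.keys(projection).filter((f) => f !== "_id" && projection[f]);

// Covered only if _id is excluded and every requested field lives in the index.
function isCovered(projection) {
  const idExcluded = projection._id === 0 || projection._id === false;
  const wanted = wantedFields(projection);
  return idExcluded && wanted.length > 0 && wanted.every((f) => indexFields.includes(f));
}

function findByState(docs, index, state, projection) {
  const covered = isCovered(projection);
  const wanted = wantedFields(projection);
  const out = [];
  let nscanned = 0;
  let nscannedObjects = 0;
  for (const [s, pincode, pos] of index) {
    if (s !== state) continue; // stand-in for seeking straight to the state's entries
    nscanned++;
    let source = { state: s, pincode }; // what the index entry itself holds
    if (!covered) {
      nscannedObjects++; // follow the pointer and fetch the full document
      source = docs[pos];
    }
    out.push(Object.fromEntries(wanted.map((f) => [f, source[f]])));
  }
  return { n: out.length, nscanned, nscannedObjects, indexOnly: covered };
}

// Synthetic data: three states, 100 documents each, sorted index entries.
const states = ["Gujarat", "Maharashtra", "Kerala"];
const docs = Array.from({ length: 300 }, (_, i) => ({ _id: i, state: states[i % 3], pincode: 400000 + i, city: `City ${i}` }));
const index = docs
  .map((d, pos) => [d.state, d.pincode, pos])
  .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));

const covered = findByState(docs, index, "Maharashtra", { pincode: 1, _id: 0 });
console.log(covered.n, covered.nscanned, covered.nscannedObjects, covered.indexOnly); // 100 100 0 true
console.log(findByState(docs, index, "Maharashtra", { pincode: 1, city: 1, _id: 0 }).nscannedObjects); // 100
```

Asking for city, which is not in the index, or leaving _id in the projection turns the same query back into one that fetches every matching document.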
As our collection is small, we do not see a significant difference in the execution time of the query. The difference will be more evident on larger collections. Making use of indexes is great and gives good performance. Making use of covered indexes gives even better performance.
Note
Another thing to remember is that, wherever possible, we should use a projection to retrieve only the fields we need. The _id field is retrieved every time by default; unless it is part of the index, set _id:0 to exclude it. Executing a covered query is the most efficient way to query a collection.
Some gotchas of index creation
We will now look at some pitfalls of index creation and some facts about using array fields in an index.
Some of the operators that do not use the index efficiently are $where, $nin, and $exists. Whenever these operators are used in a query, bear in mind that they may become a performance bottleneck as the data size increases. Similarly, where both can achieve the same result, the $in operator should be preferred over the $or operator. As an exercise, try to find the pincodes in the states of Maharashtra and Gujarat from the postalCodes collection. Write two queries: one using the $or operator and the other using the $in operator. Explain the plans for both queries.
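To set up the exercise, the two equivalent filters can be written down mechanically. The hypothetical `orVsInFilters` helper below builds both from a field name and a list of values. A tiny evaluator, which handles only equality, $or, and $in and is in no way MongoDB's matcher, checks that they select the same documents from a made-up sample.

```javascript
// Build the $or form and the $in form of "field equals any of values".
function orVsInFilters(field, values) {
  return {
    orFilter: { $or: values.map((v) => ({ [field]: v })) },
    inFilter: { [field]: { $in: values } },
  };
}

// Minimal evaluator for just these shapes: equality, $or, and $in.
function matchesFilter(doc, filter) {
  if (filter.$or) return filter.$or.some((clause) => matchesFilter(doc, clause));
  return Object.entries(filter).every(([field, cond]) =>
    cond !== null && typeof cond === "object" && Array.isArray(cond.$in)
      ? cond.$in.includes(doc[field])
      : doc[field] === cond
  );
}

const { orFilter, inFilter } = orVsInFilters("state", ["Maharashtra", "Gujarat"]);
console.log(JSON.stringify(orFilter)); // {"$or":[{"state":"Maharashtra"},{"state":"Gujarat"}]}
console.log(JSON.stringify(inFilter)); // {"state":{"$in":["Maharashtra","Gujarat"]}}

// Made-up sample documents.
const sample = ["Maharashtra", "Gujarat", "Kerala"].map((state, i) => ({ pincode: 400000 + i, state }));
const viaOr = sample.filter((d) => matchesFilter(d, orFilter)).length;
const viaIn = sample.filter((d) => matchesFilter(d, inFilter)).length;
console.log(viaOr, viaIn); // 2 2
```

Either printed filter can then be typed into the shell as the first argument of db.postalCodes.find(filter, {pincode:1, _id:0}).explain(), and the two plans compared.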
What happens when an array field is used in an index? Mongo creates an index entry for each element present in the array field of a document. So, if there are 10 elements in an array in a document, there will be 10 index entries, one for each element in the array. However, there is a constraint while creating indexes that contain array fields: when creating an index on multiple fields, no more than one field can be of the array type. This is done to prevent a possible explosion in the number of index entries on adding even a single element to an array used in the index. If we think about it carefully, an index entry is created for each element in the array. If multiple array fields were allowed to be part of an index, the number of entries in the index would be the product of the lengths of these array fields. For example, a document with two array fields, each of length 10, would add 100 entries to the index, had it been allowed to create one index on these two array fields.
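The arithmetic behind this restriction can be sketched in a few lines of Python. The function below is a hypothetical model, not MongoDB code: it counts the index keys a single document would produce if every combination of array elements got its own entry, which is what a compound index over several arrays would require.

```python
from math import prod


def index_entry_count(doc, fields):
    """Number of index keys one document would generate for a compound index.

    Each non-empty array field contributes one key per element; scalar,
    missing, or empty-array fields contribute one. The total is the
    product over all indexed fields.
    """
    counts = []
    for field in fields:
        value = doc.get(field)
        counts.append(len(value) if isinstance(value, list) and value else 1)
    return prod(counts)


doc = {"tags": list(range(10)), "sizes": list(range(10)), "name": "x"}
print(index_entry_count(doc, ["tags"]))           # 10
print(index_entry_count(doc, ["tags", "name"]))   # 10
print(index_entry_count(doc, ["tags", "sizes"]))  # 100 (not allowed in MongoDB)
```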
This should be good enough for now to scratch the surface of a plain vanilla index. We will see more options and types in some of the upcoming recipes.
Background and foreground index creation from the shell
In the previous recipe, we looked at how to analyze queries, how to decide what index needs to be created, and how to create indexes. This, by itself, is straightforward and looks reasonably simple. However, for large collections, things start getting worse, as index creation takes a long time. There are some caveats that we need to keep in mind. The objective of this recipe is to throw some light on these concepts and to avoid pitfalls while creating indexes, especially on large collections.
Getting ready
For the creation of indexes, we need to have a server up and running. A simple single node is what we will need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server.
Start two shells connected to the server by just typing mongo from the operating system shell. Both of them will, by default, connect to the test database.
Our test data for postal codes is too small to demonstrate the problems faced during index creation on large collections. We need more data; thus, we will start by creating some to simulate the problems during index creation. The data has no practical meaning but is good enough to test the concepts. Copy the following piece of code into one of the started shells and execute it (it is a pretty easy snippet to type out too):
for(i=0;i<5000000;i++){
doc={}
doc._id=i
doc.value='Some text with no meaning and number '+i+' in between'
db.indexTest.insert(doc)
}
A document in this collection will be as follows:
{_id:0, value:"Some text with no meaning and number 0 in between"}
Execution will take quite a lot of time, so we need to be patient. Once the execution is over, we are all set for the action.
Note
If you are keen to know the current number of documents loaded in the collection, evaluate the following command from the second shell periodically:
>db.indexTest.count()
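If you prefer to generate the same data from Python, the following is a minimal sketch that assumes PyMongo 3.x or later (where Collection.insert_many exists) and a server on localhost. It sends documents in batches rather than making one round trip per document. The function names and the batch size of 10,000 are arbitrary choices for illustration.

```python
def make_docs(start, stop):
    """Build documents shaped like those created by the shell loop above."""
    return [
        {"_id": i, "value": "Some text with no meaning and number %d in between" % i}
        for i in range(start, stop)
    ]


def load_index_test(collection, total=5000000, batch_size=10000):
    """Insert `total` documents into `collection` in batches.

    `collection` is expected to be a PyMongo Collection, for example
    MongoClient('localhost', 27017).test.indexTest.
    """
    for start in range(0, total, batch_size):
        collection.insert_many(make_docs(start, min(start + batch_size, total)))


# Usage (requires a running server and PyMongo):
# from pymongo import MongoClient
# load_index_test(MongoClient('localhost', 27017).test.indexTest)
```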
How to do it…
1. Create an index on the value field of the document:
>db.indexTest.ensureIndex({value:1})
2. While the index creation is in progress, which will take quite some time, switch over to the second console and execute the following command:
>db.indexTest.findOne()
Both the index creation shell and the one where we executed findOne will be blocked, and the prompt will not be shown on either of them until the index creation is complete.
3. Now, this was foreground index creation, which is the default. We want to see the behavior of background index creation. Drop the created index:
>db.indexTest.dropIndex({value:1})
4. Create the index again but, this time, in the background:
>db.indexTest.ensureIndex({value:1},{background:true})
5. In the second Mongo shell, execute the findOne query this time around:
>db.indexTest.findOne()
This should return one document this time around, unlike the first instance, where the operation was blocked until index creation completed in the foreground.
6. In the second shell, also repeatedly execute the following explain operation, with an interval of about 4 to 5 seconds between each invocation, until the index-creation process is complete:
>db.indexTest.find({value:"Some text with no meaning and number 0 in between"}).explain()
How it works…
Let's now analyze what we just did. We created about 5 million documents with no practical importance; we were just looking to get some data that takes a significant amount of time to index.
Indexes can be built in two ways: in the foreground and in the background. In either case, the shell that issued the ensureIndex operation doesn't show the prompt again until the index has been created. You might then be wondering what difference it makes to create an index in the background or in the foreground.
That is exactly where the second shell we started comes into the picture; this is where we demonstrated the difference between the background and foreground index-creation processes. We first created the index in the foreground, which is the default behavior. This index build didn't allow us to query the collection (from the second shell) until the index was constructed: the findOne operation was blocked until the entire index was built (from the first shell) before returning the result. On the other hand, the index that was built in the background didn't block the findOne operation. If you try inserting new documents into the collection while the index build is on, this too should work well. Feel free to drop the index and recreate it in the background while simultaneously inserting a document into the indexTest collection; you will notice that it works smoothly.
Well, what is the difference between the two approaches, and why not always build the index in the background? Apart from the extra parameter {background:true} (which can also be written as {background:1}) passed as the second parameter to the ensureIndex call, there are few differences. Index creation in the background is slightly slower than in the foreground. Furthermore, internally, though it is not relevant to the end user, an index created in the foreground will be more compact than one created in the background.
Other than that, there is no significant difference. In fact, if a system is running and an index needs to be created while it is serving end users (not recommended, but there can be a situation that demands index creation on a live system), then creating the index in the background is the only way we can do it. There are other strategies for performing such administrative activities that we will see in some recipes in Chapter 4, Administration.
To make things worse for foreground index creation, the lock acquired by Mongo during index creation is not at the collection level but at the database level. To explain what this means, we will drop the index on the indexTest collection and perform a small exercise as follows:
1. Start by creating the index in the foreground from the shell by executing the following command:
>db.indexTest.ensureIndex({value:1})
2. Now, insert a document in the person collection, which might or might not exist at this point in the test database:
>db.person.insert({name:'Amol'})
We will see that this insert operation on the person collection blocks while the index creation on the indexTest collection is in progress. If, however, this insert operation is done on a collection in a different database during the index build (you can try this out too), it will execute normally without blocking. This clearly shows that the lock is acquired at the database level and not at the collection level or the global level.
Note
Prior to version 2.2 of Mongo, locks were at the global level, that is, at the mongod process level, and not at the database level as we saw earlier. You need to remember this fact when dealing with a distribution of Mongo older than version 2.2.
Creating unique indexes on a collection and deleting the existing duplicate data automatically
In this recipe, we will look at creating unique indexes on a collection. Unique indexes, as the name itself tells us, require the value on which the index is created to be unique. What if the collection already has data and we want to create a unique index on a field whose value is not unique in the existing data?
Obviously, we cannot create the index; it will fail. There is, however, a way to drop the duplicates and create the index. Curious how this can be achieved? Keep reading this recipe.
Getting ready
For this recipe, we will create a collection called userDetails. We will need the server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the UniqueIndexData.js script loaded. This script will be available on the book's website for download. To find out how to start the shell with a script preloaded, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.
How to do it…
1. Load the required data in the collection using the loadUserDetailsData method.
2. Execute the following command on the Mongo shell:
>loadUserDetailsData()
3. See the count of the documents in the collection using the following query (it should be 100):
>db.userDetails.count()
4. Now, try to create a unique index on the login field of the userDetails collection:
>db.userDetails.ensureIndex({login:1},{unique:true})
5. This will not be successful, and an error like the following will be seen on the console:
{
"err":"E11000 duplicate key error index: test.userDetails.$login_1 dup key: { : \"bander\" }",
"code":11000,
"n":0,
"connectionId":6,
"ok":1
}
6. Next, we will try to create an index on this collection while eliminating the duplicates:
>db.userDetails.ensureIndex({login:1},{unique:true,dropDups:true})
7. This should not throw any errors. Find the count of documents in the collection again (take note of the count and compare it with the count seen earlier, prior to index creation):
>db.userDetails.count()
8. Check whether the index is being used by viewing the plan of the query:
>db.userDetails.find({login:'mtaylo'}).explain()
How it works…
We initially loaded our collection with 100 documents using the loadUserDetailsData function from the UniqueIndexData.js file. We looped 100 times, loading the same data over and over again; thus, we got duplicate documents.
We then tried to create a unique index on the login field of the userDetails collection as follows:
>db.userDetails.ensureIndex({login:1},{unique:true})
This creation fails and reports the first duplicate key it encountered during index creation, which is bander in this case. Can you guess why the error was first encountered for this user ID? This is not even the first ID we saw in the loaded data.
Tip
When specifying 1 in index creation, we mean to convey that the order of the values is ascending. Try creating a unique index using {login:-1} and see whether the user ID for which the error is encountered is different.
In such a scenario, we are left with two options:
Manually pick the data to be deleted or fixed and ensure that the field on which the index is to be created has unique data across the collection. This can be done either manually or programmatically, but it is outside the scope of Mongo and is done by the end user on a case-by-case basis.
Alternatively, if we don't care much about the data, as it is genuinely duplicated and we need to retain just one copy of it, Mongo provides a brilliant way to handle this. Apart from the regular {unique:true} option used to create a unique index, we provide an additional dropDups:true option (or dropDups:1 if you wish) that will blindly delete all the duplicate data it encounters during index creation. Note that there is no guarantee of which document will be retained and which will be deleted, but just one will be retained. In this case, there are 20 unique login IDs. During unique index creation, if the value of the login ID is not already present in the index, it is added. Subsequently, when a login ID that is already present in the index is encountered, the corresponding document is deleted from the collection; this explains why we were left with just 20 documents in the userDetails collection.
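Note that the dropDups option was removed in MongoDB 3.0, so on newer servers the duplicates have to be removed before creating the unique index. The following minimal Python sketch shows the same keep-one-per-key logic on made-up data (20 logins repeated to make 100 documents, mirroring this recipe); the function name is hypothetical. Unlike dropDups, it always keeps the first document it sees for each key, so which copy survives is deterministic here.

```python
def drop_duplicates(docs, key):
    """Return (kept, dropped) lists, keeping the first document per key value.

    This mirrors the effect of {unique:true, dropDups:true}: one document
    survives per distinct key value and the rest are discarded.
    """
    seen = set()
    kept, dropped = [], []
    for doc in docs:
        value = doc.get(key)
        if value in seen:
            dropped.append(doc)
        else:
            seen.add(value)
            kept.append(doc)
    return kept, dropped


# Hypothetical data: 20 distinct logins, loaded 5 times over (100 documents).
logins = ["user%02d" % n for n in range(20)]
docs = [{"_id": i, "login": logins[i % 20]} for i in range(100)]
kept, dropped = drop_duplicates(docs, "login")
print(len(kept), len(dropped))  # 20 80
```

Against a live server, the _id values in dropped could then be deleted before creating the unique index.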
Creating and understanding sparse indexes
Schema-free design is one of the fundamental features of Mongo. This allows documents in a collection to have disparate fields, with some fields present in some documents and absent in others. In other words, these fields might be sparse; this might have already given you a clue as to what sparse indexes are. In this recipe, we will create some random test data and see how sparse indexes behave compared with a normal index. We will see the advantage of using a sparse index and one major pitfall in its usage.
Getting ready
For this recipe, we need to create a collection called sparseTest. We will require a server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the SparseIndexData.js script loaded. This script will be available on the book's website for download. To know how to start the shell with a script preloaded, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.
How to do it…
1. Load the data in the collection by invoking the following command (this should import 100 documents into the sparseTest collection):
>createSparseIndexData()
2. Now, take a look at the data by executing the following query, taking note of the y field in the top few results:
>db.sparseTest.find({},{_id:0})
3. We can see that the y field is either absent or, if it is present, unique. Let's then execute the following query:
>db.sparseTest.find({y:{$ne:2}},{_id:0}).limit(15)
4. Take note of the result; it contains both the documents that match the condition and the documents that do not contain the y field.
5. Since the value of y seems unique, let's create a new unique index on the y field:
>db.sparseTest.ensureIndex({y:1},{unique:1})
6. This throws an error; it complains that the value is not unique and that the offending value is null.
7. We will fix this by making this index sparse as follows:
>db.sparseTest.ensureIndex({y:1},{unique:1,sparse:1})
8. This should fix our problem. To confirm that the index got created, execute the following command on the shell:
>db.sparseTest.getIndexes()
9. This should show two indexes: the default one on _id and the one we just created in the preceding step.
10. Now, execute the query we executed in step 3 again, and see the result.
11. Look at the result and compare it with what we saw before the index was created. Re-execute the query, but with the following hint, forcing a full table scan:
>db.sparseTest.find({y:{$ne:2}},{_id:0}).limit(15).hint({$natural:1})
12. Observe the result again.
How it works…
That was quite a lot of work. We will now dig deeper and explain the internals and the reasoning behind the weird behavior we saw while querying the collection that used a sparse index.
The test data that we created using the JavaScript method simply consists of documents with an x key whose value is a number starting from 1 and going all the way up to 100. The value of y is set only when x is a multiple of 3; its value too is a running number starting from 1 and goes up to a maximum of 33 when x is 99.
We then executed the following query and saw the following result, as expected:
>db.sparseTest.find({y:{$ne:2}},{_id:0}).limit(15)
{"x":1}
{"x":2}
{"x":3,"y":1}
{"x":4}
{"x":5}
{"x":7}
{"x":8}
{"x":9,"y":3}
{"x":10}
{"x":11}
{"x":12,"y":4}
{"x":13}
{"x":14}
{"x":15,"y":5}
{"x":16}
The document where y is 2 is missing from the result, and this is what we intended. Note that the documents where y isn't present are still seen in the result. We then planned to create an index on the field y. As the field is either not present or has a unique value, it seems natural that a unique index should work.
Internally, indexes, by default, add an entry to the index even if the field is absent in the original document in the collection. The value that goes into the index will, however, be null. This means that there will be the same number of entries in the index as the number of documents in the collection. For a unique index, the value (including null values) should be unique across the collection; this explains why we got an exception during index creation on a field that is sparse (not present in all documents).
A solution for this problem is to make the index sparse, and all we did was add sparse:1 to the options along with unique:1. This does not put an entry in the index if the field doesn't exist in the document. Thus, the index now contains fewer entries: only those where the field is present in the document. This not only makes the index smaller, making it easier to fit in memory, but also solves our problem of adding a unique constraint. The last thing we want is an index on a collection with millions of documents that has millions of entries when only a few hundred documents have the field defined.
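The difference between the two kinds of index can be modelled in a few lines of Python. This is a hypothetical, simplified model rather than MongoDB's implementation: a regular index stores None (standing in for null) for documents missing the field, while a sparse index skips them. The data mirrors the sparseTest collection described above.

```python
def build_index(docs, field, sparse=False, unique=False):
    """Return the sorted index keys for `field`, mimicking regular vs sparse indexes.

    Raises ValueError on a duplicate key when unique=True.
    """
    keys = []
    for doc in docs:
        if field not in doc:
            if sparse:
                continue          # sparse: no entry for documents without the field
            keys.append(None)     # regular index: missing field is indexed as null
        else:
            keys.append(doc[field])
    if unique and len(keys) != len(set(keys)):
        raise ValueError("duplicate key in unique index on %r" % field)
    # Null keys sort before numbers, as they do in MongoDB.
    return sorted(keys, key=lambda k: (k is not None, k if k is not None else 0))


# x runs from 1 to 100; y is present only when x is a multiple of 3.
docs = []
for x in range(1, 101):
    doc = {"x": x}
    if x % 3 == 0:
        doc["y"] = x // 3
    docs.append(doc)

try:
    build_index(docs, "y", unique=True)
except ValueError as err:
    print("regular unique index fails:", err)

sparse_keys = build_index(docs, "y", sparse=True, unique=True)
print(len(sparse_keys))  # 33
```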
Though we saw that creating a sparse index made the index efficient, it introduced a new problem: some query results are no longer consistent. The same query that we executed earlier now yields different results. Execute the following command:
>db.sparseTest.find({y:{$ne:2}},{_id:0}).limit(15)
{"x":3,"y":1}
{"x":9,"y":3}
{"x":12,"y":4}
{"x":15,"y":5}
{"x":18,"y":6}
{"x":21,"y":7}
{"x":24,"y":8}
{"x":27,"y":9}
{"x":30,"y":10}
{"x":33,"y":11}
{"x":36,"y":12}
{"x":39,"y":13}
{"x":42,"y":14}
{"x":45,"y":15}
{"x":48,"y":16}
Why did this happen? The answer lies in the query plan for this query. Execute the following command to view the plan of this query:
>db.sparseTest.find({y:{$ne:2}},{_id:0}).limit(15).explain()
The plan shows that the index was used to fetch the matching results. As this is a sparse index, none of the documents without the y field are present in it, so they didn't show up in the result even though they should have. This is the pitfall we need to be careful of when querying a collection with a sparse index: if the query happens to use the index, it will yield unexpected results. One solution is to force a full table scan by giving the query analyzer a hint using the hint function.
Hints are used to force the query analyzer to use a user-specified index. Though this is usually not recommended, as you really need to know what you are doing, this is one of the scenarios where it is really needed. So, how do we force a full table scan? All we need to do is provide {$natural:1} to the hint function. The natural ordering of a collection is the order in which it is stored on disk. This hint forces a full table scan, and we now get the results we got earlier. Query performance will, however, degrade for large collections, as the query now uses a full table scan.
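To see why the two plans disagree, here is a hypothetical, in-memory model of the two access paths for {y:{$ne:2}}: a scan over every document in natural order, and a scan over a sparse index that only knows about documents containing y. It is a simplification for illustration, not how the query engine is implemented. In PyMongo, the shell's hint({$natural:1}) should correspond to passing [('$natural', 1)] to Cursor.hint(); treat that as an assumption to verify against your driver version.

```python
def natural_scan(docs, excluded):
    """Full collection scan: every document is tested against {y: {$ne: excluded}}."""
    return [d for d in docs if d.get("y") != excluded]


def sparse_index_scan(docs, excluded):
    """Scan via a sparse index on y: only documents that have y are visited."""
    indexed = sorted((d for d in docs if "y" in d), key=lambda d: d["y"])
    return [d for d in indexed if d["y"] != excluded]


# Same shape as the sparseTest data: y exists only when x is a multiple of 3.
docs = []
for x in range(1, 101):
    doc = {"x": x}
    if x % 3 == 0:
        doc["y"] = x // 3
    docs.append(doc)

print([d["x"] for d in natural_scan(docs, 2)[:5]])       # [1, 2, 3, 4, 5]
print([d["x"] for d in sparse_index_scan(docs, 2)[:5]])  # [3, 9, 12, 15, 18]
print(len(natural_scan(docs, 2)), len(sparse_index_scan(docs, 2)))  # 99 32
```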
If the field is present in a lot of documents (there is no formal cutoff for what is a lot; it can be 50 percent for some or 75 percent for others) and is not really sparse, making the index sparse doesn't make much sense, apart from when we want to make it unique.
Note
Remember that a null value of a field and a field not present in the document are different. If two documents have a null value for the same field, unique index creation will fail, and creating it as a sparse index will not help either.
Expiring documents after a fixed interval using the TTL index
One of the nice and interesting features in Mongo is automatically expiring data in a collection after a predetermined amount of time. This is a very useful tool when we want to purge data older than a particular time frame. With a relational database, it is not uncommon for folks to set up a batch job that runs every night to perform this operation.
With the Time To Live (TTL) feature of Mongo, we need not worry about this, as the database takes care of it out of the box. Let's see how we can achieve this.
Getting ready
Let's create some data in Mongo that we want to play with using TTL indexes. We will create a collection called ttlTest for this purpose. We will require a server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the TTLData.js script loaded. This script will be available on the book's website for download. To know how to start the shell with a script preloaded, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.
How to do it…
1. Assuming that the server is started and the provided script is loaded in the shell, invoke the following method from the Mongo shell:
>addTTLTestData()
2. Create a TTL index on the createDate field:
>db.ttlTest.ensureIndex({createDate:1},{expireAfterSeconds:300})
3. Now, query the collection:
>db.ttlTest.find()
4. This should return three documents. Re-execute the find query every 30 to 40 seconds or so to see the three documents getting deleted, until the collection has no documents left in it.
How it works…
Let's start by opening the TTLData.js file and seeing what is going on in it. The code is pretty simple; it just gets the current date using new Date(). It then creates three documents with createDate values that are 4, 3, and 2 minutes behind the current time. So, on executing the addTTLTestData() method in this script, we will have three documents in the ttlTest collection, each with a difference of 1 minute in their creation time.
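Here is a minimal Python sketch of the arithmetic, under stated assumptions: a fixed, made-up "now" stands in for new Date(), and the documents, with hypothetical _id values 1 to 3, are modelled on the description above (created 4, 3, and 2 minutes ago). With expireAfterSeconds set to 300, each document becomes eligible for deletion 5 minutes after its createDate, so the three documents expire roughly 1, 2, and 3 minutes from now; the actual deletion can lag, since the background deletion cycle runs periodically.

```python
from datetime import datetime, timedelta

EXPIRE_AFTER_SECONDS = 300  # the value used in ensureIndex above

# A fixed "now" so the output is reproducible; the script uses new Date().
now = datetime(2015, 1, 1, 12, 0, 0)

# Documents modelled on addTTLTestData(): created 4, 3, and 2 minutes ago.
docs = [{"_id": n, "createDate": now - timedelta(minutes=m)}
        for n, m in enumerate([4, 3, 2], start=1)]


def expires_at(doc):
    """Time at which a document becomes eligible for TTL deletion."""
    return doc["createDate"] + timedelta(seconds=EXPIRE_AFTER_SECONDS)


for doc in docs:
    remaining = expires_at(doc) - now
    print(doc["_id"], int(remaining.total_seconds()))  # prints 1 60, then 2 120, then 3 180
```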
The next step is the core of the TTL feature: the creation of the TTL index. It is similar to the creation of any other index using the ensureIndex method, except that it also accepts a second parameter, a JSON object. Let's see what these two parameters are:
The first parameter is {createDate:1}; this tells Mongo to create an index on the createDate field, and the order of the index is ascending, as the value is 1 (-1 would have been descending).
The second parameter, {expireAfterSeconds:300}, is what makes this index a TTL index; it tells Mongo to automatically expire the documents after 300 seconds (5 minutes).
OK, but 5 minutes since when? Since the time they were inserted into the collection, or some other timestamp? In this case, Mongo considers the createDate field as the base, as this was the field on which we created the index.
This now raises a question: if a field is being used as the base for the computation of time, there has to be some restriction on its type. It just doesn't make sense to create a TTL index, as we did earlier, on a string field that holds, say, the name of a person.
As we guessed, the type of the field has to be a BSON date or an array of dates. What will happen in the case where an array has multiple dates? Which one will be considered?
It turns out that Mongo uses the minimum of the dates available in the array. Try out this scenario as an exercise.
Put two dates, about 5 minutes apart, in a document against the updateField field name, and then create a TTL index on this field, as you did earlier, to expire the document after 10 minutes (600 seconds). Query the collection and see when the document gets deleted from the collection. It should get deleted roughly 10 minutes after the minimum time value present in the updateField array.
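The rule for arrays, along with the rule (listed among the constraints that follow) that a missing field never expires, can be captured in a small Python function. This is a hypothetical model of the behavior described in this recipe, not server code; the updateField values are made up for the exercise.

```python
from datetime import datetime, timedelta


def ttl_expiry(value, expire_after_seconds):
    """Return when a document with this indexed value expires, or None if never.

    - a date expires expire_after_seconds after that date
    - an array of dates uses the earliest date in the array
    - anything else (missing field, string, number) never expires
    """
    if isinstance(value, datetime):
        base = value
    elif isinstance(value, list):
        dates = [v for v in value if isinstance(v, datetime)]
        if not dates:
            return None
        base = min(dates)
    else:
        return None
    return base + timedelta(seconds=expire_after_seconds)


first = datetime(2015, 1, 1, 12, 0)
update_field = [first + timedelta(minutes=5), first]  # two dates 5 minutes apart
print(ttl_expiry(update_field, 600))  # 2015-01-01 12:10:00
print(ttl_expiry("Amol", 600))        # None
```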
Apart from the constraint on the type of the field, there are a few more constraints:
If a field already has an index on it, you cannot create a TTL index on it. As the _id field of a collection already has an index by default, this effectively means that you cannot create a TTL index on the _id field.
A TTL index cannot be a compound index that involves multiple fields.
If a field doesn't exist, the document will never expire (this is pretty logical, I guess).
A TTL index cannot be created on capped collections. In case you are not aware of capped collections, they are special collections in Mongo with a size limit and a first in, first out (FIFO) insertion order; they delete old documents to make room for new documents, if needed.
Note
TTL indexes are supported only in Mongo version 2.2 and above. Also note that the document will not be deleted at exactly the time given in the field. The deletion cycle has a granularity of 1 minute; each run deletes all the documents that have become eligible for deletion since the last run.
There's more…
A use case might not demand the deletion of all the documents after a fixed interval has elapsed. What if we want to customize the point until which a document stays in the collection? This too can be achieved, as demonstrated in the next recipe.
Expiring documents at a given time using the TTL index
In the previous recipe, we saw how documents can be expired after a fixed time period. However, there can be cases where we want documents to expire at different times. This is not what we saw in the previous recipe. In this recipe, we will see how to specify the time at which a document expires (it can be different for different documents).
Getting ready
For this recipe, we will create a collection called ttlTest2. We will require a server to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. Also, start the shell with the TTLData.js script loaded. This script will be available on the book's website for download. To know how to start the shell with a script preloaded, refer to the Connecting to a single node from the Mongo shell with a preloaded JavaScript recipe in Chapter 1, Installing and Starting the MongoDB Server.
How to do it…
1. Load the required data into the collection using the addTTLTestData2 method. Execute the following command on the Mongo shell:
>addTTLTestData2()
2. Now, create the TTL index on the ttlTest2 collection:
>db.ttlTest2.ensureIndex({expiryDate:1},{expireAfterSeconds:0})
3. Execute the following find query to view the three documents in the collection:
>db.ttlTest2.find()
4. Now, after approximately 4, 5, and 7 minutes, you should see the documents with IDs 2, 1, and 3, respectively, getting deleted.
How it works…
Let's start by opening the TTLData.js file and seeing what is going on in it. Our method for this recipe is addTTLTestData2. This method simply creates three documents in the ttlTest2 collection with _id values of 1, 2, and 3, with their expiryDate field set to 5, 4, and 7 minutes, respectively, after the current time. Note that this field holds a future date, unlike the date in the previous recipe, which was a creation date.
Next, we create an index:
>db.ttlTest2.ensureIndex({expiryDate:1},{expireAfterSeconds:0})
This is different from the way we created the index in the previous recipe, where the expireAfterSeconds field of the object was set to a non-zero value. This is how the value of the expireAfterSeconds attribute is interpreted: if the value is non-zero, Mongo deletes the document from the collection once that many seconds have elapsed after a base time, which is the value held in the field on which the index is created (createDate, as in the previous recipe). If the value is 0, the date held in the field on which the index is created (expiryDate in this case) is itself the time when the document will expire.
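A short Python sketch makes the zero case concrete. It assumes a made-up fixed "now" and models addTTLTestData2() as described above (expiryDate 5, 4, and 7 minutes ahead for _id 1, 2, and 3). With expireAfterSeconds set to 0, the expiry time is simply expiryDate itself, so sorting by it gives the order in which the documents disappear.

```python
from datetime import datetime, timedelta

EXPIRE_AFTER_SECONDS = 0  # the expiry time is the indexed date itself

now = datetime(2015, 1, 1, 12, 0, 0)  # fixed stand-in for new Date()
docs = [
    {"_id": 1, "expiryDate": now + timedelta(minutes=5)},
    {"_id": 2, "expiryDate": now + timedelta(minutes=4)},
    {"_id": 3, "expiryDate": now + timedelta(minutes=7)},
]

# Order in which the documents become eligible for TTL deletion.
deletion_order = sorted(
    docs, key=lambda d: d["expiryDate"] + timedelta(seconds=EXPIRE_AFTER_SECONDS)
)
print([d["_id"] for d in deletion_order])  # [2, 1, 3]
```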
To conclude, TTL indexes work well if you want to delete documents upon expiry. There are quite a few cases where we might want to move the document to an archive collection instead, where the archive collection might be created based on, say, the year and month. In any such scenario, the TTL index is not helpful, and we might have to write an external job ourselves that does this work: it reads the collection for a range of documents, adds them to the target collection, and deletes them from the source collection. A JIRA ticket (https://jira.mongodb.org/browse/SERVER-6895) is already open to address this issue. You might want to keep an eye on it for further developments.
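The external job described here could look roughly like the following Python sketch. Everything in it is an assumption for illustration: the archive_YYYY_MM naming scheme, the function names, and the use of PyMongo 3.x methods (find, insert_many, and delete_many) on collections obtained from a Database object. The planning step is kept as a pure function so that it can be checked without a server.

```python
from datetime import datetime


def archive_name(date):
    """Hypothetical naming scheme: one archive collection per year and month."""
    return "archive_%04d_%02d" % (date.year, date.month)


def plan_archive(docs, field):
    """Group documents by the archive collection they should be moved to."""
    plan = {}
    for doc in docs:
        plan.setdefault(archive_name(doc[field]), []).append(doc)
    return plan


def archive_expired(db, source_name, field, cutoff=None):
    """Move documents whose `field` is older than `cutoff` into archive collections.

    `db` is expected to be a PyMongo Database; this requires a running server.
    """
    if cutoff is None:
        cutoff = datetime.utcnow()  # PyMongo stores naive datetimes as UTC by default
    source = db[source_name]
    expired = list(source.find({field: {"$lt": cutoff}}))
    for target, batch in plan_archive(expired, field).items():
        db[target].insert_many(batch)  # copy first...
        source.delete_many({"_id": {"$in": [d["_id"] for d in batch]}})  # ...then delete
```

Copying before deleting means that a failure between the two steps leaves a document in both collections rather than losing it.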
There's more…
In this and the previous recipe, we looked at what TTL indexes are and how to use them. However, what if, after creating a TTL index, we want to modify it to change the expireAfterSeconds value? This is possible using the collMod option. See more on this option in Chapter 4, Administration.
Chapter 3. Programming Language Drivers
In this chapter, we will cover the following recipes:
Installing PyMongo
Executing query and insert operations using PyMongo
Executing update and delete operations using PyMongo
Aggregation in Mongo using PyMongo
MapReduce in Mongo using PyMongo
Executing query and insert operations using a Java client
Executing update and delete operations using a Java client
Aggregation in Mongo using a Java client
MapReduce in Mongo using a Java client
Introduction
What we have seen so far using MongoDB is that we execute the majority of operations from the shell. The Mongo shell is a great tool for administrators to perform administrative tasks and for developers who would like to quickly test things by querying the data before coding the logic in the application. However, how do we write application code that will allow us to query, insert, update, and delete (among other things) the data in MongoDB? There has to be a library for the programming language in which we write our application.
We should be able to instantiate something or invoke methods from the program to perform operations on the remote Mongo process. How will this happen unless there is a bridge that understands the protocol of communication with the remote server, is able to transmit the operation we want executed on the MongoDB server process over the wire, and gets the result back to the client? This bridge, simply put, is called the driver, which is also referred to as a client library. Drivers form the backbone of Mongo's programming language interface. In the absence of drivers, it would have been the responsibility of the application to communicate with the MongoDB server using a protocol that the server understands. This would have been a lot of work, not only to develop but also to test and maintain. Though the communication protocol is standard, there cannot be one implementation that works for all languages. Each programming language needs its own implementation, and these implementations expose, more or less, the same sort of programming interface. The core concepts of the client APIs that we will see in this chapter hold good for all languages.
Note
Mongo has drivers for all the major programming languages, and these are supported by MongoDB, Inc. There is also a huge array of programming languages supported by the community. You might want to take a look at the various platforms supported by Mongo by visiting http://docs.mongodb.org/ecosystem/drivers/community-supported-drivers/.
To know more about the underlying protocol used by MongoDB or about communication between the client and the server, and to see what goes over the wire, refer to my blog at http://amolnayak.blogspot.in/2014/09/mongodb-wire-protocol-analysis_14.html.
Installing PyMongo
This recipe is about setting up PyMongo, which is the Python driver for MongoDB. In this recipe, we will demonstrate the installation of PyMongo on both the Windows and Linux platforms.
Getting ready
A simple single node is what we will need for sanity testing of the driver once the installation is complete. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. We will also require an Internet connection to download Python and PyMongo. Once these prerequisites are met, we are ready to begin.
The first step is to install Python on the computer if it is not already there. Visit http://www.python.org/getit/, download the latest version of Python for your platform, and install it. The steps for the installation of Python are not covered in this recipe. However, before you proceed to the next section, Python should be available on the host operating system.
How to do it…
1. We will first set up PyMongo on the Windows platform. Visit https://pypi.python.org/pypi/pymongo and download the MS Windows Installer that is appropriate for the version of Python installed. My Python version is 2.7, and hence, this was the version downloaded.
2. Double-click on the downloaded installer and click on Next, as shown in the following screenshot:
3. On clicking Next, if the right version of the Python installation is found, we can go ahead with the installation. With a couple more clicks on the Next button, we should have PyMongo installed.
Let's do a sanity test of the installation as follows:
1. From the command prompt, start the Python shell by typing in python as follows:
C:\Users\Amol>python
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
2. We will then import PyMongo; this should happen without any error. If we don't see any import error, our installation has gone through successfully:
>>> import pymongo
>>>
In this section, we will see how to set up PyMongo on a Linux system. We will install it on a Debian flavor, Ubuntu. We will not use Ubuntu's advanced packaging tool (apt) to install PyMongo for a couple of reasons:
Ubuntu's default repository might not have the latest release of the driver
The apt tool is specific to Debian Linux and its variants
Therefore, we will use pip, a tool to manage Python packages. This tool uses the Python Package Index (PyPI) to retrieve the dependencies; this is the official repository for third-party libraries in Python.
So, our installation is split into sections as follows:
Installing pip, if it is not already installed on Ubuntu, using apt (it will be done in a different way on non-Debian variants)
Using pip to install PyMongo; this step is the same, irrespective of the flavor of Linux you are using
Let's start by installing pip on Ubuntu as follows:
1. Execute the following command:
sudo apt-get install python-pip
2. Type in y to confirm the installation, and it will download the package.
3. Now, install the package by executing the following command:
amol@Amol-PC:~$ sudo apt-get install python-pip
4. Once the setup is complete, execute the following command from the shell to install PyMongo:
$ pip install pymongo
5. This will install PyMongo. My Python version is the 2.7.x release. PyMongo supports Python 3.x as well, so for a Python 3.x installation, install the same pymongo package using the Python 3 version of pip (commonly available as pip3):
$ pip3 install pymongo
6. Once the installation of PyMongo is complete, we will do a quick sanity test. From the command prompt of the operating system, start the Python shell by typing in python:
amol@Amol-PC:~$ python
Python 2.7.3 (default, Apr 10 2013, 05:46:21)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
7. We will then import PyMongo; this should happen without any error:
>>> import pymongo
>>>
8. We are now done with the installation of PyMongo.
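The same sanity check can be written as a small script, which is handy on machines where several Python installations coexist. This is a minimal sketch that assumes Python 3.4 or later and uses only the standard library; it reports whether pymongo can be imported by the interpreter running the script and, if so, its version string (read from pymongo's version attribute). The function name is hypothetical.

```python
import importlib.util
import sys


def pymongo_version():
    """Return the installed PyMongo version string, or None if it is missing."""
    if importlib.util.find_spec("pymongo") is None:
        return None
    pymongo = importlib.import_module("pymongo")
    return getattr(pymongo, "version", "unknown")


if __name__ == "__main__":
    version = pymongo_version()
    if version is None:
        print("pymongo is not installed for", sys.executable)
    else:
        print("pymongo", version, "is installed for", sys.executable)
```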
There's more…
Installation of PyMongo is just a prerequisite for running Python code that can connect to Mongo to perform operations. The next couple of recipes, Executing query and insert operations using PyMongo and Executing update and delete operations using PyMongo, are all about demonstrating these basic operations in Python, using PyMongo to connect to the database and execute them.
Executing query and insert operations using PyMongo
This recipe is all about executing basic query and insert operations using PyMongo. This is similar to what we did with the Mongo shell earlier in the book.
Getting ready
To execute simple queries, we need to have a server up and running. A simple single node is what we will need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Python is expected to be installed on the host operating system, and Mongo's client for Python, PyMongo, needs to be installed. Look at the previous recipe to learn how to install PyMongo for your host operating system. Also, in this recipe, we will execute insert operations and provide a write concern to use.
How to do it…
Let's start with some querying for Mongo from the Python shell. This will be identical to what we do from the Mongo shell, except that this is in the Python programming language, as opposed to the JavaScript we have in the Mongo shell. We can use the basics that we will see here to develop large-scale production systems that run on Python and use MongoDB as a data store.
Let's get started by first starting the Python shell from the operating system's command prompt. The following steps are independent of the host operating system:
1. Type in the following command in the shell, and the Python shell will start:
$ python
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
2. Then, import the pymongo package and create the client as follows:
>>> import pymongo
>>>client=pymongo.MongoClient('localhost',27017)
Analternativewaytoconnectisasfollows:
>>>client=pymongo.MongoClient('mongodb://localhost:27017')
3. Both forms work well and achieve the same result. Now that we have the client, our next step is to get the database on which we will perform the operations. Unlike some programming languages, where a getDatabase() method returns an instance of the database, here we simply get a reference to the database object on which we will perform the operations (test in this case), as follows:
>>> db = client.test
Another alternative is as follows:
>>> db = client['test']
4. We will query the postalCodes collection, limiting our results to 10 items, as follows:
>>> postCodes = db.postalCodes.find().limit(10)
5. Iterate over the results as follows. Watch out for the indentation of the print statement after the for statement. The following fragment should print the 10 documents that are returned:
>>> for postCode in postCodes:
...     print 'City:', postCode['city'], ', State:', postCode['state'], ', Pin Code:', postCode['pincode']
...
6. To find one document, execute the following command:
>>> postCode = db.postalCodes.find_one()
7. Print the city, state, and pin code of the returned result as follows:
>>> print 'City:', postCode['city'], ', State:', postCode['state'], ', Pin Code:', postCode['pincode']
8. Let’s query the top 10 cities in the state of Gujarat, sorted by city name, selecting just the city, state, and pincode fields. Execute the following query from the Python shell:
>>> cursor = db.postalCodes.find({'state': 'Gujarat'}, {'_id': 0,
...     'city': 1, 'state': 1, 'pincode': 1}).sort('city',
...     pymongo.ASCENDING).limit(10)
The preceding cursor’s results can be printed in the same way as we printed the results in step 5.
9. Let’s sort the data we query. We want to sort by state in descending order and then by city in ascending order. We will write the query as follows:
>>> city = db.postalCodes.find().sort([('state', pymongo.DESCENDING),
...     ('city', pymongo.ASCENDING)]).limit(5)
10. Iterate through this cursor; this should print out five results on the console. Refer to step 5 for how to iterate over the returned cursor to print the results.
11. So far, we have experimented with finding documents and covered the basic query operations from Python. Now, let’s look at the insert operation. We will use a test collection for these operations so that we do not disturb our postal codes test data. We will use the pymongoTest collection for this purpose and add documents to it in a loop, as follows:
>>> for i in range(1, 21): db.pymongoTest.insert({'i': i})
12. The insert operation can also take a list of dictionary objects and perform a bulk insert, so something like the following insert query is perfectly valid:
>>> db.pymongoTest.insert([{'name': 'John'}, {'name': 'Mark'}])
Any guesses on the return value? In the case of a single-document insert, the return value is the value of _id for the newly created document. In this case, it is a list of IDs.
13. Let’s execute an insert query again, this time with a write concern provided. Execute the following insert with the write concern w=1 and j=True:
>>> db.pymongoTest.insert({'name': 'Jones'}, w=1, j=True)
How it works…
In step 3, after instantiating the client, we got the reference to the object that will be used to access the database on which we wish to perform operations. There are a couple of ways to get this reference. The first option (db = client.test) is more convenient, unless your database name contains a special character, such as a hyphen (-). For example, if the name is db-test, we have no option other than using the [] operator to access the database. Using either of the alternatives, we now have an object for the test database in the db variable. After we got the client and the db instance in Python, we queried for the top 10 documents in natural order from the collection in step 4. The syntax is exactly identical to how this query would have been executed from the shell. Step 5 simply printed out the results, 10 of them in this case. Generally, if you need instant help on a particular class from the Python interpreter, using either the class or an instance of it, simply execute dir(<class>) or dir(<object of a class>), which gives a listing of the attributes and functions defined on the object passed. For example, dir(pymongo.MongoClient) or dir(client), where client is the variable that holds the reference to an instance of pymongo.MongoClient, lists all the supported attributes and functions. The help function is more informative and prints out the documentation, which is a great source of reference in case you need instant help. Try typing in help('pymongo.MongoClient') or help(client).
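A name like db-test needs the [] operator for a purely syntactic reason. The following sketch shows how Python reads each form. It uses only the standard ast module and never contacts a server; client appears only as a name inside the parsed strings.

```python
import ast


def parsed_kind(expr):
    """Return the AST node type name of a single Python expression."""
    return type(ast.parse(expr, mode="eval").body).__name__


# A plain identifier after the dot is an attribute access.
print(parsed_kind("client.test"))        # Attribute
# With a hyphen, Python reads a subtraction: (client.db) - test.
print(parsed_kind("client.db-test"))     # BinOp
# Item access keeps the whole database name as one string key.
print(parsed_kind("client['db-test']"))  # Subscript
```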
In steps 4 and 5, we queried the postalCodes collection, limited the result to the top 10 results, and printed them. The returned object is an instance of the pymongo.cursor.Cursor class. Step 6 got just one document from the collection using the find_one() function. This is equivalent to the findOne() method on the collection invoked from the shell. The value returned by this function is a built-in dict object.
In step 8, we executed another find to query the data. However, this time around, we passed two parameters to it. The first one was the query, which looks similar to what we execute from the Mongo shell; in Python, however, its type is dict. The second parameter was another dict object, used to specify the fields to be returned in the result. A value of 1 for a field indicates that the field is to be selected and returned in the result. This is analogous to listing specific columns in the SELECT clause of a relational database query. The _id field is selected by default, unless it is explicitly set to 0 in the projection dict. The projection provided here is {'_id': 0, 'city': 1, 'state': 1, 'pincode': 1}, which selects the city, state, and pincode and suppresses the _id field. We have the sort method too. This method has two formats: sort(sort_field, sort_direction) and sort([(sort_field, sort_direction)…(sort_field, sort_direction)]).
The first one is used when we want to sort by one field only. The second accepts a list of pairs of sort fields and sort directions and is used when we want to sort by multiple fields. We used the first format in the query in step 8 and the second format in the query in step 9, as we sorted first by state name and then by city.
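The following in-memory sketch in plain Python makes the projection and the two sort formats concrete. It does not use PyMongo. The documents are invented placeholders, and project and compound_sort are hypothetical helpers. Under simplified assumptions (inclusion-only projections, top-level fields), they mimic what the server does for the queries in steps 8 and 9.

```python
ASCENDING, DESCENDING = 1, -1   # same numeric values as pymongo.ASCENDING/DESCENDING

# Invented placeholder documents standing in for postalCodes.
docs = [
    {'_id': 1, 'city': 'Surat', 'state': 'Gujarat', 'pincode': 100001},
    {'_id': 2, 'city': 'Kochi', 'state': 'Kerala', 'pincode': 100002},
    {'_id': 3, 'city': 'Ahmedabad', 'state': 'Gujarat', 'pincode': 100003},
    {'_id': 4, 'city': 'Alappuzha', 'state': 'Kerala', 'pincode': 100004},
]


def project(doc, projection):
    """Inclusion projection: keep fields set to 1; keep _id unless it is set to 0."""
    out = {f: doc[f] for f, v in projection.items() if v and f != '_id' and f in doc}
    if projection.get('_id', 1) and '_id' in doc:
        out['_id'] = doc['_id']
    return out


def compound_sort(docs, keys):
    """keys is a list of (field, direction) pairs, as in sort([...])."""
    result = list(docs)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields the combined ordering.
    for field, direction in reversed(keys):
        result.sort(key=lambda d: d[field], reverse=(direction == DESCENDING))
    return result


# Step 8: Gujarat only, sorted by city, projected, limited to 10.
step8 = [project(d, {'_id': 0, 'city': 1, 'state': 1, 'pincode': 1})
         for d in compound_sort([d for d in docs if d['state'] == 'Gujarat'],
                                [('city', ASCENDING)])][:10]
# Step 9: state descending, then city ascending, limited to 5.
step9 = compound_sort(docs, [('state', DESCENDING), ('city', ASCENDING)])[:5]

print([d['city'] for d in step8])   # ['Ahmedabad', 'Surat']
print([d['city'] for d in step9])   # ['Alappuzha', 'Kochi', 'Ahmedabad', 'Surat']
```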
If we look at the way we invoked sort, it was invoked on the cursor instance. Similarly, the limit function was also invoked on the Cursor object. The evaluation is lazy and is deferred until the iteration is performed to retrieve the results from the cursor. Until that point, the query is not executed on the server.
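The lazy behavior can be pictured with a toy cursor class. LazyCursor below is hypothetical and runs against an in-memory list rather than a server. Like the real Cursor, its sort and limit methods only record options and return the cursor itself, so calls can be chained. Nothing is evaluated until iteration begins.

```python
class LazyCursor(object):
    """Hypothetical, simplified cursor: options are recorded, work happens on iteration."""

    def __init__(self, source):
        self._source = source
        self._sort = None
        self._limit = 0          # 0 means "no limit", as in MongoDB
        self.executed = False

    def sort(self, key, direction=1):
        self._sort = (key, direction)
        return self              # returning self allows find().sort().limit() chaining

    def limit(self, n):
        self._limit = n
        return self

    def __iter__(self):
        self.executed = True     # the point where a real driver would contact the server
        docs = list(self._source)
        if self._sort:
            field, direction = self._sort
            docs.sort(key=lambda d: d[field], reverse=(direction == -1))
        if self._limit:
            docs = docs[:self._limit]
        return iter(docs)


data = [{'i': i} for i in range(1, 21)]
cursor = LazyCursor(data).sort('i', -1).limit(3)
print(cursor.executed)             # False: nothing evaluated yet
print([d['i'] for d in cursor])    # [20, 19, 18]
print(cursor.executed)             # True
```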
In step 11, we inserted a document 20 times into a collection. Each insert, as we see in the Python shell, returns a generated _id value. In terms of syntax, insert is exactly identical to the operation we perform from the shell. The parameter passed to the insert operation is again an object of type dict.
In step 12, we passed a list of documents to insert into the collection. This inserts multiple documents in one call to the server; this is a bulk insert. The return value in this case is a list of IDs, one for each document inserted, in the same order as in the input list. However, as MongoDB (in the versions covered here) doesn’t support multi-document transactions, all the inserts are independent of each other, and the failure of one insert doesn’t automatically roll back the entire operation.
The ability to insert multiple documents calls for another parameter to control the behavior on failure. When one of the inserts in the given list fails, should the remaining inserts continue, or should the insertion stop as soon as the first error is encountered? The parameter that controls this behavior is continue_on_error, and its default value is False, that is, stop as soon as the first error is encountered. If this value is True and multiple errors occur during insertion, only the latest error will be available, which is why False is the sensible default. Let’s take a look at a couple of examples. In the Python shell, execute the following commands:
>>> db.contOnError.drop()
>>> db.contOnError.insert([{'_id': 1}, {'_id': 1}, {'_id': 2}, {'_id': 2}])
>>> db.contOnError.count()
The count we will get is 1, which is for the first document with the _id field as 1. The moment another document with the same value of the _id field (1 in this case) is found, an error is thrown and the bulk insert stops. Now, execute the following insert operation:
>>> db.contOnError.drop()
>>> db.contOnError.insert([{'_id': 1}, {'_id': 1}, {'_id': 2}, {'_id': 2}],
...     continue_on_error=True)
>>> db.contOnError.count()
Here, we passed an additional parameter, continue_on_error, whose value is True. As a result, the insert operation continues with the next document even if an intermediate insert fails. The second insert with _id: 1 fails; yet, the next insert goes through before another insert with _id: 2 fails (as a document with this _id is already present). The count is therefore 2. Also, the error reported is for the last failure, the one with _id: 2 in this case.
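The two behaviors can be modeled in memory. In the sketch below, bulk_insert and DuplicateKeyError are hypothetical local stand-ins, not PyMongo's classes, and the "collection" is just a dict keyed by _id. It reproduces the counts described above: 1 when stopping at the first error, and 2 with continue_on_error, where only the last error is reported.

```python
class DuplicateKeyError(Exception):
    """Local stand-in for the error the server reports on a duplicate _id."""


def bulk_insert(collection, docs, continue_on_error=False):
    """Insert docs into collection, a dict mapping _id to document."""
    last_error = None
    for doc in docs:
        key = doc['_id']
        if key in collection:
            last_error = DuplicateKeyError('duplicate _id: %r' % (key,))
            if not continue_on_error:
                raise last_error        # default: stop at the first failure
            continue                    # otherwise skip this document and go on
        collection[key] = doc
    if last_error is not None:
        raise last_error                # only the most recent failure is reported


docs = [{'_id': 1}, {'_id': 1}, {'_id': 2}, {'_id': 2}]
results = []
for flag in (False, True):
    coll = {}
    try:
        bulk_insert(coll, docs, continue_on_error=flag)
    except DuplicateKeyError as exc:
        results.append((flag, len(coll), str(exc)))

print(results)
# [(False, 1, 'duplicate _id: 1'), (True, 2, 'duplicate _id: 2')]
```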
Another parameter is check_keys, which checks for key names that start with $ or contain a dot (.). If such a key is found, it raises bson.errors.InvalidDocument. Thus, the following insert operation will fail:
>>> db.pymongoTest.insert({'a.b': 1})
By default, the check takes place, unless you explicitly disable it by setting the value of this parameter to False. Thus, the following query will pass and return the ObjectId of the inserted document:
>>> db.pymongoTest.insert({'a.b': 1}, check_keys=False)
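The rule itself is simple enough to sketch in a few lines. check_keys and InvalidDocument below are hypothetical local stand-ins, not the driver's code. As a design choice of this sketch, it also walks into embedded documents.

```python
class InvalidDocument(Exception):
    """Local stand-in for bson.errors.InvalidDocument."""


def check_keys(doc):
    """Reject keys that start with '$' or contain '.', including in sub-documents."""
    for key, value in doc.items():
        if key.startswith('$'):
            raise InvalidDocument("key %r must not start with '$'" % (key,))
        if '.' in key:
            raise InvalidDocument("key %r must not contain '.'" % (key,))
        if isinstance(value, dict):
            check_keys(value)


def is_valid(doc):
    try:
        check_keys(doc)
        return True
    except InvalidDocument:
        return False


print(is_valid({'a': 1}))               # True
print(is_valid({'a.b': 1}))             # False
print(is_valid({'$where': 'x'}))        # False
print(is_valid({'outer': {'a.b': 1}}))  # False (this sketch checks nested documents too)
```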
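Step 13 executed the insert operation with a write concern provided for it. With w=1, the insert is acknowledged by the server it was sent to (the primary, in a replica set), and j=True additionally makes that server wait until the write has been committed to its journal before acknowledging it.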
Executing update and delete operations using PyMongo
In the previous recipe, we saw how to execute find and insert operations in MongoDB using PyMongo. In this recipe, we will see how updates and deletions work from Python. We will also see what atomic find and update/delete operations are and how to execute them. We will then conclude by revisiting find operations and looking at some interesting functions of the cursor object.
Getting ready
If you have already seen and completed the previous recipe, you are all set to go. If not, it is recommended that you complete the previous recipe before going ahead with this one.
Before we get started, let’s define a small function that iterates through a cursor and shows its results on the console. We will use this function whenever we want to display the results of a query on the pymongoTest collection. The following is the function’s body:
>>> def showResults(cursor):
...     if cursor.count() != 0:
...         for e in cursor:
...             print e
...     else:
...         print 'No documents found'
...
Also, refer to steps 1 to 3 in the previous recipe to learn how to create a connection to the MongoDB server and create the db object used to perform CRUD operations on this database. Refer to step 11 in the previous recipe to learn how to insert the required test data into the pymongoTest collection. Once the data is present, you can confirm it by executing the following command from the Python shell:
>>> showResults(db.pymongoTest.find())
For a part of this recipe, you are also expected to know how to start a replica set instance. Refer to the Starting multiple instances as part of a replica set and Connecting to the replica set from the shell to query and insert data recipes in Chapter 1, Installing and Starting the MongoDB Server.
How to do it…
1. We will set a field named gtTen to the Boolean value True if the field i has a value greater than 10. Let’s execute the following update command:
>>> db.pymongoTest.update({'i': {'$gt': 10}}, {'$set': {'gtTen': True}})
{u'updatedExisting': True, u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}
2. Query the collection to view its data by executing the following command, and check the data that got updated:
>>> showResults(db.pymongoTest.find())
3. The results displayed confirm that only one document got updated. We will now execute the same update again but, this time around, we will update all the documents that match the provided query. Execute the following update operation from the Python shell. Note that this update is identical to the one we performed in step 1, except for the additional parameter called multi, whose value is given as True. Also, note the value of n in the response; it is 10 this time:
>>> db.pymongoTest.update({'i': {'$gt': 10}}, {'$set': {'gtTen': True}},
...     multi=True)
{u'updatedExisting': True, u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 10}
4. Execute the operation we performed in step 2 again to view the contents of the pymongoTest collection and verify the updated documents.
5. Let’s take a look at how upsert operations can be performed. Upserts are updates plus inserts: they update a document if one exists, just as an update does; otherwise, they insert a new document. Let’s take a look at an example. Consider the following command on a document that doesn’t exist in the collection:
>>> db.pymongoTest.update({'i': 21}, {'$set': {'gtTen': True}})
6. The update here will not update anything and will return the number of updated documents as 0. However, suppose we want to update the document if it exists or, if it does not, atomically insert a new document and apply the update to it. In this case, we perform an upsert operation as follows (note the response, which mentions upserted with the ObjectId of the newly inserted document, and the updatedExisting value, which is False):
>>> db.pymongoTest.update({'i': 21}, {'$set': {'gtTen': True}},
...     upsert=True)
{u'ok': 1.0, u'upserted': ObjectId('52a8b2f47a809beb067ecd8a'), u'err': None, u'connectionId': 8, u'n': 1, u'updatedExisting': False}
7. Let’s see how to delete documents from the collection using the remove method:
>>> db.pymongoTest.remove({'i': 21})
{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}
8. If we look at the value of n in the preceding response, we will see that it is 1. This means that one document got removed. There is another way to remove a document: by its _id. Let’s insert one document into the collection and later remove it. Insert the document as follows:
>>> db.pymongoTest.insert({'i': 23, '_id': 23})
9. Now, remove this document from the collection as follows:
>>> db.pymongoTest.remove(23)
{u'connectionId': 8, u'ok': 1.0, u'err': None, u'n': 1}
10. We will look at the find and modify operations now. We can look at this operation as a way to find a document and then update or remove it, with both steps performed atomically. Once the operation is performed, the document returned is either the one before or the one after the update (in the case of remove, there is no document after the operation). Without this operation, we cannot guarantee that finding a document, updating it, and returning the resulting document before or after the update happen atomically in scenarios where multiple client connections could be performing similar operations on the same document. The following is an example of how to perform these find and modify operations in Python:
>>> db.pymongoTest.find_and_modify({'i': 20}, {'$set':
...     {'inWords': 'Twenty'}})
{u'i': 20, u'gtTen': True, u'_id': ObjectId('52a8a1eb072f651578ed98b2')}
The preceding result shows that the document returned is the one before the update was applied.
11. Execute the following find operation to query and view the document that we updated in the previous step. The resulting document will contain the newly added inWords field:
>>> db.pymongoTest.find_one({'i': 20})
{u'i': 20, u'_id': ObjectId('52aa0cfe072f651578ed98b7'), u'inWords': u'Twenty'}
12. We will execute the find and modify operation again but, this time around, we will return the updated document rather than the document before the update, which is what we saw in step 10. Execute the following command from the Python shell:
>>> db.pymongoTest.find_and_modify({'i': 19}, {'$set':
...     {'inWords': 'Nineteen'}}, new=True)
{u'i': 19, u'gtTen': True, u'_id': ObjectId('52a8a1eb072f651578ed98b1'), u'inWords': u'Nineteen'}
13. We saw how to query using PyMongo in the previous recipe. Here, we will continue with the query operation. We saw how the sort and limit functions were chained to the find operation. The prototype of the call on the postalCodes collection is as follows:
db.postalCodes.find(..).limit(..).sort(..)
14. There is an alternative way to achieve the same result: pass limit and sort as parameters of find. Execute the following query in the Python shell:
>>> cursor = db.postalCodes.find({'state': 'Gujarat'}, {'_id': 0,
...     'city': 1, 'state': 1, 'pincode': 1}, limit=10, sort=[('city',
...     pymongo.ASCENDING)])
15. Print the preceding cursor using the showResults function that we already defined.
16. To prevent queries on fields without indexes from scanning the entire collection, there is a parameter called max_scan, which takes an integer value. The max_scan parameter ensures that a query doesn’t scan more documents than the value provided. For instance, the following query ensures that no more than 50 documents are scanned to get the results. Again, use the showResults function to display the results in the cursor:
>>> showResults(db.postalCodes.find({'state': 'Andhra Pradesh'},
...     max_scan=50))
How it works…
Let’s take a look at what we did in this recipe. We started by updating the documents in a collection in step 1. The update, however, updated only the first matching document by default, and the rest of the matching documents were not updated. In step 3, we added a parameter called multi with the value True to update multiple documents as part of the same update operation. Note that these documents are not updated atomically as part of one transaction; only each individual document update is atomic. If we look at the update call from the Python shell, we will see a striking resemblance to what we would have done from the Mongo shell. If we want to name the arguments of the update operation, the parameters are called spec and document: the query used to select the documents and the document describing the update, respectively. For instance, the following update operation is valid:
>>> db.pymongoTest.update(spec={'i': {'$gt': 10}}, document={'$set':
...     {'gtTen': True}})
There are some more arguments that the update function takes, most of them carrying the same meaning as for the insert function we saw in the previous recipe. These parameters are w, wtimeout, j, fsync, and check_keys. Refer to the previous recipe for the explanation of these parameters as used with the insert function.
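The effect of multi on the value of n can be reproduced without a server. The following sketch works on a list of dicts. matches and update are hypothetical helpers that support only equality and $gt conditions and a $set update, and the sketch does not model atomicity. With the same 20 test documents as in step 11, it gives n of 1 without multi and 10 with it, matching the responses in steps 1 and 3.

```python
def matches(doc, spec):
    """Very small matcher: equality and {'$gt': value} conditions only."""
    for field, cond in spec.items():
        value = doc.get(field)
        if isinstance(cond, dict) and '$gt' in cond:
            if value is None or not value > cond['$gt']:
                return False
        elif value != cond:
            return False
    return True


def update(collection, spec, document, multi=False):
    """Apply document['$set'] to the first match, or to every match if multi is True."""
    n = 0
    for doc in collection:
        if matches(doc, spec):
            doc.update(document['$set'])
            n += 1
            if not multi:
                break
    return {'n': n, 'updatedExisting': n > 0}


coll = [{'i': i} for i in range(1, 21)]
first = update(coll, {'i': {'$gt': 10}}, {'$set': {'gtTen': True}})
second = update(coll, {'i': {'$gt': 10}}, {'$set': {'gtTen': True}}, multi=True)
print(first['n'], second['n'])   # 1 10
```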
In step 6, we did an upsert (update plus insert). All we added was the upsert parameter with the value True. However, what exactly happens in the case of an upsert? Mongo tries to update the document that matches the provided condition; if it finds one, this is a regular update. However, in this case (the upsert in step 6), the document was not found. The server inserted the document given as spec (the first parameter) into the collection and then applied the update operation to it, with both these operations taking place atomically.
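The upsert logic can be sketched in memory. The upsert function below is hypothetical and handles only equality conditions and the fields of a $set. Two simplifications are worth noting. The server would also generate an ObjectId for _id, which the sketch omits. The sketch also makes no attempt to model atomicity.

```python
def upsert(collection, spec, set_fields):
    """Update the first document equal to spec, or insert spec with set_fields applied."""
    for doc in collection:
        if all(doc.get(k) == v for k, v in spec.items()):
            doc.update(set_fields)
            return {'n': 1, 'updatedExisting': True}
    new_doc = dict(spec)          # start from the query's equality fields...
    new_doc.update(set_fields)    # ...then apply the $set fields to the new document
    collection.append(new_doc)
    return {'n': 1, 'updatedExisting': False, 'upserted': new_doc}


coll = [{'i': i} for i in range(1, 21)]
print(upsert(coll, {'i': 21}, {'gtTen': True})['updatedExisting'])   # False (inserted)
print(coll[-1])                                                       # {'i': 21, 'gtTen': True}
print(upsert(coll, {'i': 21}, {'gtTen': False})['updatedExisting'])  # True (updated)
```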
In steps 7 to 9, we saw the remove operation. The first variant, in step 7, accepted a query, and all the matching documents were removed. The second variant, in step 9, accepted a single value, which is the value of the _id field of the document to be deleted. This variant is useful whenever we plan to delete by the _id field’s value. Similar to update, the remove function also accepts other parameters for the write concern. The w, wtimeout, j, and fsync parameters have meanings similar to what we discussed in the previous recipe when we inserted the documents. Refer to the previous recipe for a detailed description of these parameters. Calling the remove method on a collection without any parameter will remove all the documents in the collection.
In steps 10 to 12, we executed the find and modify operations, which were explained in the How to do it… section. What we didn’t see there is that this operation can also be used to find and remove documents from the collection. An additional parameter called remove needs to be added with the value True. In the following operation, we will remove the document with _id equal to 31 and return the document as it was before deleting it:
>>> db.pymongoTest.find_and_modify(query={'_id': 31}, remove=True)
Note that, with the remove option provided, the parameter named new is not supported, as there is nothing to return after the document is deleted.
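The following sketch models which version of a document each find and modify variant returns. find_and_modify here is a hypothetical helper working on a list of dicts with equality-only queries. It runs in a single thread, so it cannot show the atomicity that is the real point of the server-side operation.

```python
def find_and_modify(collection, query, update=None, new=False, remove=False):
    """Find the first document equal to query, then update or remove it."""
    if remove and new:
        raise ValueError("'new' cannot be combined with 'remove'")
    for index, doc in enumerate(collection):
        if all(doc.get(k) == v for k, v in query.items()):
            if remove:
                return collection.pop(index)     # the document as it was before deletion
            before = dict(doc)                   # snapshot taken before the update
            doc.update(update['$set'])
            return dict(doc) if new else before
    return None


coll = [{'_id': i, 'i': i} for i in range(1, 21)]
old = find_and_modify(coll, {'i': 20}, {'$set': {'inWords': 'Twenty'}})
fresh = find_and_modify(coll, {'i': 19}, {'$set': {'inWords': 'Nineteen'}}, new=True)
gone = find_and_modify(coll, {'_id': 5}, remove=True)
print(old)               # pre-update version, without inWords
print(fresh)             # post-update version, with inWords
print(gone, len(coll))   # the removed document, and 19 documents left
```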
All the operations we saw in this recipe were for clients connected to a standalone instance. If, however, you are connected to a replica set, the client is instantiated in a different way. Also, we know that, by default, we are not allowed to query the secondary nodes for data. We need to explicitly execute rs.slaveOk() from the Mongo shell connected to a secondary node to query it. The same restriction applies to a Python client: if we are connected to a secondary node, we cannot query it by default, but the way in which we specify that we are OK with querying a secondary node is slightly different. There is a parameter called slave_okay that lets us query a secondary node, and its value is False by default. If the value is True, the query will go through successfully and return results from the secondary node. If the parameter is not set to True, querying the secondary node will throw an exception stating that the node queried is not a master. For instance, if our client is connected to a secondary instance and we want to query it based on the name of the state, we will execute the following query:
>>> cursor = db.postalCodes.find({'state': 'Maharashtra'}, slave_okay=True)
We will get a cursor for the results successfully if the collection does indeed have documents with the state name Maharashtra.
Another parameter that is better left untouched, and that has a sensible default, is called timeout; its value is True by default. Note that this value is not a number for some sort of timeout, but a Boolean value. If the value is True, the cursor opened by a query on the server will be closed automatically after 10 minutes of inactivity. You can think of this as a sort of garbage collection of server-side resources. However, if it is set to False, cleaning up the cursor is no longer the responsibility of the server; it becomes the responsibility of the client to close it.
Another parameter, called tailable, is used to denote that the cursor returned by find is a tailable cursor. Explaining what tailable cursors are is not in the scope of this recipe; they are explained in the Creating and tailing capped collection cursors in MongoDB recipe in Chapter 5, Advanced Operations.
So far in this recipe, we have connected to a single node using pymongo.MongoClient. However, we cannot use the same class to connect to a replica set, for the following reasons:
- We would be connected to just one instance
- To perform write operations, we have to connect to the primary instance
- If the primary instance goes down, there has to be an automatic failover to the new primary instance
Therefore, to connect to a replica set and address the preceding three points, we will use pymongo.MongoReplicaSetClient. The following is the way in which we can instantiate the client:
>>> client = pymongo.MongoReplicaSetClient('mongodb://localhost:27000',
...     replicaSet='replSetTest')
>>>
As we can see, we just provided one host from the replica set and the name of the replica set we used when starting it. The client will automatically discover the remaining hosts from the replica set configuration. The host name(s) that we provide are known as the seed list, and we can provide multiple instances of the replica set in it. The name of the parameter that accepts the host names is hosts_or_uri.
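Several seed hosts can also be listed in one connection string, separated by commas. The helper below is a hypothetical convenience function that only builds the string. The ports 27001 and 27002 are assumed values for the other members, not ones taken from this recipe. The result follows the standard MongoDB connection string format, so a string like this should be usable in place of the single-host URI above.

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2.7


def replica_set_uri(hosts, replica_set):
    """Build a connection string with several seed hosts (hypothetical helper)."""
    seeds = ','.join('%s:%d' % (host, port) for host, port in hosts)
    return 'mongodb://%s/?%s' % (seeds, urlencode({'replicaSet': replica_set}))


uri = replica_set_uri([('localhost', 27000), ('localhost', 27001), ('localhost', 27002)],
                      'replSetTest')
print(uri)
# mongodb://localhost:27000,localhost:27001,localhost:27002/?replicaSet=replSetTest
```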
However, what about read preferences, and how do we specify them? There are some more parameters that we need to look at while instantiating the client:
>>> from pymongo.read_preferences import ReadPreference
>>> from pymongo import MongoReplicaSetClient
>>> client = MongoReplicaSetClient('mongodb://localhost:27000',
...     replicaSet='replSetTest', read_preference=ReadPreference.NEAREST)
>>> client.read_preference
4
The preceding steps initialized a replica set client with the read preference NEAREST. There is an additional parameter, secondary_acceptable_latency_ms, which gives a time in milliseconds. The client uses this time to decide whether a member of the replica set is a contender for selection when the read preference NEAREST is specified. The driver first computes the minimum latency across all the replica set instances, and every instance whose latency is within the provided value of that minimum is added to the list of contenders for selection as the nearest instance to the driver. There was a fairly long discussion of this behavior in the read preference recipe, where code snippets from a Java client were used to explain the internals. The default value for this parameter is 15 milliseconds.
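The selection window can be sketched as a small function. nearest_candidates is hypothetical and the ping times are invented. It keeps every member whose ping is within acceptable_latency_ms of the fastest member, using the 15 ms default described above. The driver would then read from one of these candidates.

```python
def nearest_candidates(ping_ms, acceptable_latency_ms=15):
    """ping_ms maps member name -> measured round-trip time in milliseconds."""
    fastest = min(ping_ms.values())
    return sorted(name for name, ping in ping_ms.items()
                  if ping <= fastest + acceptable_latency_ms)


# Invented ping times for a three-member replica set.
pings = {'localhost:27000': 5, 'localhost:27001': 12, 'localhost:27002': 30}
print(nearest_candidates(pings))                            # members within 5 + 15 = 20 ms
print(nearest_candidates(pings, acceptable_latency_ms=30))  # all three members
```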
As we know, read preference can be provided at the client level, at the database level (where it is inherited from the client), and also at the cursor level. By default, read_preference for a client initialized without an explicit read preference is PRIMARY (with the value 0). However, if we now get the database object from the client initialized earlier, its read preference will be NEAREST (with the value 4):
>>> db = client.test
>>> db.read_preference
4
>>>
Setting the read preference is as simple as executing the following command:
>>> db.read_preference = ReadPreference.PRIMARY_PREFERRED
Again, just as the database object inherits the read preference from the client, the collection object inherits it from the database object, and it is used as the default value for all the queries executed against that collection, unless a read preference is specified explicitly in the find operation.
Thus, db.pymongoTest.find() will return a cursor that uses the read preference PRIMARY_PREFERRED (we just set it to PRIMARY_PREFERRED at the database-object level), whereas db.pymongoTest.find(read_preference=ReadPreference.NEAREST) will use the read preference NEAREST.
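The precedence rules can be modeled with a few plain objects. Node and query_read_preference below are hypothetical, and the preferences are plain strings rather than ReadPreference values. This sketch looks the value up through its parents at read time, whereas the driver may instead copy it when a child object is created. For the sequence of calls above, the outcome is the same.

```python
class Node(object):
    """Hypothetical stand-in for a client, database, or collection object."""

    def __init__(self, parent=None, read_preference=None):
        self.parent = parent
        self._read_preference = read_preference

    @property
    def read_preference(self):
        # Use our own setting if there is one; otherwise ask the parent.
        if self._read_preference is not None:
            return self._read_preference
        if self.parent is not None:
            return self.parent.read_preference
        return 'PRIMARY'          # the default when nothing was set anywhere

    @read_preference.setter
    def read_preference(self, value):
        self._read_preference = value


def query_read_preference(collection, read_preference=None):
    """An explicit per-query value wins over the inherited one."""
    if read_preference is not None:
        return read_preference
    return collection.read_preference


client = Node(read_preference='NEAREST')
db = Node(parent=client)
coll = Node(parent=db)
print(coll.read_preference)                      # NEAREST (from the client)
db.read_preference = 'PRIMARY_PREFERRED'
print(query_read_preference(coll))               # PRIMARY_PREFERRED (from db)
print(query_read_preference(coll, 'NEAREST'))    # NEAREST (set on the query)
```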
We will now wrap up the basic operations from the Python driver by performing some common operations that we do from the Mongo shell, such as getting all the database names, getting the list of collections in a database, and creating an index on a collection.
From the shell, we execute show dbs to show all the database names in the connected Mongo instance. From the Python client, we will execute the following command on the client instance:
>>> client.database_names()
[u'local', u'test']
Similarly, to see the list of collections, we type show collections in the Mongo shell. In Python, all that we do on the database object is as follows:
>>> db.collection_names()
[u'system.indexes', u'writeConcernTest', u'pymongoTest']
Now, for index operations, we will first see what indexes are present in the pymongoTest collection. Execute the following command from the Python shell to view the indexes on a collection:
>>> db.pymongoTest.index_information()
{u'_id_': {u'key': [(u'_id', 1)], u'v': 1}}
We will now create an index on the key x, sorted in ascending order, on the pymongoTest collection as follows:
>>> db.pymongoTest.ensure_index([('x', pymongo.ASCENDING)])
u'x_1'
We can again list the indexes as follows to confirm the creation of the index:
>>> db.pymongoTest.index_information()
{u'_id_': {u'key': [(u'_id', 1)], u'v': 1}, u'x_1': {u'key': [(u'x', 1)], u'v': 1}}
We can see that the index got created. Generally speaking, the format of the ensure_index method is as follows:
>>> db.<collection name>.ensure_index([(<field name 1>, <order of field 1>), …, (<field name n>, <order of field n>)])
Aggregation in Mongo using PyMongo
We already saw PyMongo, Python’s client interface for MongoDB, in the Executing query and insert operations using PyMongo and Executing update and delete operations using PyMongo recipes. In this recipe, we will use the postal codes collection and run an aggregation example using PyMongo. The intention of this recipe is not to explain aggregation but to show how aggregation can be implemented using PyMongo. In this recipe, we will aggregate the data based on the state names and get the top five state names by the number of documents they appear in. We will make use of the $project, $group, $sort, and $limit operators for the process.
Getting ready
To execute the aggregation operations, we need to have a server up and running. A simple single node is all we need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Python and PyMongo are expected to be installed. Look at the Installing PyMongo recipe to learn how to install PyMongo for your host operating system. Since this recipe shows a way to implement aggregation in Python, the reader is expected to be aware of the aggregation framework in MongoDB.
How to do it…
Let’s take a look at the steps in detail:
1. Open the Python terminal by typing the following command:
$ python
2. Once the Python shell opens, import PyMongo as follows:
>>> import pymongo
3. Create an instance of MongoClient as follows:
>>> client = pymongo.MongoClient('mongodb://localhost:27017')
4. Get the test database’s object as follows:
>>> db = client.test
5. Now, we will execute the aggregation operation on the postalCodes collection as follows:
>>> result = db.postalCodes.aggregate(
...     [
...         {'$project': {'state': 1, '_id': 0}},
...         {'$group': {'_id': '$state', 'count': {'$sum': 1}}},
...         {'$sort': {'count': -1}},
...         {'$limit': 5}
...     ]
... )
6. Type the following command to view the results:
>>> result['result']
How it works…
The steps are pretty straightforward. We connected to the database that runs on localhost and created a database object. The aggregation operation we invoked on the collection using the aggregate function is very similar to how we would invoke aggregation from the shell. The return value, result, is an object of type dict with two keys of interest (this is the behavior of the PyMongo 2.x releases used here; PyMongo 3.0 and later return a cursor from aggregate instead). One of the keys is called ok, whose value will be 1 if the aggregation operation executed successfully. The other key is called result, and its type is list. In our case, it will contain five documents, each containing the name of a state and the number of documents in which that state occurs.
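Each pipeline stage has a direct plain-Python counterpart, which can help when reading the pipeline. The sketch below is an in-memory analogy, not PyMongo. The sample documents are invented, and top_states is a hypothetical helper. Ties between states with equal counts come out in no guaranteed order, just as they can from the server.

```python
from collections import Counter

# Invented sample documents; the real postalCodes data is much larger.
postal_codes = [{'state': s, 'city': 'placeholder'} for s in
                ['Kerala', 'Goa', 'Kerala', 'Assam', 'Goa', 'Kerala',
                 'Bihar', 'Assam', 'Punjab', 'Sikkim', 'Goa', 'Kerala']]


def top_states(docs, n=5):
    projected = ({'state': d['state']} for d in docs)                      # $project
    counts = Counter(d['state'] for d in projected)                        # $group, $sum: 1
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)   # $sort count: -1
    return [{'_id': state, 'count': c} for state, c in ordered[:n]]        # $limit


for row in top_states(postal_codes):
    print(row)
```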
MapReduce in Mongo using PyMongo
In the previous recipe, we saw how to execute aggregation operations in Mongo using PyMongo. In this recipe, we will work on the same use case as we did for the aggregation operation, but using MapReduce. The intent is to aggregate the data based on state names and get the top five state names by the number of documents they appear in.
Programming language drivers provide us with an interface that lets us invoke MapReduce jobs, written in JavaScript, on the server.
Getting ready
To execute the MapReduce operations, we need to have a server up and running. A simple single node is all we need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will be operating needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Python is expected to be installed on the host operating system, and PyMongo also needs to be installed. Take a look at the Installing PyMongo recipe to learn how to install PyMongo for your host operating system.
How to do it…
Let’s take a look at the steps in detail:
1. Open the Python terminal by typing in the following command:
$ python
2. Once the Python shell opens, import the bson package as follows:
>>> import bson
3. Now, import the pymongo package as follows:
>>> import pymongo
4. Create an instance of MongoClient as follows:
>>> client = pymongo.MongoClient('mongodb://localhost:27017')
5. Get the test database’s object as follows:
>>> db = client.test
6. Write the mapper function as follows:
>>> map = bson.Code('''function() {emit(this.state, 1)}''')
7. Write the reduce function as follows:
>>> reduce = bson.Code('''function(key, values) {return
... Array.sum(values)}''')
8. Invoke MapReduce as follows (note that the result will be sent to the pymr_out collection):
>>> db.postalCodes.map_reduce(map=map, reduce=reduce, out='pymr_out')
9. Verify the result as follows:
>>> c = db.pymr_out.find(sort=[('value', pymongo.DESCENDING)], limit=5)
>>> for elem in c:
...     print elem
...
{u'_id': u'Maharashtra', u'value': 6446.0}
{u'_id': u'Kerala', u'value': 4684.0}
{u'_id': u'Tamil Nadu', u'value': 3784.0}
{u'_id': u'Andhra Pradesh', u'value': 3550.0}
{u'_id': u'Karnataka', u'value': 3204.0}
>>>
How it works…
Apart from the regular import of PyMongo, here we imported the bson package too. This package provides the Code class, which we use for writing the JavaScript map and reduce functions. It is instantiated by passing the JavaScript function body as a constructor argument.
Once the two instances of the Code class are created, one for map and one for reduce, all we need to do is invoke the map_reduce function on the collection. In this case, we passed three parameters: the two Code instances for the map and reduce functions, with the parameter names map and reduce respectively, and one string value, passed as out, that provides the name of the output collection into which the results are written.
We won’t be explaining the MapReduce JavaScript functions in detail here, but they are pretty simple. For every document, the map function emits the name of the state as the key and 1 as the value, and the reduce function sums these values, giving the number of times a particular state name occurs. Each resulting document is added to the output collection, pymr_out in this case. The document holds the state’s name in the _id field and, in a field called value, the number of times that state’s name appeared in the collection. For example, in the entire collection, the state Maharashtra appeared 6446 times. Thus, the document for the state of Maharashtra is {u'_id': u'Maharashtra', u'value': 6446.0}. To confirm this value, you can execute the following query from the Mongo shell and see that the result is indeed 6446:
> db.postalCodes.count({state: 'Maharashtra'})
6446
We are still not done, as the requirement is to find the top five states by their occurrence in the collection. So far, we have just the states and their occurrences, so the final step (step 9) is to sort the documents in descending order of the value field, which is the number of times the state’s name occurred, and limit the result to five documents.
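The whole job, including the final sort and limit from step 9, can be mirrored in plain Python. This is an in-memory analogy with invented sample data, and map_reduce is a hypothetical helper, not PyMongo's method. One difference from the server matters here. This sketch calls the reducer exactly once per key. MongoDB may instead call reduce several times on partial results, and may skip it for a key with a single emitted value. For that reason, reduce must return the same kind of value that map emits.

```python
from collections import defaultdict

# Invented sample documents; not the real postalCodes data.
postal_codes = [{'state': s} for s in
                ['Kerala', 'Goa', 'Kerala', 'Assam', 'Goa', 'Kerala', 'Bihar']]


def map_reduce(docs, mapper, reducer):
    """Run mapper over every document, group emitted values by key, reduce each group."""
    emitted = defaultdict(list)

    def emit(key, value):
        emitted[key].append(value)

    for doc in docs:
        mapper(doc, emit)
    return [{'_id': key, 'value': reducer(key, values)}
            for key, values in emitted.items()]


def mapper(doc, emit):          # mirrors function() { emit(this.state, 1) }
    emit(doc['state'], 1)


def reducer(key, values):       # mirrors function(key, values) { return Array.sum(values) }
    return float(sum(values))   # JavaScript numbers are doubles, hence 6446.0 above


out = map_reduce(postal_codes, mapper, reducer)
top = sorted(out, key=lambda d: d['value'], reverse=True)[:5]   # step 9's sort and limit
for doc in top:
    print(doc)
```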
See also
Chapter 8, Integration with Hadoop, for different recipes on executing MapReduce jobs on MongoDB using the Hadoop connector, which allows us to write the map and reduce functions in languages such as Java and Python
Executing query and insert operations using a Java client
In this recipe, we will look at executing query and insert operations using a Java client for MongoDB. Unlike Python, the Java version used here offers no interactive interpreter for executing code snippets. Thus, we have some unit test cases already implemented; their relevant code snippets will be shown and explained.
Getting ready
For this recipe, we will start a standalone instance. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server.
The next step is to download the mongo-cookbook-javadriver Java project from the book’s website. This recipe uses a JUnit test case to test various features of the Java client. In this whole process, we will make use of some of the most common API calls and, thus, learn to use them.
How to do it…
To execute the test case, you can either import the project into an IDE such as Eclipse and execute the test case there, or execute the test case from the command prompt using Maven.
The test case we are going to execute for this recipe is com.packtpub.mongo.cookbook.MongoDriverQueryAndInsertTest.
If you are using an IDE, open this test class and execute it as a JUnit test case. If you are planning to use Maven to execute this test case, go to the command prompt, change the directory to the root of the project, and execute the following command to execute this single test case:
$ mvn -Dtest=com.packtpub.mongo.cookbook.MongoDriverQueryAndInsertTest test
Everything should execute fine, and the test case should succeed if the Java SDK and Maven are properly set up and the MongoDB server is up and running and listening on port 27017 for incoming connections.
How it works…
We will now open the test class we executed and look at some of the important API calls in the test method. The superclass of our test class is com.packtpub.mongo.cookbook.AbstractMongoTest.
We will start by looking at the getClient method in this class. The client instance that gets created is an instance of type com.mongodb.MongoClient. There are several overloaded constructors for this class; however, we will use the following constructor to instantiate the client:
MongoClient client = new MongoClient("localhost:27017");
AnothermethodtolookatisgetJavaDriverTestDatabaseinthesameabstractclassthatgetsusthedatabaseinstance.Thisinstanceissynonymoustotheimplicitvariabledbintheshell.Here,inJava,thisclassisaninstanceoftypecom.mongodb.DB.WewillgetaninstanceofthisDBclassbyinvokingthegetDB()methodontheclientinstance.Inourcase,wewanttheDBinstanceforthejavaDriverTestdatabase,whichisasfollows:
getClient().getDB("javaDriverTest");
Oncewegettheinstanceofcom.mongodb.DB,wewilluseittogettheinstanceofcom.mongodb.DBCollection,whichwillbeusedtoperformvariousoperations,findandinsertinourcase,onthecollection.ThegetJavaTestCollectionmethodintheabstracttestclassreturnsoneinstanceofDBCollection.WewillgetaninstanceofDBCollectionclassforthejavaTestcollectionbyinvokingthegetCollection()methodoncom.mongodb.DBasfollows:
getJavaDriverTestDatabase().getCollection("javaTest")
OncewegetaninstanceofDBCollection,wewillbereadytoperformoperationsonit.Inthescopeofthisrecipe,itislimitedtofindandinsertoperations.
Now,wewillopenthemaintestcaseclasscom.packtpub.mongo.cookbook.MongoDriverQueryAndInsertTest.OpenthisclassinanIDEortexteditor.Wewilllookatthemethodsofthisclass.ThefirstmethodthatwewilllookatisfindOneDocument.Here,thelineofourinterestiscollection.findOne(newBasicDBObject("_id",3));thisqueriesthedocumentwiththevalueof_idas3
Thismethodreturnsaninstanceofcom.mongodb.DBObject,whichisakey-valuemapthatreturnsthefieldsofadocumentasakeyandthevalueasthevalueofthiscorrespondingkey.Forinstance,togetthevalueof_idfromthereturnedDBObjectinstance,wewillinvokeresult.get("_id")onthereturnedresult.
Our next method to inspect is getDocumentsFromTestCollection. This test case executes a find operation on the collection and gets all the documents in it. The collection.find() call, made on the DBCollection instance, returns an instance of com.mongodb.DBCursor. An important point to note is that invoking the find operation doesn’t itself execute the query; it just returns the DBCursor instance. This is an inexpensive operation and doesn’t consume server-side resources. The actual query gets executed on the server side only when the hasNext or next method is invoked on the DBCursor instance. The hasNext() method is used to check if there are more results, and the next() method is used to navigate to the next DBObject in the result. An example of using the returned DBCursor instance to iterate through the results is as follows:
while (cursor.hasNext()) {
    DBObject object = cursor.next();
    // Some operation on the returned object to get the fields and
    // values in the document
}
We will now look at two methods: withLimitAndSkip and withQueryProjectionAndSort. These methods show us how to sort the results, limit the number of results, and skip a number of initial results. As we can see in the following code snippet, the sort, limit, and skip methods can be chained to each other:
DBCursor cursor = collection
        .find(null)
        .sort(new BasicDBObject("_id", -1))
        .limit(2)
        .skip(1);
All these methods return the DBCursor instance itself; this is what allows us to chain the calls. These methods are defined in the DBCursor class; each one changes some state of the instance according to the operation it performs and ends with return this, so that the same instance is returned.
Remember that the actual operation is invoked on the server only upon invoking the hasNext or next method on DBCursor. Invoking any method such as sort, limit, or skip after the query has been executed on the server will throw java.lang.IllegalStateException.
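The lazy execution and the chaining rules described above can be illustrated without a running server. The following is a minimal, stdlib-only sketch, not the driver's code: LazyCursor and its executions counter are hypothetical names. It mimics three behaviors attributed to DBCursor above: each modifier returns this, nothing runs until the first hasNext() or next() call, and a modifier called afterwards throws IllegalStateException. Like the server, it applies the sort first, then the skip, then the limit, regardless of the order in which the calls were chained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory stand-in for a cursor; NOT the MongoDB driver's DBCursor.
class LazyCursor<T> implements Iterator<T> {
    private final List<T> source;   // stands in for the documents stored on the server
    private Comparator<T> sort;     // null means no sort
    private int skip = 0;
    private int limit = 0;          // 0 means no limit, as in MongoDB
    private Iterator<T> results;    // stays null until the query is executed
    int executions = 0;             // how many times the "query" actually ran

    LazyCursor(List<T> source) {
        this.source = source;
    }

    // Modifiers only record state and return this, which is what makes chaining possible.
    LazyCursor<T> sort(Comparator<T> order) { checkNotStarted(); this.sort = order; return this; }
    LazyCursor<T> skip(int n)               { checkNotStarted(); this.skip = n; return this; }
    LazyCursor<T> limit(int n)              { checkNotStarted(); this.limit = n; return this; }

    private void checkNotStarted() {
        if (results != null) {
            throw new IllegalStateException("can't modify a cursor after iteration has started");
        }
    }

    // Executes the query once, on first use: sort, then skip, then limit.
    private Iterator<T> execute() {
        if (results == null) {
            executions++;
            List<T> docs = new ArrayList<>(source);
            if (sort != null) {
                docs.sort(sort);
            }
            int from = Math.min(skip, docs.size());
            int to = limit > 0 ? Math.min(from + limit, docs.size()) : docs.size();
            results = docs.subList(from, to).iterator();
        }
        return results;
    }

    @Override
    public boolean hasNext() { return execute().hasNext(); }

    @Override
    public T next() { return execute().next(); }
}
```

For example, sorting the values 1 to 5 in reverse order and chaining .limit(2).skip(1) yields 4 and then 3, and executions stays at 0 until the loop first calls hasNext().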
We used two variants of the find method: one that accepts a single parameter, the query to be executed, and another one that has two parameters. In the second variant, the first parameter is the query and the second one is another DBObject used for the projection, so that only a selected set of fields from each document is returned in the result.
The following query, for instance, from the withQueryProjectionAndSort method of the test case, selects all the documents, as the first argument is null, and the returned DBCursor instance will have documents that contain just one field, called value:
DBCursor cursor = collection
        .find(null, new BasicDBObject("value", 1).append("_id", 0))
        .sort(new BasicDBObject("_id", 1));
The _id field has to be explicitly set to 0; otherwise, it will be returned by default.
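To make the default inclusion of _id concrete, the following hypothetical helper applies a simple inclusion projection such as {value: 1, _id: 0} to a document held as a Map. It only illustrates the rule. It is not how the driver or the server implements projections, and it handles only top-level fields marked with 1 plus the special treatment of _id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper modelling a top-level inclusion projection; not driver or server code.
class ProjectionSketch {
    static Map<String, Object> project(Map<String, Object> doc, Map<String, Integer> projection) {
        Map<String, Object> out = new LinkedHashMap<>();
        // _id is returned by default unless the projection explicitly sets it to 0
        boolean includeId = !Integer.valueOf(0).equals(projection.get("_id"));
        if (includeId && doc.containsKey("_id")) {
            out.put("_id", doc.get("_id"));
        }
        // every other field is returned only if the projection marks it with 1
        for (Map.Entry<String, Integer> e : projection.entrySet()) {
            String field = e.getKey();
            if (!field.equals("_id") && e.getValue() == 1 && doc.containsKey(field)) {
                out.put(field, doc.get(field));
            }
        }
        return out;
    }
}
```

Projecting a document such as {_id: 3, value: "Hello World"} with {value: 1, _id: 0} leaves only value. Using {value: 1} alone keeps _id as well.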
Finally, we will look at two more methods in the test case: insertDataTest and insertTestDataWithWriteConcern. We will use a couple of variants of the insert method in these two methods. All the insert methods are invoked on the DBCollection instance and return a com.mongodb.WriteResult instance. The result can be used to get the error that occurred during the write operation by invoking the getLastError() method, to get the number of documents inserted using the getN() method, and to get the write concern of the operation, among a few other things. Refer to the Javadoc of the MongoDB API at https://api.mongodb.org/java/current/ for more details on the methods. The two insert operations that we performed are as follows:
collection.insert(new BasicDBObject("value", "Hello World"));
collection.insert(new BasicDBObject("value", "Hello World"),
        WriteConcern.JOURNALED);
Both of these accept a DBObject instance for the document to be inserted as the first parameter. The second method allows us to provide the write concern to be used for the write operation. There are also insert methods in the DBCollection class that allow bulk inserts. Refer to the Javadoc at https://api.mongodb.org/java/current/ for more details on the various overloaded versions of the insert method.
Executing update and delete operations using a Java client
In the previous recipe, we saw how to execute the find and insert operations in MongoDB using a Java client. In this recipe, we will see how the update and delete operations work from a Java client.
Getting ready
For this recipe, we will start a standalone instance. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server.
The next step is to download the Java project mongo-cookbook-javadriver from the book’s website. This recipe uses a JUnit test case to test out various features of the Java client. In this whole process, we will make use of some of the most common API calls and, thus, learn how to use them.
How to do it…
To execute the test case, one can either import the project into an IDE such as Eclipse and execute the test case, or execute the test case from the command prompt using Maven.
The test case we are going to execute for this recipe is com.packtpub.mongo.cookbook.MongoDriverUpdateAndDeleteTest.
If you are using an IDE, open this test class and execute it as a JUnit test case. If you plan to use Maven to execute this test case, go to the command prompt, change the directory to the root of the project, and execute the following command to run this single test case:
$ mvn -Dtest=com.packtpub.mongo.cookbook.MongoDriverUpdateAndDeleteTest test
Everything should execute fine if the Java SDK and Maven are properly set up and the MongoDB server is up and running and listening on port 27017 for incoming connections.
How it works…
We will create test data for the recipes using the setupUpdateTestData() method. Here, we simply put documents in the javaTest collection in the javaDriverTest database: 20 documents, with the value of i ranging from 1 to 20. This test data is used by the different test case methods.
Let’s now take a look at the methods in this class. We will first look at basicUpdateTest(). In this method, we first create the test data and then execute the following update method:
collection.update(
        new BasicDBObject("i", new BasicDBObject("$gt", 10)),
        new BasicDBObject("$set", new BasicDBObject("gtTen", true)));
The update method here takes two arguments: the first one is the query that will be used to select the eligible documents for the update, and the second one is the actual update. The first parameter looks confusing due to the nested BasicDBObject instances; however, it is simply the {'i': {'$gt': 10}} condition, and the second parameter is the {'$set': {'gtTen': true}} update. The result of the update is an instance of com.mongodb.WriteResult. The WriteResult instance tells us the number of documents that got updated, the error that occurred while executing the write operation (if any), and the write concern used for the update. Refer to the Javadocs of the WriteResult class for more details. By default, this update modifies only the first matching document, even if multiple documents match the query.
The next method that we will look at is multiUpdateTest, which updates all the matching documents for the given query instead of only the first matching document. The method we used on the collection instance is updateMulti, which is just a convenience method to update multiple documents. The following is the call that we will make to update multiple documents:
collection.updateMulti(new BasicDBObject("i",
        new BasicDBObject("$gt", 10)),
        new BasicDBObject("$set", new BasicDBObject("gtTen", true)));
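The difference between update and updateMulti can be modeled with an in-memory list of documents. In this hypothetical sketch, UpdateSketch and its parameters are illustrative names, not driver API. The query is a Predicate, and the $set is a Consumer that mutates the document. With multi set to false, only the first match in list order is changed; with true, every match is changed. The sketch reuses the recipe's test data (i from 1 to 20) and the i > 10 condition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical in-memory model of update (multi = false) versus updateMulti (multi = true).
class UpdateSketch {
    // Applies the change to the first match, or to every match when multi is true,
    // and returns how many documents were updated.
    static int update(List<Map<String, Object>> collection, Predicate<Map<String, Object>> query,
                      Consumer<Map<String, Object>> change, boolean multi) {
        int n = 0;
        for (Map<String, Object> doc : collection) {
            if (query.test(doc)) {
                change.accept(doc);
                n++;
                if (!multi) {
                    break; // a plain update stops after the first matching document
                }
            }
        }
        return n;
    }

    // Builds test data like the recipe's: documents {i: 1} .. {i: 20}.
    static List<Map<String, Object>> testData() {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("i", i);
            docs.add(doc);
        }
        return docs;
    }
}
```

On that data, the plain update in this sketch marks only the document with i = 11 and reports 1 document updated. The multi variant marks all ten documents with i from 11 to 20.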
The next operation that we will perform is to remove documents. The test case method to remove documents is deleteTest(). The documents are removed as follows:
collection.remove(new BasicDBObject(
        "i", new BasicDBObject("$gt", 10)),
        WriteConcern.JOURNALED);
We have two parameters here. The first one is the query; the documents matching it will be removed from the collection. Note that all the matching documents are removed by default, unlike with update, where only the first matching document is updated by default. The second parameter is the write concern to be used for the remove operation.
Note that when the server is started on a 32-bit machine, journaling is disabled by default. Using the journaled write concern on such machines causes the operation to fail with the following exception:
com.mongodb.CommandFailureException: { "serverUsed" : "localhost/127.0.0.1:27017" , "connectionId" : 5 , "n" : 0 , "badGLE" : { "getlasterror" : 1 , "j" : true} , "ok" : 0.0 , "errmsg" : "cannot use 'j' option when a host does not have journaling enabled" , "code" : 2}
In that case, the server needs to be started with the --journal option. On 64-bit machines, this is not necessary, as journaling is enabled by default.
We will look at the findAndModify operation next. The test case method to perform this operation is findAndModifyTest. The following lines of code are used to perform this operation:
DBObject old = collection.findAndModify(
        new BasicDBObject("i", 10),
        new BasicDBObject("i", 100));
The first parameter is the query that finds the matching document, and the second one is the update to apply to it. As this update contains no update operators such as $set, the content of the matched document (apart from its _id) is replaced with {i: 100}. The return value of the operation is a DBObject instance holding the document as it was before the update was applied. One important feature of the findAndModify operation is that the find and update operations are performed atomically.
The preceding method is a simple version of the findAndModify operation. There is an overloaded version of this method with the following signature:
DBObject findAndModify(DBObject query, DBObject fields, DBObject sort,
        boolean remove, DBObject update, boolean returnNew, boolean upsert)
Let’s see what these parameters are in the following table:
Parameter  Description
query  The findAndModify operation has to find a document before it can modify it. The value of this parameter is the query used to find the document that will be modified.
fields  The find method supports a projection of the fields that need to be selected in the result document(s). This parameter does the same job of selecting a fixed set of fields from the resulting document.
sort  If you haven’t noticed already, the findAndModify operation acts atomically on only one document and also returns only one document. The sort parameter can be used in cases where the query selects multiple documents and only the first gets chosen for the operation: the sort is applied to the result before the first document is picked for the update.
remove  This is a Boolean flag that indicates whether to remove or update the document. If this value is true, the document will be removed.
update  This, unlike the remove attribute, is not a Boolean value but a DBObject instance that describes the update to apply. Note that the remove Boolean flag takes precedence over this parameter; if the remove attribute is true, the update will not happen even if one is provided.
returnNew  The findAndModify operation returns a document, but which one: the one before the update was executed or the one after it? When this Boolean flag is given as true, it returns the document after the update is executed; otherwise, it returns the document as it was before the update.
upsert  This is again a Boolean flag that, when true, executes the upsert operation. It is relevant only when the intended operation is an update.
There are more overloaded versions of this method. Refer to the Javadocs of com.mongodb.DBCollection for more methods. The findAndModify method we used ultimately invokes the method we discussed earlier with the fields and sort parameters as null and the remaining remove, returnNew, and upsert parameters as false.
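The interplay of these parameters can be summarized in a small in-memory model. The following is a hypothetical, stdlib-only sketch with an invented class name. It omits the fields projection and models the update as a callback that mutates the matched document, rather than as a DBObject with update operators. The method is synchronized so that finding and modifying happen as one step, standing in for the atomicity described earlier. In the sketch:

- sort decides which document is chosen when several match.
- remove takes precedence over update.
- returnNew selects the old or the new version.
- upsert inserts a document built from the update when nothing matches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical in-memory model of findAndModify's parameters; not the driver's DBCollection.
class FindAndModifySketch {
    private final List<Map<String, Object>> docs = new ArrayList<>();

    synchronized void insert(Map<String, ?> doc) {
        docs.add(new HashMap<String, Object>(doc));
    }

    // synchronized: no other thread can touch the chosen document between the find and the
    // modify, standing in for the server-side atomicity of findAndModify.
    synchronized Map<String, Object> findAndModify(Predicate<Map<String, Object>> query,
            Comparator<Map<String, Object>> sort, boolean remove,
            Consumer<Map<String, Object>> update, boolean returnNew, boolean upsert) {
        Comparator<Map<String, Object>> order = sort != null ? sort : (a, b) -> 0;
        // At most one document is chosen: the first match after sorting.
        Optional<Map<String, Object>> match = docs.stream().filter(query).sorted(order).findFirst();

        if (!match.isPresent()) {
            if (upsert && !remove && update != null) {
                Map<String, Object> created = new HashMap<>();
                update.accept(created);                 // build the new document from the update
                docs.add(created);
                return returnNew ? new HashMap<>(created) : null;
            }
            return null;                                // nothing matched
        }

        Map<String, Object> doc = match.get();
        Map<String, Object> before = new HashMap<>(doc); // snapshot of the pre-update state
        if (remove) {                                   // remove takes precedence over update
            docs.removeIf(d -> d == doc);
            return before;
        }
        if (update != null) {
            update.accept(doc);
        }
        return returnNew ? new HashMap<>(doc) : before;
    }
}
```

Consider a collection holding documents {i: 1} to {i: 20}. A call whose query matches i = 10, with no sort, remove and returnNew set to false, and an update that sets i to 100, returns {i: 10}, the document as it was before the update.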
Finally, we will look at query builder support in the MongoDB Java API.
All the queries in Mongo are DBObject instances, possibly with more DBObject instances nested in them. Things are simple for small queries, but they start getting ugly for more complicated ones. Consider a relatively simple query where we want to find documents with i > 10 and i < 15. The Mongo query for this is {$and: [{i: {$gt: 10}}, {i: {$lt: 15}}]}. Writing this in Java using BasicDBObject instances is painful, and it looks as follows:
DBObject query = new BasicDBObject("$and",
        new BasicDBObject[] {
            new BasicDBObject("i", new BasicDBObject("$gt", 10)),
            new BasicDBObject("i", new BasicDBObject("$lt", 15))
        }
);
Thankfully, however, there is a utility class called com.mongodb.QueryBuilder for building complex queries. An equivalent query is built using the query builder as follows:
DBObject query =
        QueryBuilder.start("i").greaterThan(10).and("i").lessThan(15).get();
This is less error-prone to write and easier to read as well. There are a lot of methods in the com.mongodb.QueryBuilder class, and I would encourage you to go through the Javadocs of this class. The basic idea is to start construction using the start method and a key. We then chain method calls to add different conditions and, once all the conditions have been added, the query is constructed using the get() method, which returns a DBObject. Refer to the queryBuilderSample method in the test class for a sample usage of the query builder support of the MongoDB Java API.
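The following deliberately tiny, hypothetical builder shows why a fluent builder helps. MiniQueryBuilder is not part of the driver. It supports only greaterThan and lessThan and renders the query as a JSON-like string instead of a DBObject. As with the chained calls above, start() picks the first key, each method returns this, and get() produces the final query. Conditions on the same key are merged into a single subdocument, {i: {$gt: 10, $lt: 15}}, which matches the same documents as the $and form shown earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical fluent builder illustrating the start/and/.../get() pattern;
// it is NOT com.mongodb.QueryBuilder and produces a string, not a DBObject.
class MiniQueryBuilder {
    private final Map<String, Map<String, Object>> conditions = new LinkedHashMap<>();
    private String currentKey;

    private MiniQueryBuilder(String key) {
        this.currentKey = key;
    }

    static MiniQueryBuilder start(String key) {
        return new MiniQueryBuilder(key);
    }

    MiniQueryBuilder and(String key) {
        currentKey = key;
        return this;
    }

    MiniQueryBuilder greaterThan(Object value) { return addOperator("$gt", value); }

    MiniQueryBuilder lessThan(Object value) { return addOperator("$lt", value); }

    private MiniQueryBuilder addOperator(String op, Object value) {
        // Operators on the same key are merged into one subdocument.
        conditions.computeIfAbsent(currentKey, k -> new LinkedHashMap<>()).put(op, value);
        return this;
    }

    // Renders the query as a JSON-like string, e.g. {i: {$gt: 10, $lt: 15}}.
    String get() {
        StringJoiner fields = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, Map<String, Object>> field : conditions.entrySet()) {
            StringJoiner ops = new StringJoiner(", ", "{", "}");
            field.getValue().forEach((op, v) -> ops.add(op + ": " + v));
            fields.add(field.getKey() + ": " + ops);
        }
        return fields.toString();
    }
}
```

For instance, MiniQueryBuilder.start("i").greaterThan(10).and("i").lessThan(15).get() returns the string {i: {$gt: 10, $lt: 15}}.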
See also
Chapter 5, Advanced Operations, to learn some more operations using GridFS and geospatial indexes, and how to use them from a Java application with a small sample
The Javadocs for the current version of the MongoDB driver at https://api.mongodb.org/java/current/
Aggregation in Mongo using a Java client
The intention of this recipe is not to explain aggregation but to show how aggregation can be implemented using a Java client from a Java program. In this recipe, we will aggregate the data based on the state names and get the top five state names by the number of documents they appear in. We will make use of the $project, $group, $sort, and $limit operators for the process.
Getting ready
The test class used for this recipe is com.packtpub.mongo.cookbook.MongoAggregationTest. To execute the aggregation operations, we need to have a server up and running. A simple single node is all we need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. The next step is to download the mongo-cookbook-javadriver Java project from the book’s website. Though Maven can be used to execute the test case, it is convenient to import the project into an IDE and execute the test case class. It is assumed that you are familiar with the Java programming language and comfortable using the IDE into which the project will be imported.
How to do it…
To execute the test case, one can either import the project into an IDE such as Eclipse and execute the test case, or execute the test case from the command prompt using Maven.
If you are using an IDE, open the test class and execute it as a JUnit test case. If you plan to use Maven to execute this test case, go to the command prompt, change the directory to the root of the project, and execute the following command to run this single test case:
$ mvn -Dtest=com.packtpub.mongo.cookbook.MongoAggregationTest test
Everything should execute fine if the Java SDK and Maven are properly set up and the MongoDB server is up and running and listening on port 27017 for incoming connections.
How it works…
The method that demonstrates aggregation functionality is aggregationTest() in our test class. The aggregation operation is performed on MongoDB from a Java client using the aggregate() method defined in the DBCollection class. The method has the following signature:
AggregationOutput aggregate(DBObject firstOp, DBObject... additionalOps)
Only the first argument is mandatory; it forms the first operation in the pipeline. The second argument is a varargs argument (a variable number of arguments with zero or more values) that allows more pipeline operators. All these arguments are of type com.mongodb.DBObject. If any exception occurs during the execution of the aggregation command, the aggregation operation will throw com.mongodb.MongoException with the cause of the exception.
The return type, com.mongodb.AggregationOutput, is used to get the result of the aggregation operation. From a developer’s perspective, we are more interested in the results field of this instance, which can be accessed using the results() method of the returned object. The results() method returns an object of type Iterable<DBObject>, which one can iterate over to get the results of the aggregation.
Let’s look at how we implemented the aggregation pipeline in our test class:
AggregationOutput output = collection.aggregate(
        // {'$project': {'state': 1, '_id': 0}},
        new BasicDBObject("$project", new BasicDBObject("state", 1).append("_id", 0)),
        // {'$group': {'_id': '$state', 'count': {'$sum': 1}}}
        new BasicDBObject("$group", new BasicDBObject("_id", "$state")
                .append("count", new BasicDBObject("$sum", 1))),
        // {'$sort': {'count': -1}}
        new BasicDBObject("$sort", new BasicDBObject("count", -1)),
        // {'$limit': 5}
        new BasicDBObject("$limit", 5)
);
There are four operations in the pipeline, in the following order: a $project operation, followed by $group, $sort, and then $limit.
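The same pipeline can be mirrored with Java streams over an in-memory list of documents. This is a mental model of what each stage produces, not of how the server executes it. In this sketch, PipelineSketch is a hypothetical name, and each document is assumed to be a Map with a non-null state field, like the postalCodes documents imported in Chapter 2. Ties in the count come back in no particular order.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stream-based mirror of the $project -> $group -> $sort -> $limit pipeline.
class PipelineSketch {
    static List<Map.Entry<String, Long>> topStates(List<Map<String, Object>> docs, int limit) {
        return docs.stream()
                // $project: {state: 1, _id: 0} -> keep only the state value
                .map(doc -> (String) doc.get("state"))
                // $group: {_id: '$state', count: {$sum: 1}}
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                // $sort: {count: -1}
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                // $limit: 5 (passed in as the limit argument)
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```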
The last two operations look inefficient; using them, we sort everything but then just take the top five elements. In such scenarios, the MongoDB server is intelligent enough to consider the limit operation while sorting; as a result, only the top five results need to be maintained rather than sorting all the results.
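The sort-plus-limit optimization can be illustrated with the classic bounded min-heap technique. Only k candidates are kept while the group counts go by, so memory grows with k rather than with the number of groups. This is a sketch of the general top-k idea under the assumption that the per-state counts are already known. It is not the server's implementation, and TopK is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Top-k selection with a bounded min-heap: an illustration of combining a sort with a limit.
class TopK {
    static List<Map.Entry<String, Long>> topK(Map<String, Long> counts, int k) {
        Comparator<Map.Entry<String, Long>> byCount = Map.Entry.comparingByValue();
        // Min-heap: the smallest of the current top-k sits at the head and is evicted first.
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // never hold more than k candidates
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(byCount.reversed()); // largest count first, as with {count: -1}
        return result;
    }
}
```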
With version 2.6 of MongoDB, the aggregation result can be returned as a cursor. Though the preceding code snippet is still valid, the AggregationOutput object is no longer the only way to get the results of the operation; we can also use com.mongodb.Cursor to iterate over the results. Also, the preceding format is now deprecated in favor of the format that accepts a list of pipeline operators rather than varargs for the operators to be used. Refer to the Javadocs of the com.mongodb.DBCollection class and look for various overloaded
MapReduce in Mongo using a Java client
In the previous recipe, we saw how to execute aggregation operations in Mongo using the Java client. In this recipe, we will work on the same use case as we did for the aggregation operation, but using MapReduce. The intent is to aggregate the data based on the state names and get the top five state names by the number of documents they appear in.
If you are not aware of how to write MapReduce code for Mongo from a programming language client and are seeing it for the first time, you might be surprised to see how it is actually done. You might have imagined that you would write the map and reduce functions in the programming language in which you are writing the code, Java in this case, and then use them to execute MapReduce. However, you need to bear in mind that MapReduce jobs run on the Mongo servers, and they execute JavaScript functions. Hence, irrespective of the programming language driver, the MapReduce functions are written in JavaScript. The programming language drivers just act as a means of letting us invoke and execute the MapReduce functions (written in JavaScript) on the server.
Getting ready
The test class used for this recipe is com.packtpub.mongo.cookbook.MongoMapReduceTest. To execute the MapReduce operations, we need to have a server up and running. A simple single node is all we need. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to start the server. The data on which we will operate needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. The next step is to download the mongo-cookbook-javadriver Java project from the book’s website. Though Maven can be used to execute the test case, it is convenient to import the project into an IDE and execute the test case class. It is assumed that you are familiar with the Java programming language and comfortable using the IDE into which the project will be imported.
How to do it…
To execute the test case, one can either import the project into an IDE such as Eclipse and execute the test case, or execute the test case from the command prompt using Maven.
If you are using an IDE, open the test class and execute it as a JUnit test case. If you plan to use Maven to execute this test case, go to the command prompt, change the directory to the root of the project, and execute the following command to run this single test case:
$ mvn -Dtest=com.packtpub.mongo.cookbook.MongoMapReduceTest test
Everything should execute fine if the Java SDK and Maven are properly set up and the MongoDB server is up and running and listening on port 27017 for incoming connections.
How it works…
The test case method for our MapReduce test is mapReduceTest().
MapReduce operations can be performed in Mongo from a Java client using the mapReduce() method defined in the DBCollection class. There are a lot of overloaded versions, and you might refer to the Javadocs of the com.mongodb.DBCollection class for more details on the various flavors of this method, but the one we used is as follows:
collection.mapReduce(mapper, reducer, outputCollection, query)
The method accepts four parameters:
1. The first one is the mapper function, which is of type String and is JavaScript code that will be executed on the Mongo database server.
2. The second one is the reducer function, which is also of type String and is JavaScript code that will be executed on the Mongo database server.
3. The third one is the name of the collection to which the output of the MapReduce execution will be written.
4. Finally, the fourth one is the query that will be executed by the server; the result of this query will be the input to the MapReduce job execution.
Since the assumption is that the reader is well versed in MapReduce operations from the shell, we won’t explain the MapReduce JavaScript functions that we have in the test case method. However, they are pretty simple: the map function emits the state names as keys, and the values are combined by the reduce function into the number of times the particular state name occurs. The resulting documents, each with the state’s name as the _id field and another field called value holding the number of times that state’s name appeared in the collection, are added to the output collection, javaMROutput in this case. For example, in the entire collection, the state Maharashtra appeared 6446 times. Thus, the document for the state of Maharashtra is {'_id': 'Maharashtra', 'value': 6446}. To confirm whether this is the true value or not, you can execute the following query from the Mongo shell and see that the result is indeed 6446:
> db.postalCodes.count({state: 'Maharashtra'})
6446
We are still not done, as the requirement is to find the top five states by their occurrence in the collection. We still have just the states and their occurrences, so the final step is to sort the documents by the value field (the number of times the state’s name occurred) in descending order and limit the result to five documents.
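For readers meeting MapReduce here for the first time, the following stdlib-only sketch imitates in Java what the server does with this job. It assumes the mapper emits the pair (state, 1) for each document; that detail is our assumption, not the recipe's actual JavaScript. The reducer sums the values for a key, producing the equivalent of {_id: state, value: count}. A final step then sorts by value in descending order and keeps five entries. In the recipe itself, both functions are JavaScript strings executed by the MongoDB server, not Java code, and MapReduceSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical in-memory imitation of a count-by-state MapReduce job plus sort and limit.
class MapReduceSketch {
    // map (assumed): emit(state, 1) for every document.
    static void map(Map<String, Object> doc, BiConsumer<String, Integer> emit) {
        emit.accept((String) doc.get("state"), 1);
    }

    // reduce: sum the values emitted for one key; summing can safely be re-applied
    // to partial results, which MongoDB may do.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<Map<String, Object>> docs) {
        // Group the emitted values by key.
        Map<String, List<Integer>> emitted = new TreeMap<>();
        for (Map<String, Object> doc : docs) {
            map(doc, (k, v) -> emitted.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // One output entry per key, like {_id: key, value: count} in the output collection.
        Map<String, Integer> output = new TreeMap<>();
        emitted.forEach((k, vs) -> output.put(k, reduce(k, vs)));
        return output;
    }

    // Final step: sort by value in descending order and keep the first n keys.
    static List<String> top(Map<String, Integer> output, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(output.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Run over the real postalCodes data, run() would hold, for example, the entry Maharashtra=6446, corresponding to the {'_id': 'Maharashtra', 'value': 6446} document described above.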
See also
Chapter 8, Integration with Hadoop, for different recipes on executing MapReduce jobs on MongoDB using the Hadoop connector. This allows us to write the map and reduce functions in languages such as Java and Python.
Chapter 4. Administration
In this chapter, we will cover the following recipes related to MongoDB administration:
Renaming a collection
Viewing collection stats
Viewing database stats
Disabling the preallocation of data files
Manually padding a document
Understanding the mongostat and mongotop utilities
Estimating the working set
Viewing and killing the currently executing operations
Using profiler to profile operations
Setting up users in MongoDB
Understanding interprocess security in MongoDB
Modifying collection behavior using the collMod command
Setting up MongoDB as a Windows Service
Configuring a replica set
Stepping down as a primary instance from the replica set
Exploring the local database of a replica set
Understanding and analyzing oplogs
Building tagged replica sets
Configuring the default shard for nonsharded collections
Manually splitting and migrating chunks
Performing domain-driven sharding using tags
Exploring the config database in a sharded setup
Renaming a collection
Have you ever come across a scenario where you named a table in a relational database and, at a later point in time, felt that the name could have been better? Or perhaps the organization you work for was late in realizing that the table names were getting really messy and now wants to enforce some naming standards? Relational databases do have some proprietary ways to rename tables, and a database admin can do that for you.
This raises a question, though. In the Mongo world, where collections are synonymous with tables, is there a way to rename a collection after it is created? In this recipe, we will explore this feature of Mongo by renaming an existing collection with some data in it.
Getting ready
A running MongoDB instance is all we need for this collection renaming experiment. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server. The operations we will be performing will be from the Mongo shell.
How to do it…
Let’s take a look at the steps in detail:
1. Once the server is started, and assuming it is listening for client connections on the default port 27017, execute the following command from the operating system terminal to connect to it from the shell:
$ mongo
2. Once connected, using the default test database, let us create a collection with some test data. The collection we will be using is named sloppyNamedCollection:
> for(i = 0; i < 10; i++) db.sloppyNamedCollection.insert({'i': i})
3. The test data will now be created (we may verify the data by querying the sloppyNamedCollection collection).
4. Rename the collection to neatNamedCollection using the following command:
> db.sloppyNamedCollection.renameCollection('neatNamedCollection')
{ "ok" : 1 }
5. Verify that the sloppyNamedCollection collection is no longer present by executing the following command:
> show collections
6. Finally, query the neatNamedCollection collection to verify that the data that was originally in sloppyNamedCollection is indeed present in it. Simply execute the following command on the Mongo shell:
> db.neatNamedCollection.find()
How it works…
Renaming a collection is pretty simple. It is accomplished with the renameCollection method, which takes two arguments. Generally, the function signature is as follows:
> db.<collection to rename>.renameCollection('<target name of the collection>', <drop target if exists>)
The first argument is the new name for the collection.
The second parameter, which we didn’t use, is a Boolean value that tells the command whether or not to drop the target collection if it exists. This value defaults to false, which means the target is not dropped and an error is reported instead. This is a sensible default; otherwise, the results would be ghastly if we accidentally gave the name of an existing collection that we didn’t wish to drop. If, however, you know what you are doing and want the target to be dropped while renaming the collection, pass the second parameter as true. The name of this parameter is dropTarget. In our case, the call would have been:
> db.sloppyNamedCollection.renameCollection('neatNamedCollection', true)
Now, as an exercise, try creating sloppyNamedCollection again and renaming it without the second parameter (or with false as the value). You should see Mongo complaining that the target namespace exists. Then, rename it again with the second parameter as true. This time, the renaming operation executes successfully.
Note that the renameCollection method keeps the renamed collection in the same database as the original one; it is not enough to move or rename the collection into another database. In such cases, we need to run the renameCollection command against the admin database as follows:
> db.adminCommand({renameCollection: "<source_namespace>", to: "<target_namespace>", dropTarget: <true|false>});
In our case, suppose we want to move sloppyNamedCollection from the test database to newDatabase, rename it neatNamedCollection, and drop the target collection if it exists. We would execute the following command:
> db.adminCommand({renameCollection: "test.sloppyNamedCollection", to: "newDatabase.neatNamedCollection", dropTarget: true});
Also, note that the rename collection operation doesn’t work on sharded collections.
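The dropTarget rule can be captured in a few lines. The sketch below is hypothetical: CollectionCatalog is not a MongoDB class, and the exception messages are paraphrased rather than the server's exact text. It keys collections by full namespace (database.collection), so it covers both the same-database rename and the cross-database move shown above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical catalog keyed by namespace ("database.collection"); not a MongoDB class.
class CollectionCatalog {
    private final Map<String, List<Object>> collections = new HashMap<>();

    void create(String namespace, List<Object> documents) { collections.put(namespace, documents); }

    boolean exists(String namespace) { return collections.containsKey(namespace); }

    List<Object> get(String namespace) { return collections.get(namespace); }

    // Moves the documents from source to target, refusing to overwrite an existing
    // target unless dropTarget is true (messages are paraphrased, not MongoDB's text).
    void rename(String source, String target, boolean dropTarget) {
        if (!collections.containsKey(source)) {
            throw new IllegalArgumentException("source namespace does not exist: " + source);
        }
        if (source.equals(target)) {
            throw new IllegalArgumentException("source and target namespaces are the same");
        }
        if (collections.containsKey(target) && !dropTarget) {
            throw new IllegalStateException("target namespace exists: " + target);
        }
        collections.remove(target);                          // dropTarget: discard the old target
        collections.put(target, collections.remove(source)); // the data moves under the new name
    }
}
```

Repeating the exercise above with this model, a second rename onto neatNamedCollection fails unless dropTarget is true. A rename to newDatabase.neatNamedCollection moves the data across databases.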
Viewing collection stats
When it comes to storage usage, one of the most interesting statistics from an administrative point of view is the number of documents in a collection. Together with the other high-level statistics of the collection, it can help estimate future space and memory requirements based on the growth of the data.
Getting ready
To find the stats of the collection, we need to have a server up and running, and a single node should be OK. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server. The data on which we will be operating needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Once these steps are completed, we are all set to go ahead with this recipe.
How to do it…
We will be using the postalCodes collection to view the stats. Let’s take a look at the steps in detail:
1. Open the Mongo shell and connect it to the running MongoDB instance. In this case, Mongo is started on the default port 27017, so execute the following command:
$ mongo
2. With the data imported, create an index on the pincode field, if one doesn’t exist, as follows:
> db.postalCodes.ensureIndex({'pincode': 1})
3. On the Mongo terminal, execute the following command:
> db.postalCodes.stats()
4. Observe the output. Now execute the following command on the shell:
> db.postalCodes.stats(1024)
{
    "ns" : "test.postalCodes",
    "count" : 39732,
    "size" : 5561,
    "avgObjSize" : 0.1399627504278667,
    "storageSize" : 16380,
    "numExtents" : 6,
    "nindexes" : 2,
    "lastExtentSize" : 12288,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 2243,
    "indexSizes" : {
        "_id_" : 1261,
        "pincode_1" : 982
    },
    "ok" : 1
}
Again, observe the output. We will now see what these values mean to us in the next section.
How it works…
If we observe the output of the db.postalCodes.stats() and db.postalCodes.stats(1024) commands, we see that the second one has all the size figures in KB, whereas the first one has them in bytes. The parameter provided is known as scale, and all the figures indicating size are divided by this scale. In this case, as we gave the value as 1024, we get all the values in KB; if 1024 * 1024 is passed as the value of the scale, the sizes shown will be in MB. For our analysis, we will use the one that shows the sizes in KB:
> db.postalCodes.stats(1024)
{
    "ns" : "test.postalCodes",
    "count" : 39732,
    "size" : 5561,
    "avgObjSize" : 0.1399627504278667,
    "storageSize" : 16380,
    "numExtents" : 6,
    "nindexes" : 2,
    "lastExtentSize" : 12288,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 2243,
    "indexSizes" : {
        "_id_" : 1261,
        "pincode_1" : 982
    },
    "ok" : 1
}
The following table shows the meaning of the important fields:
Field  Description
ns  This is the fully qualified name of the collection, in the <database>.<collection name> format.
count  This is the number of documents in the collection.
size  This is the actual storage size occupied by the documents in the collection. Adding, deleting, or updating documents in the collection can change this figure. The scale parameter affects this field’s value; in our case, this value is in KB, as 1024 is the scale.
avgObjSize  This is the average size of a document in the collection. It is simply the size field divided by the count of documents in the collection. The scale parameter affects this field’s value; in our case, this value is in KB, as 1024 is the scale.
storageSize  Mongo preallocates space on the disk to ensure that the documents in the collection are kept in contiguous locations, which provides better performance in disk access. This preallocation fills up the files with zeros and then starts allocating space to the inserted documents. This field reveals the size of the storage used by this collection. This figure will generally be much larger than the actual size of the collection. The scale parameter affects this field’s value; in our case, this value is in KB, as 1024 is the scale.
numExtents  As we saw, Mongo preallocates contiguous disk space to the collections for performance purposes. However, as the collection grows, new space needs to be allocated. This field gives the number of such contiguous chunk allocations. Each contiguous chunk is called an extent.
nindexes  This field gives the number of indexes present in the collection. This value would be 1 even if we did not create an index on the collection, as Mongo implicitly creates an index on the _id field.
lastExtentSize  This is the size of the last extent allocated. The scale parameter affects this field’s value; in our case, this value is in KB, as 1024 is the scale.
paddingFactor  We can look at this factor as a multiplier applied to the actual document size in order to compute the storage size. For example, if the document to be inserted is 2 KB and the paddingFactor field is 1, the size allocated to the document is 2 KB; that is, there is no padding. On the other hand, if the paddingFactor field is 1.5, the space allocated to the document will be 3 KB (2 * 1.5), which gives a padding of 1 KB. In our case, the paddingFactor field is 1 because we did a mongoimport. We will discuss padding and the padding factor in the next section.
totalIndexSize  Indexes take up storage space too. This field gives the total size taken up by the indexes on the disk. The scale parameter affects this field’s value; in our case, this value is in KB, as 1024 is the scale.
indexSizes  This field is itself a document, with the name of each index as the key and the size of that index as the value. In our case, we had explicitly created an index on the pincode field; thus, besides the _id_ index, we see the name of this index as a key and its size on disk as the value. The total of these values over all the indexes is the same as the totalIndexSize value given earlier. The scale parameter affects this field’s value; in our case, this value is in KB, as 1024 is the scale.
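The relationships described in this table can be checked against the stats(1024) output shown earlier with a little arithmetic. In this sketch, StatsArithmetic is a hypothetical helper. scale() divides a byte count by the scale factor, assuming integer division for the whole-number fields. avgObjSize and the index total are recomputed from the printed figures: 5561 / 39732 reproduces the printed avgObjSize of 0.1399627504278667, and 1261 + 982 gives the totalIndexSize of 2243.

```java
import java.util.Map;

// Hypothetical helper reproducing the arithmetic behind db.collection.stats(scale).
class StatsArithmetic {
    // Size figures are divided by the scale; integer division is assumed for whole-number fields.
    static long scale(long bytes, long scale) {
        return bytes / scale;
    }

    // avgObjSize is the size field divided by the document count.
    static double avgObjSize(double size, long count) {
        return size / count;
    }

    // totalIndexSize is the sum of the per-index sizes listed in indexSizes.
    static long totalIndexSize(Map<String, Long> indexSizes) {
        long total = 0;
        for (long s : indexSizes.values()) {
            total += s;
        }
        return total;
    }
}
```

For example, avgObjSize(5561, 39732) and totalIndexSize(Map.of("_id_", 1261L, "pincode_1", 982L)) give back the avgObjSize and totalIndexSize printed above.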
Let's take a look at the paddingFactor field. Documents are placed on the storage device in contiguous locations. If, however, an update increases the size of a document, Mongo cannot grow the document in place if no buffer space was kept after it. The only solution is to copy the entire document, with the necessary updates made to it, towards the end of the collection. This operation turns out to be expensive, affecting the performance of such update operations. If the paddingFactor field is 1, no padding or buffer space is kept between two consecutive documents, making it impossible for the first of these two documents to grow in place on updates. If the paddingFactor field is more than 1, there is some buffer space that can accommodate small size changes to the documents. The paddingFactor field, however, is not set by the user; MongoDB calculates it for the collection over a period of time and then uses this calculated value to allocate space for newly inserted documents. To get a feel for how this padding factor changes, let us do a small exercise:
1. Execute the following command in the Mongo shell:
> for (i = 0; i < 10; i++) {
    db.paddingFactorTest.insert({value: 'Hello World'})
}
2. Now execute the following command and take note of the paddingFactor value (it would be 1):
> db.paddingFactorTest.stats()
3. We will now make some updates to let the documents grow in size as follows:
> for (i = 0; i < 5; i++) {
    db.paddingFactorTest.update({value: 'Hello World'}, {$push: {value1: 'Value'}}, false, true)
}
> db.paddingFactorTest.stats()
Query the stats again and observe that the value of paddingFactor has gone slightly above 1, which shows that the MongoDB server adjusted this value and will use it when allocating space for documents inserted later.
We saw how paddingFactor affects the storage allocated to a document, but we have no control over this value, nor can we tell Mongo beforehand what additional buffer to allocate to each inserted document based on its anticipated growth. There is, however, a technique that lets us achieve this, which we will see in the Manually padding a document recipe.
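The 2 KB example from the paddingFactor description can be worked through in a few lines of plain JavaScript (runnable in the Mongo shell or Node.js). This is a simplified model, not the server's allocator: it assumes the allocated space is exactly the document size multiplied by paddingFactor and ignores any extra rounding the storage engine applies. The helpers allocatedSize and fitsInPlace are hypothetical names used only for illustration.

```javascript
// Simplified model of padded allocation (hypothetical helpers).
// Assumption: allocated space = document size * paddingFactor, with no extra
// rounding by the storage engine.
function allocatedSize(docSize, paddingFactor) {
    return Math.floor(docSize * paddingFactor);
}

// An update can happen in place only if the grown document still fits in the
// space allocated at insert time; otherwise the server must move the document.
function fitsInPlace(docSize, paddingFactor, newSize) {
    return newSize <= allocatedSize(docSize, paddingFactor);
}

var docSize = 2 * 1024;                          // the 2 KB document from the example
var noPadding = allocatedSize(docSize, 1);       // 2048 bytes: no buffer at all
var withPadding = allocatedSize(docSize, 1.5);   // 3072 bytes: 1 KB of buffer
var movesWithoutPadding = !fitsInPlace(docSize, 1, docSize + 100);  // true
var movesWithPadding = !fitsInPlace(docSize, 1.5, docSize + 100);   // false
```

With a paddingFactor of 1, even 100 bytes of growth forces the expensive move described above, while a factor of 1.5 absorbs it.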
Viewing database stats
In the previous recipe, we saw how to view some important statistics of a collection from an administrative perspective. In this recipe, we'll get an even clearer picture by getting those (or most of those) statistics at the database level.
Getting ready
To find the stats of the database, we need to have a server up and running, and a single node should be OK. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server. The data on which we will be operating needs to be imported into the database. The steps to import the data are given in the Creating test data recipe in Chapter 2, Command-line Operations and Indexes. Once these steps are completed, we are all set to go ahead with this recipe. Refer to the previous recipe, Viewing collection stats, if you need to see how to view stats at the collection level.
How to do it…
We will be using the test database for the purpose of this recipe. It already has the postalCodes collection in it. Let's take a look at the steps in detail:
1. Connect to the server using the Mongo shell by typing in the following command from the operating system terminal (it is assumed that the server is listening on port 27017):
$ mongo
2. On the shell, execute the following command and observe the output:
> db.stats()
3. Now, execute the following command, but this time with the scale parameter (observe the output):
> db.stats(1024)
{
    "db" : "test",
    "collections" : 3,
    "objects" : 39738,
    "avgObjSize" : 143.32699179626553,
    "dataSize" : 5562,
    "storageSize" : 16388,
    "numExtents" : 8,
    "indexes" : 2,
    "indexSize" : 2243,
    "fileSize" : 196608,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 5
    },
    "ok" : 1
}
How it works…
Let us start by looking at the collections field. If you look carefully at the number and also execute the show collections command on the Mongo shell, you will find one more collection in the stats than in the list returned by the command. The difference is one hidden collection named system.namespaces. You may execute db.system.namespaces.find() to view its contents.
Getting back to the output of the stats operation on the database, the objects field in the result has an interesting value too. If we find the count of documents in the postalCodes collection, we see that it is 39732. The count shown here is 39738, which means there are six more documents. These six documents come from the system.namespaces and system.indexes collections. Executing a count query on these two collections will confirm it. Note that the test database doesn't contain any other collection apart from postalCodes. The figures will change if the database contains more collections with documents in them.
The scale parameter, which is a parameter to the stats function, divides the number of bytes by the given scale value. In this case, it is 1024, and hence, most of the size values will be in KB. Let's analyze the output from step 3:
The following table shows the meaning of the important fields:
Field Description
db
This is the name of the database whose stats are being viewed.
collections
This is the total number of collections in the database.
objects
This is the count of documents across all collections in the database. If we find the stats of a collection by executing db.<collection>.stats(), we get the count of documents in the collection. This attribute is the sum of counts of all the collections in the database.
avgObjSize
This is simply the size (in bytes) of all the objects in all the collections in the database, divided by the count of the documents across all the collections. This value is not affected by the scale provided even though this is a size field.
dataSize
This is the total size of the data held across all the collections in the database. This value is affected by the scale provided.
storageSize
This is the total amount of storage allocated to collections in this database for storing documents. This value is affected by the scale provided.
numExtents
This is the count of all the extents in the database across all the collections. This is basically the sum of numExtents in the collection stats for collections in this database.
indexes
This is the sum of indexes across all collections in the database.
indexSize
This is the total size of all the indexes of all the collections in the database, in bytes unless a scale is given. This value is affected by the scale provided.
fileSize
This is simply the sum of the sizes of all the database files you should find on the file system for this database. The files will be named test.0, test.1, and so on for the test database. This value is affected by the scale provided.
nsSizeMB
This is the size, in MB, of the .ns file of the database.
Another thing to note is the value of avgObjSize, which is a little odd. Unlike this very field in the collection's stats, which is affected by the value of the scale provided, in database stats this value is always in bytes. This is pretty confusing, and it is not clear why it is not scaled according to the provided scale.
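Because avgObjSize stays in bytes while dataSize is scaled, the two fields can be cross-checked against each other. The following plain JavaScript sketch uses the numbers from the db.stats(1024) output above: it multiplies objects by avgObjSize to recover the unscaled data size in bytes and then applies the scale itself. It assumes the server truncates when scaling (integer division); approxDataSizeBytes and applyScale are hypothetical helper names.

```javascript
// Figures copied from the db.stats(1024) output in this recipe.
var stats = { objects: 39738, avgObjSize: 143.32699179626553, dataSize: 5562 };

// avgObjSize is always in bytes, so objects * avgObjSize is (up to
// floating-point error) the unscaled dataSize in bytes.
function approxDataSizeBytes(s) {
    return s.objects * s.avgObjSize;
}

// Apply a scale the way we assume the server does: divide and truncate.
function applyScale(bytes, scale) {
    return Math.floor(bytes / scale);
}

var dataBytes = approxDataSizeBytes(stats);   // roughly 5,695,528 bytes
var dataKB = applyScale(dataBytes, 1024);     // 5562, matching the dataSize field
```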
See also
Instant MongoDB, Packt Publishing (https://www.packtpub.com/big-data-and-business-intelligence/instant-mongodb-instant)
Disabling the preallocation of data files
Data files are preallocated in Mongo and filled with zeros even before data is inserted into collections, to prevent disk fragmentation. These data files are allocated starting from 64 MB for the first, 128 MB for the second, 256 MB for the third, and so on, up to a maximum size of 2 GB, after which all files are 2 GB. Though these preallocated data files prevent disk fragmentation, preallocating them requires time and consumes disk space. Allocating a file only at the moment data needs to go into it could take a significant amount of time, so Mongo also preallocates one additional data file in the background and keeps it ready. If, however, this preallocation is not desired for, say, a test database, where quick startup and lower disk space consumption are more important, preallocation can be disabled. This, however, should not be done on production systems.
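The allocation sequence described above is easy to model. This plain JavaScript sketch (hypothetical helper names) computes the size of the nth data file and the total for the first few files, assuming the default sizes quoted in the text; options such as --smallfiles, which change these sizes, are not modeled. Two files, test.0 and test.1, give 64 MB + 128 MB = 192 MB, or 196608 KB, which is consistent with the fileSize field in the db.stats(1024) output of the previous recipe.

```javascript
// Size in MB of data file number n (0 for test.0, 1 for test.1, ...):
// 64 MB for the first file, doubling each time, capped at 2 GB (2048 MB).
function dataFileSizeMB(n) {
    return Math.min(64 * Math.pow(2, n), 2048);
}

// Total MB allocated for the first fileCount data files.
function totalDataFilesMB(fileCount) {
    var total = 0;
    for (var n = 0; n < fileCount; n++) {
        total += dataFileSizeMB(n);
    }
    return total;
}

var fileSizes = [0, 1, 2, 3, 4, 5, 6].map(dataFileSizeMB);
// [64, 128, 256, 512, 1024, 2048, 2048]
var twoFilesKB = totalDataFilesMB(2) * 1024;   // 196608
```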
How to do it…
When starting a server, we can start the MongoDB server with the --noprealloc flag to disable this preallocation. For instance, a server listening on the default port with preallocation disabled will be started as follows:
$ mongod --noprealloc
Manually padding a document
Without getting too much into the internals of storage, MongoDB uses memory-mapped files, which means that the data is stored in files exactly as it would be in memory; it uses low-level OS services to map these pages to memory. The documents are stored in contiguous locations in Mongo data files, and the problem arises when a document grows and no longer fits in its space. In such scenarios, Mongo rewrites the document towards the end of the collection with the updated data and clears up the space where it was originally placed (note that this space is not released to the OS as free space).
This is not a big problem for applications that don't expect documents to grow in size; however, it is a big performance hit for applications that foresee documents growing over a period of time and, potentially, a lot of such document movements. The paddingFactor field, which we saw in the Viewing collection stats recipe, gets updated over a period of time, to some extent, and allocates some buffer for the document to grow. However, this happens only over a period of time, once a lot of documents have already been moved across the collection and the MongoDB server has adjusted the padding size. Moreover, at the time of writing, this padding factor cannot be set for the collection beforehand, based on your anticipated increase in document size, to counter these document rewrites by Mongo; it starts at a default value of 1. However, there is a small trick that does let you do this, and that is what we will see in this recipe. This is a commonly used practice for such requirements.
Getting ready
Nothing in particular is needed for this recipe, unless you plan to try out this simple technique, in which case you will need a single instance up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server.
How to do it…
The idea of this technique is to add some dummy data to the document that is to be inserted. The size of this dummy data, plus the other data in the document, is approximately the anticipated size of the document.
For example, if the average size of the document is estimated to be around 1200 bytes over a period of time, and there are 300 bytes of data present in the document while inserting it, we will add a dummy field that is around 900 bytes in size, so that the total document size sums up to 1200 bytes.
Once the document is inserted, we unset this dummy field, which leaves a hole in the file between the two consecutive documents. This empty space will then be used when the document grows over a period of time, minimizing the document's movements. This is not a foolproof method, as any document growing beyond the anticipated average growth will have to be copied by the server to the end of the collection. Also, documents not growing to the anticipated size tend to waste disk space.
Applications can come up with an intelligent strategy to, perhaps, adjust the size of the padding field based on a field in the document to take care of these shortcomings. However, this is something that is up to the application developers.
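To hit a padding target such as the 900 bytes in the example, it helps to know how many bytes a dummy array actually adds. The sketch below (plain JavaScript, hypothetical helper names) follows the BSON encoding rules: an array is stored as an embedded document whose keys are "0", "1", "2", and so on, and each string is stored with a 4-byte length prefix and a trailing null byte. It assumes a top-level array field holding ASCII strings (so a string's length equals its byte count); the exact record space the server reserves may differ slightly.

```javascript
// Bytes added to a document's BSON by a top-level array field `fieldName`
// holding `count` copies of the ASCII string `value`.
function paddingArrayBsonSize(fieldName, count, value) {
    var elements = 0;
    for (var i = 0; i < count; i++) {
        var key = String(i);                  // array keys are "0", "1", ...
        elements += 1                         // element type byte (string)
            + key.length + 1                  // key as a null-terminated C string
            + 4 + value.length + 1;           // int32 length + bytes + null
    }
    var embeddedDoc = 4 + elements + 1;       // int32 size + elements + terminator
    return 1 + fieldName.length + 1 + embeddedDoc;  // type byte + field name + array
}

// Smallest number of copies whose encoded size is at least targetBytes.
function copiesForTarget(fieldName, targetBytes, value) {
    var count = 0;
    while (paddingArrayBsonSize(fieldName, count, value) < targetBytes) {
        count++;
    }
    return count;
}

var twentyCopies = paddingArrayBsonSize('padField', 20, 'Dummy');  // 285 bytes
var copiesFor900 = copiesForTarget('padField', 900, 'Dummy');      // 64 copies
```

By this estimate, 20 copies of 'Dummy' reserve only about 285 bytes, so a larger loop bound or longer strings would be needed to reach a target like 900 bytes.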
Let us now see a sample of this approach:
1. We define a small function that will add a field called padField with an array of string values to the document as follows:
function padDocument(doc) {
    doc.padField = []
    for (var i = 0; i < 20; i++) {
        doc.padField[i] = 'Dummy'
    }
}
It adds an array called padField containing the string Dummy 20 times. There is no restriction on what type you add to the document and how many times it is added, as long as it consumes the space you desire. The preceding code snippet is just a sample.
2. The next step is to insert a document. We will define another function called insert in the following manner:
function insert(collection, doc) {
    // 1. Pad the document with padField
    padDocument(doc);
    // 2. Create or store the _id field that will be used later
    var _id;
    if (typeof(doc._id) == 'undefined') {
        _id = ObjectId()
        doc._id = _id
    }
    else {
        _id = doc._id
    }
    // 3. Insert the document with the padded field
    collection.insert(doc)
    // 4. Remove the padded field. Use the saved _id to find the document to be updated.
    collection.update({'_id': _id}, {$unset: {'padField': 1}})
}
3. We will now put this into action by inserting a document in the testCol collection in the following manner:
> insert(db.testCol, {i: 1})
4. You may query the testCol collection using the following query and check whether the inserted document exists or not:
> db.testCol.findOne({i: 1})
Note that on querying, you will not find padField in the document. However, the space once occupied by the array stays between the subsequently inserted documents even though the field was unset.
How it works…
The insert function is self-explanatory and has comments in it to tell you what it does. An obvious question is, how can we be sure this is indeed what we intended to do? For this purpose, we shall do a small activity on a manualPadTest collection. From the Mongo shell, execute the following commands:
> db.manualPadTest.drop()
> db.manualPadTest.insert({i: 1})
> db.manualPadTest.insert({i: 2})
> db.manualPadTest.stats()
Take note of the avgObjSize field in the stats. Next, execute the following commands from the Mongo shell:
> db.manualPadTest.drop()
> insert(db.manualPadTest, {i: 1})
> insert(db.manualPadTest, {i: 2})
> db.manualPadTest.stats()
Take note of the avgObjSize field in the stats. This figure is much larger than the one we saw earlier with a regular insert without padding. The paddingFactor field, as we see in both cases, is still 1, but the latter case has more buffer for the document to grow.
One catch in the insert function we used in this recipe is that the insert into the collection and the subsequent update that removes the padding are two separate operations, so together they are not atomic.
Understanding the mongostat and mongotop utilities
Most of you might find these names similar to two popular Unix commands, iostat and top. For MongoDB, mongostat and mongotop are two utilities that do pretty much the same job as those two Unix commands, and there is no prize for guessing that these are used to monitor the Mongo instance.
Getting ready
In this recipe, we will be simulating some operations on a standalone Mongo instance by running a script that will attempt to keep your server busy; then, in another terminal, we will be running these utilities to monitor the db instance.
You need to start a standalone server listening on any port for client connections; in this case, we will stick to the default 27017. In case you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. We also need to download the KeepServerBusy.js script from the book's website and keep it handy for execution on the local drive. Also, it is assumed that the bin directory of your Mongo installation is present in the path variable of your operating system. If not, then these commands need to be executed with the absolute path of the executable from the shell. The mongostat and mongotop utilities come as standard with the Mongo installation.
How to do it…
Let's take a look at the steps in detail:
1. Start the MongoDB server. Let it listen on the default port for connections.
2. In a separate terminal, execute the KeepServerBusy.js JavaScript as follows:
$ mongo KeepServerBusy.js --quiet
3. Open a new OS terminal and execute the following command:
$ mongostat
4. Capture the output content for some time and then hit Ctrl + C to stop the command from capturing more stats. Keep the terminal open or copy the stats to another file.
5. Now execute the following command from the terminal:
$ mongotop
6. Capture the output content for some time and then hit Ctrl + C to stop the command from capturing more stats. Keep the terminal open or copy the stats to another file.
7. Hit Ctrl + C in the shell where the KeepServerBusy.js JavaScript was executed, to stop the operation that keeps the server busy.
How it works…
Let us see what we have captured from these two utilities. We start by analyzing mongostat. On my laptop, the output of the $ mongostat command is as follows:
$ mongostat
connected to: 127.0.0.1
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
1000 1 1000 1794 1 1|0 0 320m 808m 54m 37 test:85.7% 0 0|0 0|1 271k 94k 2 23:24:30
2000 1 1326 1206 1 1|0 0 320m 808m 54m 113 test:83.3% 0 0|0 0|1 339k 51k 2 23:24:31
1000 1 952 1000 1 1|0 0 320m 808m 54m 28 test:84.4% 0 0|0 0|1 219k 51k 2 23:24:32
77 1 722 1000 1 1|0 0 320m 808m 54m 87 test:73.0% 0 0|0 0|1 131k 51k 2 23:24:33
923 1 1000 792 1 1|0 0 320m 808m 54m 42 test:83.3% 0 0|0 0|0 206k 51k 2 23:24:34
1000 1 1000 934 1 1|0 0 320m 808m 54m 150 test:84.6% 0 0|0 0|1 220k 51k 2 23:24:35
1000 1 1000 920 1 1|0 0 320m 808m 54m 13 test:84.9% 0 0|0 0|1 219k 51k 2 23:24:36
You may choose to look at what the KeepServerBusy.js script is doing to keep the server busy. All it does is insert 1000 documents in the monitoringTest collection; update them one by one to set a new key in them; execute a find and iterate through all of them; and finally, delete them one by one. Basically, it is a write-intensive operation.
The output looks ugly when the content wraps in a narrow terminal, but let us analyze the fields one by one and see what to look out for. The following table gives a description of each column:
Column(s) Description
insert, query, update, and delete
These are the first four columns, indicating the number of insert, query, update, and delete operations per second. The figures are per second because the samples are captured at 1-second intervals, as the last column (time) shows.
getmore
When a cursor runs out of data for a query, a getmore operation is executed on the server to fetch more results for the query executed earlier. This column shows the number of getmore operations executed in the given time frame of 1 second. In our case, not many getmore operations are executed.
command
This shows the number of commands executed on the server in the given time frame of 1 second. In our case, there weren't many; only 1. The number after the | is 0 in our case, as this was in the standalone mode. Try executing mongostat connected to a replica set primary and secondary. You should see slightly different figures there.
flushes
This is the number of times data was flushed to the disk in an interval of 1 second.
mapped, vsize, and res
Mapped memory is the amount of memory mapped by the Mongo process to the database. This typically will be the same as the size of the database. Virtual memory, on the other hand, is the memory allocated to the entire mongod process. This typically will be more than twice the size of mapped memory, especially when journaling is enabled. The resident memory is the physical memory used by Mongo. All these figures are given in MB. The total amount of physical memory might be a lot more than what is being used by Mongo, but that is not a concern unless a lot of page faults occur (see the next point).
faults
This is the number of page faults occurring per second. This number should be as low as possible. It indicates the number of times Mongo had to go to disk to obtain a document or index that was missing in the main memory. This is less of a problem when using SSDs for persistent storage than when using spinning disk drives.
locked db
From version 2.2, a write operation on a collection locks the database that contains the collection and does not acquire a global-level lock. This field shows the database that was locked for the most time in a given time interval, along with the percentage of that interval for which it was locked. In our case, the test database is locked.
idx miss %
This field gives the number of times a particular index was needed and was not present in memory. This causes a page fault, and the disk needs to be accessed to get the index. Another disk access might be needed to get the document as well. This figure too should be low. A high percentage of index misses is something that will need attention.
qr|qw
These are the queued-up reads and writes that are waiting for the chance to be executed. If this number goes up, it shows that the database is getting overwhelmed by a volume of reads and writes greater than it can handle. If this figure is high, page faults, memory stats, and the database lock percentage also need to be examined. If the data set is too large, sharding the collection can improve the performance significantly.
ar|aw
This is the number of active readers and writers (clients). This is not something to worry about even for a large number, as long as the other stats we saw earlier are under control.
netIn and netOut
This is the network traffic in and out of the MongoDB server in the given time frame. The figure is in bits. For example, 271k means 271 kilobits.
conn
This indicates the number of open connections. Keep watch on it to make sure it doesn't keep getting higher.
time
This is the time at which the sample was captured.
There are some more fields seen if mongostat is connected to a replica set primary or secondary. As an assignment, once the stats for a standalone instance are collected, start a replica set and execute the same script to keep the server busy. Use mongostat to connect to the primary and secondary instances and see if different stats are captured.
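When mongostat output is captured to a file, as suggested in step 4, it can be useful to split each line back into named columns. The following sketch (plain JavaScript, hypothetical function and column names) assumes one unwrapped line per sample in exactly the 19-column layout of the header shown in this recipe; other mongostat versions print different columns, so the column list would need adjusting.

```javascript
// Column order of the mongostat output shown in this recipe ("idx miss %" is
// a single column even though its header contains spaces).
var MONGOSTAT_COLUMNS = ['insert', 'query', 'update', 'delete', 'getmore',
    'command', 'flushes', 'mapped', 'vsize', 'res', 'faults', 'locked',
    'idxMissPct', 'qrqw', 'araw', 'netIn', 'netOut', 'conn', 'time'];

function parseMongostatLine(line) {
    var tokens = line.trim().split(/\s+/);
    if (tokens.length !== MONGOSTAT_COLUMNS.length) {
        throw new Error('Expected ' + MONGOSTAT_COLUMNS.length +
            ' columns, got ' + tokens.length);
    }
    var row = {};
    for (var i = 0; i < tokens.length; i++) {
        row[MONGOSTAT_COLUMNS[i]] = tokens[i];
    }
    // Columns such as "1|0" or "0|1" hold two counters separated by a bar.
    row.command = row.command.split('|').map(Number);
    row.qrqw = row.qrqw.split('|').map(Number);
    row.araw = row.araw.split('|').map(Number);
    // "test:84.9%" is the most-locked database and its lock percentage.
    var sep = row.locked.lastIndexOf(':');
    row.locked = { db: row.locked.slice(0, sep),
                   percent: parseFloat(row.locked.slice(sep + 1)) };
    return row;
}

var sample = parseMongostatLine(
    '1000 1 1000 920 1 1|0 0 320m 808m 54m 13 test:84.9% 0 0|0 0|1 219k 51k 2 23:24:36');
// mapped and vsize share the "m" suffix here, so they compare directly:
// 808 > 2 * 320, consistent with the rule of thumb in the table above.
var vsizeOverTwiceMapped = parseInt(sample.vsize, 10) > 2 * parseInt(sample.mapped, 10);
```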
Apart from mongostat, we also used the mongotop utility to capture the stats. Let us see its output and make some sense out of it:
$ mongotop
connected to: 127.0.0.1
ns                        total    read    write    2014-01-15T17:55:13
test.monitoringTest       899ms    1ms     898ms
test.system.users         0ms      0ms     0ms
test.system.namespaces    0ms      0ms     0ms
test.system.js            0ms      0ms     0ms
test.system.indexes       0ms      0ms     0ms
ns                        total    read    write    2014-01-15T17:55:14
test.monitoringTest       959ms    0ms     959ms
test.system.users         0ms      0ms     0ms
test.system.namespaces    0ms      0ms     0ms
test.system.js            0ms      0ms     0ms
test.system.indexes       0ms      0ms     0ms
ns                        total    read    write    2014-01-15T17:55:15
test.monitoringTest       954ms    1ms     953ms
test.system.users         0ms      0ms     0ms
test.system.namespaces    0ms      0ms     0ms
test.system.js            0ms      0ms     0ms
test.system.indexes       0ms      0ms     0ms
There is not much to look at in this stat. We see the total time for which the database was busy reading or writing in the given slice of 1 second. The value given in the total will be the sum of the read and the write time. If we actually compare the mongotop and mongostat utilities for the same time slice, the percentage of time for which writes were taking place will be very close to the lock percentage shown in mongostat's output.
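The relationship described above can be checked on a captured mongotop line. This sketch (plain JavaScript, hypothetical helper names) parses one namespace row, confirms that total equals read plus write, and converts the write time into a percentage of the sampling interval, which is the figure to compare with mongostat's lock percentage. It assumes the millisecond format shown in the output above.

```javascript
// Parses a mongotop row such as "test.monitoringTest 899ms 1ms 898ms".
function parseMongotopRow(line) {
    var t = line.trim().split(/\s+/);
    return {
        ns: t[0],
        total: parseInt(t[1], 10),   // parseInt stops at the "ms" suffix
        read: parseInt(t[2], 10),
        write: parseInt(t[3], 10)
    };
}

// Percentage of the sampling interval spent writing to this namespace.
function writePercent(row, intervalSeconds) {
    return row.write / (intervalSeconds * 1000) * 100;
}

var topRow = parseMongotopRow('test.monitoringTest 899ms 1ms 898ms');
var totalMatches = topRow.total === topRow.read + topRow.write;  // 899 = 1 + 898
var writePct = writePercent(topRow, 1);                           // about 89.8
```

When mongotop is run with a longer interval, such as mongotop 5, pass that interval as intervalSeconds, assuming each figure then covers the whole window.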
The mongotop command accepts a parameter on the command line as follows:
$ mongotop 5
In this case, the interval after which the stats will be printed out will be 5 seconds, as against the default value of 1 second.
See also
The Estimating the working set recipe, to learn how to estimate the working set using the working set estimator command introduced in Mongo 2.4
The Viewing and killing the currently executing operations recipe, to learn how to get the currently executing operations from the shell and kill them if needed
The Using profiler to profile operations recipe, to learn how to use the in-built profiling feature of Mongo to log the execution time of operations
Estimating the working set
We start by defining what the working set is. It is the subset of the total data that is frequently accessed by the application. In an application that stores information over a period of time, the working set is mostly the recently accessed data. The word "recently" is subjective; for some, it might be a day or two, for others, it might be a couple of months. This is mostly something that needs to be thought of while designing the application and sizing the database. The working set is something that needs to be in the RAM of the database server to minimize page faults and get optimum performance.
In this recipe, we will see a way to estimate your working set, a feature introduced in Mongo 2.4. The word "estimator" is slightly misleading, as the initial sizing is still a manual activity, and the system designers need to be judicious about the server configuration. The working set estimator utility we will see now is more of a reactive approach, which kicks in once the application is up and running. It provides metrics that can be used by monitoring tools, and tells us whether the RAM on the server can accommodate the working set or whether the set outgrows the available RAM. The latter then calls for resizing the hardware or scaling the database horizontally.
Getting ready
In this recipe, we will be simulating some operations on a standalone Mongo instance. We need to start a standalone server listening on any port for client connections; in this case, we will stick to the default 27017. In case you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. Connect to the server from the Mongo shell.
How to do it…
The working set is now a part of the server's status output. There is a field called workingSet, whose value is a document that gives these estimates.
This working set is not available as part of the standard serverStatus command output and needs to be requested explicitly. It is not a cheap operation in terms of resources, and thus needs to be monitored if it is executed frequently. Frequent invocations can have a detrimental effect on the performance of the server.
We need to run the following command from the Mongo shell to get the working set estimates:
> db.runCommand({serverStatus: 1, workingSet: 1}).workingSet
{
    "note" : "thisIsAnEstimate",
    "pagesInMemory" : 6188,
    "computationTimeMicros" : 11524,
    "overSeconds" : 3977
}
How it works…
There are just four fields in this document for the working set estimate, with the first just stating in text that this is an estimate. The pagesInMemory, computationTimeMicros, and overSeconds fields are what we will be more interested in.
We will look at the overSeconds field first. This is the time in seconds between the first and the last page loaded by Mongo in memory. When the server is started, this value will obviously be low but, eventually, as more data is accessed over time, more pages will be loaded by Mongo in memory. If the RAM available is abundant, the first loaded page will stay in memory and new pages will continue to load as and when needed. Hence, the time will also increase, as the difference between the most recently loaded page and the oldest page will increase. If this time stays low, or even decreases, it means that the oldest and newest pages in memory were all loaded within just the number of seconds given by this figure. This can be an indication that the number of pages accessed and loaded in memory by the MongoDB server is more than those that can be held in memory. As Mongo uses the least recently used (LRU) policy to evict a page from memory to make space for a new page, we are possibly risking evicting pages that might be needed again, causing more page faults.
This is where the pagesInMemory field comes in. This tells us the number of pages Mongo loaded in memory over a period of time. Multiplying this number by the page size of around 4 KB gives the size of the data loaded in memory in bytes. Thus, if all data has been accessed since the server was started, this size will be around your data size. This number will keep increasing with time; hence this field, in conjunction with the overSeconds field, is an important statistic.
The final field, computationTimeMicros, gives the time in microseconds taken by the server to compute this working set statistic. As we can see, it is not an incredibly cheap operation to execute and thus, this statistic should be requested with caution, especially on high-throughput systems.
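The arithmetic in the preceding paragraphs can be written down using the sample workingSet document from this recipe. This is a plain JavaScript sketch with hypothetical helper names; it assumes a 4 KB (4096-byte) page, as the text suggests, whereas the real page size depends on the operating system. In the Mongo shell, the input could come from db.runCommand({serverStatus: 1, workingSet: 1}).workingSet instead of the literal below.

```javascript
// Sample document returned in this recipe.
var workingSet = {
    note: 'thisIsAnEstimate',
    pagesInMemory: 6188,
    computationTimeMicros: 11524,
    overSeconds: 3977
};

var PAGE_SIZE_BYTES = 4096;  // assumption: 4 KB pages

// Approximate bytes of data loaded into memory.
function workingSetBytes(ws) {
    return ws.pagesInMemory * PAGE_SIZE_BYTES;
}

// True if the estimate fits in the RAM you have budgeted for mongod.
function fitsInRam(ws, ramBytes) {
    return workingSetBytes(ws) <= ramBytes;
}

var wsBytes = workingSetBytes(workingSet);                      // 25346048 bytes
var wsMegabytes = wsBytes / (1024 * 1024);                      // about 24.2 MB
var computeMillis = workingSet.computationTimeMicros / 1000;    // 11.524 ms
```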
Viewing and killing the currently executing operations
In this recipe, we will see how to view the currently running operations and kill some operations that have been running for a long time.
Getting ready
In this recipe, we will simulate some operations on a standalone Mongo instance. We need to start a standalone server listening on any port for client connections; in this case, we will stick to the default 27017. If you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. We also need to start two shells connected to the server. One shell will be used for background index creation, and the other will be used to monitor the current operation and then kill it.
How to do it…
Unlike in a production environment, in our test environment we will not be able to simulate an actual long-running operation. We will, however, try to create an index and hope it takes a long time to create. Depending on your target hardware configuration, the operation may take some time. Let's see the steps in detail:
1. To start this test, let us execute the following commands on the Mongo shell:
> db.currentOpTest.drop()
> for (i = 1; i < 10000000; i++) { db.currentOpTest.insert({'i': i}) }
The preceding loop might take some time to insert its roughly 10 million documents.
Once the documents are inserted, we will execute an operation that will create the index in the background. If you would like to know more about index creation, refer to the Background and foreground index creation from the shell recipe in Chapter 2, Command-line Operations and Indexes, but it is not a prerequisite for this recipe.
2. Create a background index on the i field in the document. This index-creation operation is what we will be viewing from the currentOp operation and is what we will attempt to kill by using the killOp operation. Execute the following command in one shell to initiate the background index creation operation:
> db.currentOpTest.ensureIndex({i: 1}, {background: 1})
This takes a fairly long time; on my laptop, it took well over 100 seconds.
3. In the second shell, execute the following command to get the currently executing operations:
> db.currentOp().inprog
4. Take note of the in-progress operations and find the one for index creation. In our case, on the test machine, it was the only one in progress. It will be an operation on system.indexes and the operation will be insert. The keys to look out for in the output document are ns and op, respectively. We need to note the first field, namely opid, of this operation. In this case, it is 11587458. The sample output of the command is given in the next section.
5. Kill the operation from the shell using the opid, which we got earlier:
> db.killOp(11587458)
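Rather than reading through the whole inprog array by eye on a busy server, you can filter it. This sketch (ES5 JavaScript that runs in the Mongo shell or Node.js; the function names are hypothetical) selects index builds using the op and ns pattern from step 4 and, separately, operations that have run for at least a given number of seconds. It assumes the MongoDB 2.4 output format used in this recipe, where an index build appears as an insert into <database>.system.indexes.

```javascript
// Index builds show up as inserts into <db>.system.indexes in MongoDB 2.4.
function findIndexBuilds(inprog) {
    return inprog.filter(function (op) {
        return op.op === 'insert' && /\.system\.indexes$/.test(op.ns);
    });
}

// Operations running for at least minSecs seconds (inactive operations may
// not report secs_running, so they are skipped).
function findLongRunning(inprog, minSecs) {
    return inprog.filter(function (op) {
        return typeof op.secs_running === 'number' && op.secs_running >= minSecs;
    });
}

// Trimmed-down sample shaped like the currentOp output in this recipe; the
// second entry (opid 42) is made up for illustration.
var inprog = [
    { opid: 11587458, active: true, secs_running: 31, op: 'insert',
      ns: 'test.system.indexes' },
    { opid: 42, active: true, secs_running: 0, op: 'query', ns: 'test.other' }
];
var indexBuildIds = findIndexBuilds(inprog).map(function (op) { return op.opid; });
// [11587458]
```

In the shell you would pass db.currentOp().inprog instead of the sample array, and could then call db.killOp() with each opid you decide to stop.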
How it works…
We will split our explanation into two sections, the first about the current operation details and the second about killing the operation.
The index creation process, in our case, is the long-running operation that we intend to kill. We create a big collection with about 10 million documents, and initiate a background index creation process.
On executing the db.currentOp() operation, we get a document as the result, with an inprog field whose value is an array of other documents, each representing a currently running operation. It is common to get a big list of documents on a busy system. The following is the document taken for the index creation operation:
{
    "opid" : 11587458,
    "active" : true,
    "secs_running" : 31,
    "op" : "insert",
    "ns" : "test.system.indexes",
    "insert" : {
        "v" : 1,
        "key" : {
            "i" : 1
        },
        "ns" : "test.currentOpTest",
        "name" : "i_1",
        "background" : 1
    },
    "client" : "127.0.0.1:50895",
    "desc" : "conn10",
    "connectionId" : 10,
    "locks" : {
        "^" : "w",
        "^test" : "W"
    },
    "waitingForLock" : false,
    "msg" : "bg index build Background Index Build Progress: 2214738/10586935 20%",
    "progress" : {
        "done" : 2214740,
        "total" : 10586935
    },
    "numYields" : 3070,
    "lockStats" : {
        "timeLockedMicros" : {
            "r" : NumberLong(0),
            "w" : NumberLong(53831938)
        },
        "timeAcquiringMicros" : {
            "r" : NumberLong(0),
            "w" : NumberLong(31387832)
        }
    }
}
We will see what these fields mean in the following table:
Field Description
opid
This is a unique operation ID identifying the operation. This is the ID to be used to kill an operation.
active
This Boolean value indicates whether the operation has started or not. It is false only if it is waiting to acquire the lock to execute the operation. The value will be true once it starts, even at a point in time where it has yielded the lock and is not executing.
secs_running
This gives the time, in seconds, for which the operation has been executing.
op
This indicates the type of the operation. In the case of index creation, it is insert, as the index is inserted into the system collection of indexes. The possible values are insert, query, getmore, update, remove, and command.
ns
This is the fully qualified namespace of the target. It will be of the <database name>.<collection name> form.
insert
This shows the document that will be inserted in the collection.
query
This is a field that will be present for operations other than the insert and getmore commands.
client
This is the IP address/hostname and the port of the client that initiated the operation.
desc
This is the description of the client, mostly the client's connection name.
connectionId
This is the identifier of the client connection from which the request originated.
locks
This is a document containing the locks held for this operation. The document shows the locks held by the operation being analyzed for various databases. The ^ indicates the global lock and ^test indicates the lock on the test database. The values here are interesting. The value of ^ is w (lowercase). This is an intent write lock at the global level: it is not exclusive, and multiple databases can be written to concurrently. ^test has the value W (uppercase), which is an exclusive write lock on the test database. This means that no other operation on the test database can proceed while this lock is held. The preceding output is for version 2.4 of Mongo.
waitingForLock
This field indicates whether the operation is waiting for a lock to be acquired. For instance, if the preceding index creation was not a background process, other operations on this database would queue up for the lock to be acquired. This flag for those operations will then be true.
msg
This is a human-readable message for the operation. In this case, we do see the percentage of the operation completed, as this is an index creation operation.
progress
This is the state of the operation. The total gives the total number of documents in the collection, and done gives the number indexed so far. In this case, the collection already had some additional documents (over 10 million documents in total). The percentage of the operation completed is computed from these figures.
numYields
This is the number of times the process has yielded the lock to allow other operations to execute. As this is a background index creation process, this number will keep on increasing as the server yields it frequently to let other operations execute. Had it been a foreground process, the lock would never be yielded until the operation completed.
lockStats
This document has more nested documents giving stats of the total time this operation has held the read or write lock, and also the time it waited to acquire the lock. The following are the possible values:
r: This is the time locked for a specific (database-level) read lock
w: This is the time locked for a specific (database-level) write lock
R: This is the time locked for the global read lock
W: This is the time locked for the global write lock
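The percentage in msg and the lock figures can be derived from the progress and lockStats sub-documents. This plain JavaScript sketch (hypothetical helper names) uses the values from the sample output, with the NumberLong values copied as ordinary numbers; it assumes msg truncates the percentage rather than rounding it, which is consistent with the 20% shown.

```javascript
// Values copied from the sample currentOp document in this recipe.
var progress = { done: 2214740, total: 10586935 };
var lockStats = {
    timeLockedMicros: { r: 0, w: 53831938 },
    timeAcquiringMicros: { r: 0, w: 31387832 }
};

// Whole-number percentage of documents processed so far.
function percentDone(p) {
    return Math.floor(p.done / p.total * 100);
}

// Convert a microsecond counter into seconds.
function microsToSeconds(micros) {
    return micros / 1e6;
}

var pctIndexed = percentDone(progress);                                        // 20
var secondsHoldingWrite = microsToSeconds(lockStats.timeLockedMicros.w);       // about 53.8
var secondsWaitingForWrite = microsToSeconds(lockStats.timeAcquiringMicros.w); // about 31.4
```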
If you have a replica set, there will be many more getmore operations on the oplog on the primary, issued by the secondaries.
To see system operations as well, we need to pass true as the parameter to the currentOp function call as follows:
> db.currentOp(true)
Next, we will see how to kill a user-initiated operation using the killOp function. The operation is simply called as follows:
> db.killOp(<operation id>)
In our case, the index creation process had the operation ID 11587458, and thus it will be killed as follows:
> db.killOp(11587458)
On killing any operation, irrespective of whether the given operation ID exists or not, we see the following message on the console:
{ "info" : "attempting to kill op" }
Thus, seeing this message doesn't mean that the operation was killed. It just means that an attempt will be made to kill the operation, if it exists.
If an operation cannot be killed immediately and the killOp command is issued for it, the killPending field will start appearing in currentOp for the given operation. For example, execute the following query on the shell:
> db.currentOpTest.find({$where: 'sleep(100000)'})
This will not return, and the thread executing the query will sleep for 100 seconds. This is an operation that cannot be killed using killOp. Try executing currentOp from another shell (do not press Tab for autocompletion; your shell may just hang), get the operation ID, and then kill it using the killOp command. You should see that the process is still running if you execute the currentOp command, but the document with the process details will now contain a new key, killPending, stating that the kill for this operation has been requested but is pending.
Using profiler to profile operations
In this recipe, we will look at Mongo's in-built profiler, which is used to profile the operations executed on the MongoDB server. It is a utility that logs either all operations or only the slow ones, and the logged data can be used to analyze the performance of the server.
Getting ready
In this recipe, we will be performing some operations on a standalone Mongo instance and profiling them. We need to start a standalone server listening to any port for client connections; in this case, we will stick to the default 27017. If you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. We also need to start a shell that will be used to perform querying, enable profiling, and view the profiled operations.
How to do it…
1. Once the server is started and the shell is connected to it, execute the following to get the current profiling level:
>db.getProfilingLevel()
The default level should be 0 (no profiling), if we have not set it earlier.
2. Let us set the profiling level to 1 (log slow operations only) and log all the operations slower than 50 ms. Execute the following command on the shell:
>db.setProfilingLevel(1,50)
3. Now, let us execute an insert operation into a collection and then execute a couple of queries:
>db.profilingTest.insert({i:1})
>db.profilingTest.find()
>db.profilingTest.find({$where:'sleep(70)'})
4. Now query the system.profile collection as follows:
>db.system.profile.find().pretty()
How it works…
Profiling is not enabled by default. If you are happy with the performance of the database, there is no reason to enable the profiler. It is useful only when one feels that there is some room for improvement and wants to target some expensive operations. An important question is, what classifies an operation as slow? The answer is, it varies from application to application. By default, in Mongo, slow means any operation taking more than 100 ms. However, while setting the profiling level, you may choose the threshold value.
There are three possible values for profiling levels:
0: Disable profiling
1: Enable profiling for slow operations, where the threshold value for an operation to be classified as "slow" is provided with the call while setting the profiling level
2: Profile all operations
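The effect of the three levels and the threshold on a single operation can be summarized as a small decision function. This is a plain JavaScript sketch of the rule as described here, not the server's code. isProfiled is a hypothetical name, and treating "slower than" as strictly greater than the threshold is an assumption about the boundary case.

```javascript
// Sketch: would an operation taking `millis` ms be written to system.profile?
// level  - 0, 1 or 2, as passed to db.setProfilingLevel()
// slowms - threshold used at level 1 (the default is 100 ms)
function isProfiled(level, millis, slowms = 100) {
  switch (level) {
    case 0:
      return false; // profiling disabled
    case 1:
      return millis > slowms; // only operations slower than the threshold
    case 2:
      return true; // every operation is logged
    default:
      throw new RangeError("profiling level must be 0, 1 or 2");
  }
}

// The recipe's setting, db.setProfilingLevel(1, 50):
console.log(isProfiled(1, 188, 50)); // true: the sleep(70) query took 188 ms
console.log(isProfiled(1, 2, 50)); // false: a fast find() is not logged
```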
Profiling all operations might not be a very good idea and is not commonly used, as we shall soon see. Setting the value to 1, with a threshold provided to it, is a good way to monitor slow operations.
If we look at the steps we executed, we see that we can get the current profiling level by executing the db.getProfilingLevel() operation. To get more information, such as the value set as the threshold for slow operations, we can execute db.getProfilingStatus(), which returns a document with the profiling level and the threshold value for slow operations.
To set the profiling level, we call the db.setProfilingLevel() method. In our case, we set it to log all operations taking more than 50 ms, using db.setProfilingLevel(1, 50).
To disable profiling, simply execute db.setProfilingLevel(0).
Next, we execute three operations: one to insert a document, one to find all documents, and finally, a find that calls sleep with a value of 70 ms to slow it down.
The final step is to see these profiled operations, which are logged in the system.profile collection. We execute a find operation to see the operations logged. In my execution, the insert and the final find operation with the sleep were logged.
Profiling has some overhead, albeit a small one. Hence, we will not enable it all the time, but only when we want to profile slow operations. Another question is, will this profiling collection grow over a period of time? The answer is no, as it is a capped collection. Capped collections are fixed-size collections that preserve insertion order and act as circular queues: they take in new documents and discard the oldest ones when they get full. A query on system.namespaces shows the collection's details. For the system.profile collection, the query shows the following output:
{"name":"test.system.profile","options":{"capped":true,"size":1048576}}
As we see, the size of the collection is 1 MB, which is incredibly small. Setting the profiling level to 2 will therefore quickly overwrite the data on busy systems. Should you want to retain more operations, you may explicitly create a collection named system.profile as a capped collection of any size you prefer. Profiling must be disabled, and any existing system.profile collection dropped, before it can be re-created. To create the capped collection explicitly, you may execute the following query from the Mongo shell:
>db.createCollection('system.profile',{capped:1,size:1048576})
Obviously, the size chosen is arbitrary, and you are free to allocate any size to this collection, based on how frequently the data gets filled and how much profiling data you want to keep before it gets overwritten.
As this is a capped collection, and the insertion order is preserved, a query with the sort order {$natural:-1} will be perfectly fine and very efficient at finding operations in the reverse order of execution time.
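To see why the collection never grows past its cap, here is a small in-memory model of a capped collection. It has a fixed byte budget, preserves insertion order, and evicts the oldest documents to make room. CappedBuffer is a hypothetical class written for illustration, and it approximates document size by JSON string length rather than BSON size. Its newestFirst() method plays the role of db.system.profile.find().sort({$natural:-1}).

```javascript
// Minimal model of a size-capped, insertion-ordered collection such as
// system.profile. Sizes are approximated with JSON length, not BSON size.
class CappedBuffer {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.docs = []; // kept in insertion (natural) order
    this.usedBytes = 0;
  }

  static sizeOf(doc) {
    return JSON.stringify(doc).length;
  }

  insert(doc) {
    const size = CappedBuffer.sizeOf(doc);
    if (size > this.maxBytes) throw new RangeError("document larger than the cap");
    // Discard the oldest documents until the new one fits.
    while (this.usedBytes + size > this.maxBytes) {
      const oldest = this.docs.shift();
      this.usedBytes -= CappedBuffer.sizeOf(oldest);
    }
    this.docs.push(doc);
    this.usedBytes += size;
  }

  // Newest first, like sorting on {$natural: -1}.
  newestFirst() {
    return [...this.docs].reverse();
  }
}

const profile = new CappedBuffer(80); // room for three small entries
for (let i = 1; i <= 5; i++) profile.insert({ op: "query", millis: i * 10 });
console.log(profile.newestFirst().map((d) => d.millis));
```

After five inserts, only the three most recent entries remain. Reading them in reverse natural order needs no sort on any field.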
Finally, we will take a look at the document that got inserted in the system.profile collection and see which operation it has logged:
{
"op":"query",
"ns":"test.profilingTest",
"query":{
"$where":"sleep(70)"
},
"ntoreturn":0,
"ntoskip":0,
"nscanned":1,
"keyUpdates":0,
"numYield":0,
"lockStats":{
"timeLockedMicros":{
"r":NumberLong(188796),
"w":NumberLong(0)
},
"timeAcquiringMicros":{
"r":NumberLong(5),
"w":NumberLong(6)
}
},
"nreturned":0,
"responseLength":20,
"millis":188,
"ts":ISODate("2014-01-27T17:37:02.482Z"),
"client":"127.0.0.1",
"allUsers":[],
"user":""
}
As we see in the preceding document, there are indeed some interesting stats. Let us look at some of them in the following table. Some of these fields are identical to the fields we see when we execute the db.currentOp() operation from the shell:
Field           Description
op              This is the operation that got executed. In this case, it was a find operation, and thus it is a query.
ns              This is the fully qualified name of the collection on which the operation was performed. It will be in the <database>.<collection name> format.
query           This shows the query that got executed on the server.
nscanned        This has the same meaning as nscanned in a query's explain plan: the total number of documents and index entries scanned.
numYield        This is the number of times the lock was yielded while the operation was executed.
lockStats       This has some interesting stats on the time taken to acquire the lock (timeAcquiringMicros) and the time for which the lock was held (timeLockedMicros), both in microseconds.
nreturned       This is the number of documents returned.
responseLength  This is the length of the response in bytes.
millis          Most important of all, this is the time taken in milliseconds to execute the operation.
ts              This is the time when the operation was executed.
client          This is the hostname/IP address of the client that executed the operation.
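When system.profile holds many entries, a common first step is to look at the slowest ones. In the shell, db.system.profile.find().sort({millis:-1}).limit(5) does this. The plain JavaScript sketch below applies the same idea to profile documents held in memory and keeps only a few of the fields from the table. slowestOps is a hypothetical helper, and the second sample document is made up.

```javascript
// Hypothetical helper: the n slowest entries from system.profile-style
// documents, trimmed to a few of the fields described in the table.
function slowestOps(profileDocs, n) {
  return [...profileDocs] // copy so the caller's array keeps its order
    .sort((a, b) => b.millis - a.millis) // slowest first
    .slice(0, n)
    .map(({ op, ns, millis, nscanned, nreturned }) => ({ op, ns, millis, nscanned, nreturned }));
}

// The first entry mirrors the document shown earlier; the second is made up.
const profileDocs = [
  { op: "insert", ns: "test.profilingTest", millis: 60 },
  { op: "query", ns: "test.profilingTest", millis: 188, nscanned: 1, nreturned: 0 },
];
console.log(slowestOps(profileDocs, 1));
```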
Setting up users in MongoDB
Security is one of the cornerstones of any enterprise-level system. You will rarely find a system in an environment safe and secure enough to allow unauthenticated user access to it. Apart from test environments, almost every production environment requires proper access rights and, perhaps, an audit of system access too. Mongo security has multiple aspects:
- Access rights for the end users accessing the system. There will be multiple roles, such as admin, read-only users, and read/write nonadministrative users.
- Authentication of the nodes that are added to the replica set. In a replica set, one should only be allowed to add authenticated systems. The integrity of the system will be compromised if any unauthenticated node is added to the replica set.
- Encryption of the data that is transmitted across the wire between the nodes of the replica set, or even between the client and the server (or the mongos process in the case of a sharded setup).
In this recipe and the next one, we will be looking at how to address the first two points mentioned in the preceding bullet list. The last point, about encrypting the data being transmitted on the wire, is not supported by default by the community edition of Mongo; it needs a rebuild of the MongoDB server with the ssl option enabled.
Note
All the steps are executed on MongoDB server version 2.4.6, and all the explanations hold true for this version. There are quite a few changes in version 2.6 of MongoDB related to the content we discuss here. Any 2.6-specific details will be mentioned as and when needed.
Getting ready
In this recipe, we will be setting up users for a standalone Mongo instance. We need to start a standalone server listening to any port for client connections; in this case, we will stick to the default 27017. If you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. We also need to start a shell that will be used for this admin operation. For a replica set, we will only be connected to the primary instance and will perform these operations there.
How to do it…
In this recipe, we will add an admin user, a read-only user for the test database, and a read-write user for the test database.
The following are assumed at this point:
The server is started, up, and running, and we are connected to it from the shell. The server is started without any special command-line argument other than those mentioned in Chapter 1, Installing and Starting the MongoDB Server, for starting a single node. Thus, any user has full access to the server.
Let's get started:
1. The first step is to create an admin user. Note that up to version 2.4 of MongoDB, the method name is addUser. However, in version 2.6 of MongoDB, the method is createUser. We will look at both methods of creating users. Execute steps 2 to 6 if you are working on a MongoDB server, version 2.4, and steps 7 and 8 if you are working on a MongoDB server, version 2.6.
2. Execute the following command in the Mongo shell to switch to the admin database:
>use admin
3. In the admin database, we will add a user called admin with the password admin:
>db.addUser('admin','admin')
{
"user":"admin",
"readOnly":false,
"pwd":"7c67ef13bbd4cae106d959320af3f704",
"_id":ObjectId("52ea98ef2d00f6e6fb1fcdba")
}
4. We will now switch to the test database as follows:
>use test
5. In the test database, we will create two users: a read-only user called read_user and a read/write user called write_user. The password for each of these users is the same as its username.
6. Execute the following commands to create these users:
>db.addUser({user:'read_user',pwd:'read_user',roles:['read']})
{
"user":"read_user",
"pwd":"60477dd7460977860674077dc0039102",
"roles":[
"read"
],
"_id":ObjectId("52ee29012d00f6e6fb1fcdbc")
}
>db.addUser({user:'write_user',pwd:'write_user',roles:['readWrite']})
{
"user":"write_user",
"pwd":"7944cf3480b0eabbf0cff4498ed9652b",
"roles":[
"readWrite"
],
"_id":ObjectId("52ee292c2d00f6e6fb1fcdbd")
}
7. We will now look at how to create users in the admin and test databases in version 2.6 of MongoDB. These steps are identical to version 2.4 of MongoDB, except for the name of the method. This method has some additional features that we will see in detail in the next section. First, we create the admin user in the admin database, as follows:
>use admin
>db.createUser({
user:'admin',pwd:'admin',
customData:{desc:'The admin user for admin db'},
roles:['readWrite','dbAdmin','clusterAdmin']
}
)
8. We will add read_user and write_user to the test database. To add the users, execute the following commands from the Mongo shell:
>use test
>db.createUser({
user:'read_user',pwd:'read_user',
customData:{desc:'The read only user for test database'},
roles:['read']
}
)
>db.createUser({
user:'write_user',pwd:'write_user',
customData:{desc:'The read/write user for test database'},
roles:['readWrite']
}
)
9. Now shut down the MongoDB server and then close the shell too. Restart the MongoDB server, but with the --auth option on the command line, as follows:
$ mongod .. <other options as provided earlier> --auth
10. Now connect to the server from a newly opened Mongo shell and execute the following command:
>db.testAuth.find()
The testAuth collection need not exist, but you should see an error stating that we are not authorized to query the collection.
11. We will now log in from the shell using read_user as follows:
>db.auth('read_user','read_user')
12. We will now execute the same find operation as follows (note that the find operation should not give an error and may not return any results, depending on whether the collection exists or not):
>db.testAuth.find()
13. Now we will try to insert a document as follows (note that we should get an error stating that we are not authorized to insert data in this collection):
>db.testAuth.insert({i:1})
14. We will now log out and log in again, but as write_user this time, as follows. Note that we log in differently this time around. We provide a document as the parameter to the auth function, whereas in the previous case, we passed two parameters for the username and password.
>db.logout()
>db.auth({user:'write_user',pwd:'write_user'})
15. Now execute the insert operation again as follows (this time, it should work):
>db.testAuth.insert({i:1})
16. Now execute the following command on the shell. You should get an unauthorized error:
>db.serverStatus()
17. We will now switch to the admin database. We are currently connected to the server as write_user, which has read/write permissions on the test database. From the Mongo shell, try to execute the following commands:
>use admin
>show collections
18. Close the Mongo shell, or open a new shell from the operating system's console, as follows. This should take us directly to the admin database:
$ mongo -u admin -p admin admin
19. Now execute the following on the shell. It should show us the collections in the admin database:
>show collections
20. Try to execute the following operation:
>db.serverStatus()
Execute the next step only if you are on version 2.4 of MongoDB and created the admin user using db.addUser('<username>','<password>').
21. Switch to the test database and execute the insert and find operations as follows:
>use test
>db.testAuth.insert({i:1})
>db.testAuth.find()
How it works…
We executed a lot of steps, and now we will take a closer look at them.
Initially, the server is started without the --auth option; hence, no security is enforced by default.
In version 2.4 of MongoDB, we create a user in the admin database using the addUser(<userName>, <password>) form of the method. This special user has read/write access to all the databases and can run admin commands, such as db.serverStatus(), and other replication and sharding-related commands. Users created in databases other than admin, whether read or write, will only be able to access the collections in their respective databases.
In version 2.6, however, we create the admin user using the db.createUser method. Let us take a closer look at this method first. The signature of the method to create the user is createUser(user, writeConcern). The first parameter is the user, which is actually a JSON document, and the second parameter is the write concern to use for user creation. The JSON document for the user has the following format:
{
'user':<username>,
'pwd':<password>,
'customData':{<JSON document providing any user specific data>},
'roles':[<roles of the user>]
}
The roles can be specified as follows, assuming that test is the shell's current database when the user is created:
[{'role':'read','db':'reports'},'readWrite']
This gives the user being created read access to the reports database and readWrite access to the test database. Let us see the complete user creation call for the test user:
>use test
>db.createUser({
user:'test',pwd:'test',
customData:{desc:'read access on reports and readWrite access on test'},
roles:[
{role:'read',db:'reports'},
'readWrite'
]
}
)
The write concern, which is an optional parameter, can be provided as a JSON document. Some sample values are {w:1} and {w:'majority'}.
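The shorthand in the roles array can be read as follows. A bare string names a role on the database in which the user is being created, while a document names the database explicitly. The sketch below expands the shorthand into explicit {role, db} documents. normalizeRoles is a hypothetical helper for illustration, not part of the shell API.

```javascript
// Sketch: expand createUser-style role shorthand into explicit {role, db}
// documents. A bare string refers to the database the user is created in.
function normalizeRoles(roles, currentDb) {
  return roles.map((r) =>
    typeof r === "string" ? { role: r, db: currentDb } : { role: r.role, db: r.db }
  );
}

// The roles array of the test user, created while "test" is the current database.
console.log(normalizeRoles([{ role: "read", db: "reports" }, "readWrite"], "test"));
```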
Coming back to the admin user creation, we created the user in step 7 using the createUser method and gave this user three roles in the admin database.
In steps 6 and 8, we created the read and read/write users in the test database, using the addUser method for version 2.4 and the createUser method for version 2.6. The JSON document used to create a user in version 2.4 is identical to the one used in version 2.6, except for two differences. First, there is no customData field, and second, the roles array contains only string values for the user roles.
The JSON document for the user in version 2.4 has the following format:
{
'user':<username>,
'pwd':<password>,
'roles':[<string values for roles of the user>]
}
We shut down the MongoDB server after creating the admin, read, and read-write users, and restart it with the --auth option.
On starting the server again, we connect to it from the shell in step 10, but unauthenticated. Here, we try to execute a find query on a collection in the test database; this fails, as we are unauthenticated. This shows that the server now requires appropriate credentials to execute operations on it. In steps 11 to 13, we log in as read_user and first try to execute a find operation, which succeeds. We then try an insert operation, which doesn't, as the user has read privileges only. The way to authenticate a user is by invoking db.auth(<username>, <password>) from the shell, and db.logout() logs out the currently logged-in user.
In steps 14 to 16, we demonstrate that we can perform insert operations as write_user, but admin operations such as db.serverStatus() cannot be executed, as these operations execute adminCommand on the server. This means that a nonadmin user is not permitted to invoke these operations. Similarly, in step 17, when we change the database to admin, write_user, which belongs to the test database, is not permitted to perform any operations there, such as getting a list of collections or querying a collection in the admin database.
In steps 18 to 20, we log in to the shell as the admin user on the admin database. Previously, we logged in using the auth method; in this case, we used the -u and -p options to provide the username and the password. We also provided the name of the database to connect to, which is admin in this case. Here, we are able to view the collections in the admin database and also execute admin operations such as getting the server status. In version 2.6, executing the db.serverStatus call is possible, as the user is given the clusterAdmin role.
In step 21, we are able to switch to any other database and execute read/write operations. This is a special privilege for the users of the admin database, which no other user has. This is possible because we created the user in version 2.4 using the version 2.2 style of user creation, db.addUser(<username>, <password>). The admin user created in version 2.6 is not able to query the test database, as it would need appropriate read and read/write privileges on the respective databases to perform these operations.
One final thing to note: apart from writing to a collection, a user with write privileges can also create indexes on the collections to which they have write access.
There's more…
In this recipe, we saw how to create different users, what permissions they have, and how those permissions restrict certain operations. In the next recipe, we will see how authentication can be done at the process level, that is, how a Mongo instance authenticates itself before being added to a replica set.
See also
http://docs.mongodb.org/manual/reference/built-in-roles/ to get details of the various in-built roles
http://docs.mongodb.org/manual/core/authorization/#user-defined-roles to learn more about defining custom user roles
Understanding interprocess security in MongoDB
In the previous recipe, we saw how authentication can be enforced so that a user must be logged in before any operations on Mongo are allowed. In this recipe, we will look at interprocess security. By the term interprocess security, we don't mean encrypting the communication, but only ensuring that a node is authenticated before it is added to a replica set.
Getting ready
In this recipe, we will be starting multiple Mongo instances as part of a replica set. If you are not aware of how to start a replica set, refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server. Apart from that, this recipe only covers how to generate the key file to be used, and the behavior when an unauthenticated node is added to the replica set.
How to do it…
To set the ground, we will be starting three instances, listening to ports 27000, 27001, and 27002 respectively. The first two will be started with a path to the key file, while the third will not receive one. Later, we will try adding these three instances to the same replica set. Let us take a look at the steps in detail:
1. Let us generate the key file first. There is nothing spectacular about generating the key file. It is simply a file with 6 to 1024 characters from the base64 character set. On a Linux file system, you may choose to generate pseudorandom bytes using openssl and encode them to base64. The following command will generate 500 random bytes, and these bytes will then be base64 encoded and written to keyfile:
$ openssl rand -base64 500 > keyfile
2. On a Unix file system, the key file should not have permissions for world and group, and thus, after it is created, we should execute the following command:
$ chmod 400 keyfile
3. Not giving write permission to the creator ensures that we don't overwrite the contents accidentally. On a Windows platform, however, openssl isn't available out of the box, and thus you have to download it. Extract the archive and add its bin folder to the operating system's path variable. For Windows, we can download openssl from http://gnuwin32.sourceforge.net/packages/openssl.htm.
4. You may even choose not to generate the key file using the approach mentioned earlier (that is, using openssl) and take an easy way out by typing plain text into the key file in any text editor of your choice. However, note that the characters \r, \n, and spaces are stripped off by Mongo, and the remaining text is considered as the key. For example, we may create a file with the following content added to it. Again, the file will be named keyfile:
somecontentaddedtothekeyfilefromtheeditorwithoutspaces
Using either approach mentioned earlier, we now have a key file in place that will be used in the next steps of the recipe.
5. We will now secure the Mongo processes by starting the Mongo instances as follows. I will be starting the Mongo instances on Windows. My key file, named keyfile, is placed in c:\MongoDB, and the data paths for the three instances are c:\MongoDB\data\c1, c:\MongoDB\data\c2, and c:\MongoDB\data\c3 respectively.
6. Start the first instance listening to port 27000 as follows:
C:\> mongod --dbpath c:\MongoDB\data\c1 --port 27000 --auth --keyFile c:\MongoDB\keyfile --replSet secureSet --smallfiles --oplogSize 100
7. Similarly, start the second server listening to port 27001 as follows:
C:\> mongod --dbpath c:\MongoDB\data\c2 --port 27001 --auth --keyFile c:\MongoDB\keyfile --replSet secureSet --smallfiles --oplogSize 100
8. The third instance will be started without the --auth and --keyFile options, listening to port 27002, as follows:
C:\> mongod --dbpath c:\MongoDB\data\c3 --port 27002 --replSet secureSet --smallfiles --oplogSize 100
9. We then start a Mongo shell and connect it to port 27000, which is the first instance started. From the Mongo shell, we type the following command:
>rs.initiate()
10. In a few seconds, the replica set will be initiated with just one instance in it. We will now try to add two new instances to this replica set. First, add the one listening on port 27001, as follows (you will need to use the appropriate hostname; Amol-PC is the hostname in my case):
>rs.add({_id:1,host:'Amol-PC:27001'})
11. We will then confirm the status of the newly added instance by executing rs.status(). It should soon come up as a secondary.
12. Finally, we will try to add the instance that was started without the --auth and --keyFile options, as follows:
>rs.add({_id:2,host:'Amol-PC:27002'})
13. This should add the instance to the replica set, but executing rs.status() will show the status of the instance as UNKNOWN. The server logs for the instance running on 27002 should show some authentication errors as well.
14. We will have to restart this instance. However, this time we provide the --auth and --keyFile options as follows:
C:\> mongod --dbpath c:\MongoDB\data\c3 --port 27002 --replSet secureSet --smallfiles --oplogSize 100 --auth --keyFile c:\MongoDB\keyfile
15. Once the server is started, connect to it from the shell again and type in rs.status(). In a few moments, it should come up as a secondary instance.
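The key file rules from steps 1 to 4 can be sketched in JavaScript (Node.js). The sketch generates 500 random bytes encoded as base64, roughly what openssl rand -base64 500 produces. It then strips \r, \n, and spaces before checking the 6 to 1024 length limit and the base64 character set. generateKey and effectiveKey are hypothetical helpers, and treating the = padding character as valid is an assumption. The server performs its own validation when it reads the file.

```javascript
const crypto = require("crypto");

// Roughly what `openssl rand -base64 500` produces: 500 random bytes in base64
// (openssl also wraps its output into lines, which the server strips anyway).
function generateKey(numBytes = 500) {
  return crypto.randomBytes(numBytes).toString("base64");
}

// Sketch of the rules described in the recipe: strip \r, \n and spaces, then
// require 6 to 1024 characters, all from the base64 character set.
function effectiveKey(fileContent) {
  const key = fileContent.replace(/[\r\n ]/g, "");
  if (key.length < 6 || key.length > 1024) {
    throw new RangeError(`key length ${key.length} is outside 6..1024`);
  }
  if (!/^[A-Za-z0-9+/=]+$/.test(key)) {
    throw new Error("key contains characters outside the base64 set");
  }
  return key;
}

// 500 bytes encode to 668 base64 characters, well within the limit.
console.log(effectiveKey(generateKey()).length);
```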
There's more…
In this recipe, we explored interprocess security to prevent unauthenticated nodes from being added to the Mongo replica set. We still haven't encrypted the data that is sent over the wire to ensure it's delivered securely. In Appendix, Concepts for Reference, we will see how to build the MongoDB server from source and how to enable encryption of the contents over the wire.
Modifying collection behavior using the collMod command
This command is executed to change the behavior of a collection in Mongo. It can be thought of as a collection-modifying operation (though it is not officially described anywhere as such).
For a part of this recipe, knowledge of TTL indexes is required.
Getting ready
In this recipe, we will be executing the collMod operation on a collection. We need to start a standalone server listening to any port for client connections; in this case, we will stick to the default 27017. If you are not aware of how to start a standalone server, refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. We also need to start a shell that will be used for this administration. If you are not aware of TTL indexes, it is highly recommended that you take a look at the Expiring documents after a fixed interval using the TTL index and Expiring documents at a given time using the TTL index recipes in Chapter 2, Command-line Operations and Indexes.
How to do it…
The collMod operation can be used to do a few things:
1. Let us change the disk space allocation for new documents being added. A collection needs to exist to execute the collMod command. You can try to execute this command against any existing collection. In our case, I am assuming we have a collection in place called powerOfTwoCol, which was created using the following command from the Mongo shell:
>db.createCollection('powerOfTwoCol')
2. Once the collection is in place, execute the following command:
>db.runCommand({collMod:'powerOfTwoCol',usePowerOf2Sizes:1})
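3. Let us now change the settings of the TTL index. Assuming we have a collection with a TTL index, as we saw in Chapter 2, Command-line Operations and Indexes, we can view its current index definitions, including the expireAfterSeconds value, by executing the following command: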
>db.ttlTest.getIndexes()
4. To change the expiry time from 300 seconds to 800 seconds, execute the following command:
>db.runCommand({collMod:'ttlTest',index:{keyPattern:{createDate:1},expireAfterSeconds:800}})
How it works…
The collMod command always has the {collMod: <name of the collection>, <collMod operation>} format. Two operations are currently supported, and we will break our explanation into two parts, one for each.
First, we will see what happens when we set usePowerOf2Sizes. If a collection is heavy on updates and the documents grow in size, a document will be moved on the disk when it can no longer grow where it is placed. This leaves a hole in the collection's disk space at the place where the document originally was. Mongo uses these holes to accommodate new documents wherever possible. However, with the usePowerOf2Sizes setting, Mongo allocates disk space in sizes that are powers of two (32, 64, 128, 256, …), with the minimum being 32 bytes. This setting uses a little more disk space than the default allocation, as the space used is always rounded up to a power of two. However, in the long term, when the documents get updated frequently and grow in size, disk usage is better, and so is performance, as document movement is reduced. Thus, if you foresee documents growing in size over time, setting this option might be a good idea. However, for patterns where documents are just inserted and rarely updated, we are better off with the default settings (up to version 2.4). Also, if the collection already has data when this option is set, subsequent allocations for new documents will be in powers of two, without affecting the existing documents.
From version 2.6 of MongoDB, the usePowerOf2Sizes strategy is the default for all collections, and thus usePowerOf2Sizes: false is the only sensible option to use in the collMod operation. A new server parameter, newCollectionsUsePowerOf2Sizes, is available when starting the server and defaults to true. Providing the value false to it disables the usePowerOf2Sizes setting for new collections. New documents will then be allocated using the strategy that was the default up to version 2.4: the space needed by the document multiplied by the padding factor.
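The allocation rule can be illustrated with a few record sizes. The sketch below rounds a record size up to the next power of two with a 32-byte minimum, as described above. It is a simplified model (powerOf2Allocation is a hypothetical name), and any special handling the server applies to very large records is ignored.

```javascript
// Sketch of the usePowerOf2Sizes rule: round a record's size up to the next
// power of two, never allocating less than 32 bytes.
function powerOf2Allocation(recordBytes) {
  let size = 32;
  while (size < recordBytes) size *= 2;
  return size;
}

for (const bytes of [10, 32, 33, 200, 1000]) {
  console.log(bytes, "->", powerOf2Allocation(bytes));
}
```

With this rule, a document that grows from 200 to 250 bytes still fits in its 256-byte slot and does not need to be moved.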
The second operation possible with collMod is changing a TTL index. If a TTL index has already been created and the time to live needs to be changed, we use the collMod command. The operation-specific field is as follows:
{index:{keyPattern:<the field on which the index was originally created>,
expireAfterSeconds:<new time to be used for TTL of the index>}}
The keyPattern is the field on which the TTL index is created, and expireAfterSeconds contains the new time to live, in seconds. On successful execution, we should see the following output in the shell:
{"expireAfterSeconds_old":300,"expireAfterSeconds_new":800,"ok":1}
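The effect of changing expireAfterSeconds can be illustrated with the expiry rule itself. A document becomes eligible for deletion once expireAfterSeconds seconds have passed since the date stored in the indexed field. In the sketch below, a document that is already expired under the old 300-second setting is not yet expired under the new 800-second one. isExpired is a hypothetical helper, and the dates are made up. In practice, a background task removes expired documents periodically, so deletion can lag behind the exact expiry time.

```javascript
// Sketch of the TTL rule: a document expires once `expireAfterSeconds`
// seconds have passed since the value of its indexed date field.
function isExpired(createDate, expireAfterSeconds, now = new Date()) {
  return now.getTime() >= createDate.getTime() + expireAfterSeconds * 1000;
}

const created = new Date("2014-01-27T17:00:00Z");
const now = new Date("2014-01-27T17:10:00Z"); // 600 seconds later
console.log(isExpired(created, 300, now)); // true  (old setting)
console.log(isExpired(created, 800, now)); // false (after collMod)
```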
Setting up MongoDB as a Windows Service
Windows Services are long-running applications that run in the background, much like daemons on Unix. Databases are good candidates for such services: they start and stop when the host machine starts and stops (you may, however, choose to start/stop a service manually). Many database vendors provide a feature to run the database as a service when it is installed on the server. MongoDB also lets you do that, and that is what we will see in this recipe.
Getting ready
Refer to the Single node installation of MongoDB with options from the config file recipe in Chapter 1, Installing and Starting the MongoDB Server, for information on how to start the MongoDB server using an external configuration file. As Mongo is run as a service in this case, it cannot be provided with command-line arguments, and configuring it from a configuration file is the only alternative. Refer to the prerequisites of the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server. This is all we will need for this recipe.
How to do it…
Let's take a look at the steps in detail:
1. We will first create a config file with three configuration values, namely the port, dbpath, and logpath. We name the file mongo.conf and keep it at c:\conf\mongo.conf with the following three entries in it (you may choose any path for the config file, database, and logs):
port=27000
dbpath=c:\data\mongo\db
logpath=c:\logs\mongo.log
2. Execute the following steps from the Windows command prompt, which you may need to run as an administrator. In Windows 7, open an administrator command prompt as follows:
   1. Press the Windows key on your keyboard.
   2. In the Search programs and files box, type cmd.
   3. The Command Prompt program will be listed in the programs. Right-click on it and select Run as administrator.
3. In the shell, execute the following command:
C:\> mongod --config c:\conf\mongo.conf --install
The log printed on the console should confirm that the service is installed properly.
4. The service can be started from the console as follows:
C:\> net start MongoDB
5. The service can be stopped as follows:
C:\> net stop MongoDB
6. Type services.msc in the Run window (Windows key + R). In the management console that opens, search for the MongoDB service; it should be listed there.
7. The service is automatic; that is, it will be started when the operating system starts. It can be changed to manual by right-clicking on the service and clicking on Properties.
8. To remove the service, we need to execute the following command from the command prompt:
C:\> mongod --remove
9. More options are available to configure the service name, display name, description, and the user account used to run the service. These can be provided as command-line arguments. Execute the following command to see the possible options, and take a look at the Windows Service Control Manager options:
C:\> mongod --help
Configuring a replica set
We have had a good discussion on what a replica set is and how to start a simple replica set in the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server. In the Understanding interprocess security in MongoDB recipe, we saw how to start a replica set with interprocess authentication. To be honest, that is pretty much what we do while setting up a standard replica set. However, there are a few configurations that one must know, along with how they affect the replica set's behavior. Note that we are still not discussing tag-aware replication in this recipe; it will be taken up later in this chapter as a separate recipe, Building tagged replica sets.
Getting ready
Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and the replica set basics. Go ahead and set up a simple three-node replica set on your computer, as described in that recipe.
Before we go ahead with the configurations, we will see what elections in a replica set are and how they work at a high level. It is good to know about elections because some of the configuration options affect the voting process in the elections.
Elections in a replica set
A Mongo replica set has one primary instance and multiple secondary instances. All writes happen only through the primary instance and are replicated to the secondary instances. Read operations can happen from secondary instances, depending on the read preference. Refer to the Read preference for querying section in Appendix, Concepts for Reference, to learn what read preference is. However, if the primary goes down or is not reachable for some reason, the replica set becomes unavailable for writes. A Mongo replica set has a feature to automatically fail over to a secondary, by promoting it to primary and making the set available to clients for both read and write operations. The replica set remains unavailable for writes for that brief moment until a new primary comes up.
All this sounds good, but the question is, who decides which instance will be the new primary? The new primary is chosen through an election. Whenever a secondary detects that it cannot reach the primary, it asks all the other nodes in the replica set to elect it as the new primary.
All other nodes in the replica set that receive this request will perform certain checks before they vote yes for the secondary requesting the election. Let's take a look at the steps:
1. They will first check whether the existing primary is reachable. This is necessary because the secondary requesting the re-election may be unable to reach the primary only because of a network partition, in which case it should not be allowed to become the primary. In such a case, the instance receiving the request will vote no.
2. Secondly, the instance will compare its own replication state with that of the secondary requesting the election. If it finds that the requesting secondary is behind it in the replicated data, it will vote no.
3. Finally, with the primary unreachable, the instance checks whether some instance with a higher priority than the secondary requesting the re-election is reachable from it. This is again possible if the requesting secondary can't reach the higher-priority secondary, possibly due to a network partition. In this scenario, the instance receiving the request will vote no.
The preceding checks are pretty much what happens during the re-election (not necessarily in the order mentioned here). If these checks pass, the instance votes yes.
The election is void if even a single instance votes no. However, if none of the instances has voted no, the secondary that requested the election will become the new primary if it receives a yes from a majority of the instances. If the election becomes void, there will be a re-election, with the same secondary or any other instance requesting an election following the process described above, until a new primary is elected.
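The voting rules above can be condensed into two small functions: one for a single member's vote and one for the outcome of the election. This is a simplified JavaScript model of the process as described in this recipe, not the server's implementation. The member fields (canReachPrimary, lastOpTime, reachablePriorities) are hypothetical, and reachablePriorities holds the priorities of the members other than the candidate that the voter can reach.

```javascript
// Simplified model of the voting checks described above; field names are hypothetical.
function vote(voter, candidate) {
  // Check 1: the primary is still reachable from the voter.
  if (voter.canReachPrimary) return "no";
  // Check 2: the candidate is behind the voter in replicated data.
  if (candidate.lastOpTime < voter.lastOpTime) return "no";
  // Check 3: the voter can reach a member with a higher priority.
  if (voter.reachablePriorities.some((p) => p > candidate.priority)) return "no";
  return "yes";
}

// Void if anyone votes no; otherwise the candidate needs a yes from a
// majority of the set's members (its own vote included).
function electionOutcome(votes, memberCount) {
  if (votes.includes("no")) return "void";
  const yesCount = votes.filter((v) => v === "yes").length;
  return yesCount > memberCount / 2 ? "elected" : "void";
}

// Three-member set: the primary is down, and secondary B asks secondary C for its vote.
const candidate = { lastOpTime: 100, priority: 1 };
const voterC = { canReachPrimary: false, lastOpTime: 100, reachablePriorities: [] };
console.log(electionOutcome(["yes", vote(voterC, candidate)], 3)); // elected
```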
Now that we have an idea about elections in a replica set and the terminology, let us look at some replica set configurations. A few of these options are related to votes, and we start by looking at these options first.
Basic configuration for a replica set
From Chapter 1, Installing and Starting the MongoDB Server, when we set up a replica set, we have a configuration similar to the following one. The basic replica set configuration for a three-member set is as follows:
{
"_id":"replSet",
"members":[
{
"_id":0,
"host":"Amol-PC:27000"
},
{
"_id":1,
"host":"Amol-PC:27001"
},
{
"_id":2,
"host":"Amol-PC:27002"
}
]
}
We will not be repeating the entire configuration in the steps in the following sections. All the flags we mention will be added to the document of a particular member in the members array. For example, in the preceding example, if the node with _id 2 is to be made an arbiter, we will have the following configuration for it in the configuration document shown earlier:
{
"_id":2,
"host":"Amol-PC:27002",
"arbiterOnly":true
}
Generally, the steps to reconfigure a replica set that has already been set up are as follows:
1. Assign the configuration document to a variable. If the replica set is already configured, the document can be obtained using the rs.conf() call from the shell as follows:
> var conf = rs.conf()
2. The members field in the document is an array of documents, one for each individual member of the replica set. To add a new property to a particular member, we set it on that member’s document. For instance, if we want to add the votes key and set its value to 2 for the third member of the replica set (index 2 in the array), we execute the following command:
> conf.members[2].votes = 2
3. Just changing the JSON document won’t change the replica set. If the replica set is already in place, we need to reconfigure it as follows:
> rs.reconfig(conf)
4. If the configuration is being done for the first time, we call the following command instead:
> rs.initiate(conf)
For all the steps given in the next section, you need to follow the preceding steps to reconfigure or initiate the replica set, unless some other steps are mentioned explicitly.
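The same read-modify-reconfigure cycle can be sketched on a plain Python dictionary. The helper `set_member_option` below is hypothetical. It copies the configuration document, changes one member, and bumps the `version` field, which the shell’s rs.reconfig() helper otherwise increments for you. With PyMongo, the resulting document could then be submitted with `client.admin.command("replSetReconfig", new_conf)`. That step needs a running replica set, so it is left out here.

```python
import copy


def set_member_option(conf, member_index, key, value):
    """Return a new config with members[member_index][key] set to value.

    The original document is left untouched. The version is bumped because
    a replica set only accepts a config whose version is higher than its own.
    """
    new_conf = copy.deepcopy(conf)
    new_conf["members"][member_index][key] = value
    new_conf["version"] = conf.get("version", 1) + 1
    return new_conf


conf = {
    "_id": "replSet",
    "version": 1,
    "members": [
        {"_id": 0, "host": "Amol-PC:27000"},
        {"_id": 1, "host": "Amol-PC:27001"},
        {"_id": 2, "host": "Amol-PC:27002"},
    ],
}

# Equivalent of: conf.members[2].votes = 2 followed by rs.reconfig(conf)
new_conf = set_member_option(conf, 2, "votes", 2)
print(new_conf["members"][2])  # {'_id': 2, 'host': 'Amol-PC:27002', 'votes': 2}
```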
How to do it…
In this recipe, we will look at some of the possible configurations that can be used in a replica set. The explanation here will be minimal; as usual, the full explanations follow in the next section.
1. The first configuration is the arbiter option. It configures a replica set member to hold no data and to have only the right to vote. The following key needs to be added to the configuration of the member that will be made an arbiter:
{_id: ..., 'arbiterOnly': true}
2. One thing to remember regarding this configuration is that once a replica set is initiated, no existing member can be changed from a nonarbiter to an arbiter node, and vice versa. However, we can add an arbiter to an existing replica set using the helper function rs.addArb(<hostname>:<port>). For example, to add an arbiter listening to port 27004 to an existing replica set, the following command was executed on my machine:
> rs.addArb('Amol-PC:27004')
When the server starts listening on port 27004 and rs.status() is executed from the Mongo shell, we see that state and stateStr for this member are 7 and ARBITER, respectively.
3. The next option, votes, affects the number of votes a member gets in the election. By default, all members get one vote each. This option can be used to change the number of votes a particular member gets. It can be set as follows:
{_id: ..., 'votes': <number of votes>}
The votes of existing members of a replica set can be changed, and the replica set can then be reconfigured using rs.reconfig().
Though the votes option is available and can potentially change the number of votes needed to form a majority, it usually doesn’t add much value and is not recommended for production use.
4. The next replica set configuration option is called priority. It determines whether a replica set member is eligible to become a primary. The option is set as follows:
{_id: ..., 'priority': <priority number>}
5. A higher number indicates a greater likelihood of becoming a primary. The primary will always be the member with the highest priority among the members alive in a replica set. Setting this option in an already configured replica set will trigger an election.
6. Setting the priority option to 0 will ensure that a member never becomes a primary.
7. The next option we look at is hidden. Setting the value of this option to true ensures that the replica set member is hidden. The option is set as follows:
{_id: ..., 'hidden': <true/false>}
One thing to keep in mind is that when a replica set member is hidden, its priority should also be set to 0 to ensure it doesn’t become a primary. Though this seems redundant, as of the current version, the value of priority needs to be set explicitly.
8. When a programming language client connects to a replica set, it will not be able to discover hidden members. However, the member’s status is visible in the output of rs.status() from the shell.
9. The next option we will look at is slaveDelay. This option sets how far, in time, the slave lags behind the primary of the replica set. The option is set as follows:
{_id: ..., 'slaveDelay': <number of seconds to lag>}
10. Like hidden members, slave-delayed members should also have the priority option set to 0 to ensure they never become primary. This needs to be set explicitly.
11. The final configuration option we will look at is buildIndexes. If this value is not specified, it defaults to true, which indicates that if an index is created on the primary, it needs to be built on the secondary too. The option is set as follows:
{_id: ..., 'buildIndexes': <true/false>}
12. If the value of buildIndexes is set to false, the priority must be set to 0 to ensure the member never becomes primary. This needs to be set explicitly. Also, this option cannot be set after the replica set is initiated. Just like an arbiter node, it needs to be set when the replica set is being created or when a new member node is being added to the replica set.
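The priority rules in steps 7 to 12 are easy to forget when editing a configuration by hand. The hypothetical `check_member` function below is a sketch, not anything MongoDB ships. It flags member documents that break the rules stated above. The server still performs its own validation when rs.reconfig() or rs.initiate() is called.

```python
def check_member(member):
    """Return a list of problems found in a single member document."""
    problems = []
    priority = member.get("priority", 1)  # priority defaults to 1
    reasons = []
    if member.get("hidden", False):
        reasons.append("hidden")
    if member.get("slaveDelay", 0) > 0:
        reasons.append("slaveDelay")
    if member.get("buildIndexes", True) is False:
        reasons.append("buildIndexes: false")
    if reasons and priority != 0:
        problems.append("priority must be 0 for a member with " + ", ".join(reasons))
    if member.get("votes", 1) > 1:
        problems.append("more than 1 vote is discouraged (deprecated from 2.6)")
    return problems


print(check_member({"_id": 2, "host": "Amol-PC:27002", "hidden": True}))
print(check_member({"_id": 2, "host": "Amol-PC:27002", "hidden": True, "priority": 0}))
```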
How it works…
In this section, we will explain the significance of the different types of members and the configuration options we saw in the previous section.
A replica set member as an arbiter
The English meaning of the word "arbiter" is a judge who resolves a dispute. In the case of replica sets, the arbiter node is present just to vote in elections and not to replicate any data. This is, in fact, a pretty common scenario, because a Mongo replica set needs to have at least three instances (and preferably an odd number of instances, three or more). A lot of applications do not need to maintain three copies of data and are happy with just two instances, a primary and a secondary, holding the data.
Consider the scenario where only two instances are present in the replica set. When the primary goes down, the secondary instance cannot form a proper majority, because it only has 50 percent of the votes (its own vote), and thus it cannot become a primary. If the secondary goes down instead, the primary instance steps down and becomes a secondary, making the replica set unavailable for writes. A two-node replica set is therefore useless, as it doesn’t stay available when either of its instances goes down. This defeats the purpose of setting up a replica set, and so a minimum of three instances is needed in a replica set.
Arbiters come in handy in such scenarios. We set up a replica set with three instances, with only two holding data and one acting as an arbiter. This way, we need not maintain three copies of the data, and we eliminate the problem we face with a two-instance replica set.
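The arithmetic behind this argument is a strict majority of voting members. The following minimal sketch counts one vote per member (the default). The function name is hypothetical.

```python
def can_have_primary(voting_members_up, voting_members_total):
    """A primary can be elected, or kept, only while a strict majority is up."""
    return voting_members_up > voting_members_total / 2


# Two data-bearing members, one of them down: 1 of 2 is not a majority.
print(can_have_primary(1, 2))  # False
# Primary + secondary + arbiter, one data-bearing member down: 2 of 3 is.
print(can_have_primary(2, 3))  # True
```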
Priority of replica set members
This is an option whose use is enforced by other options as well, though it can be used on its own in some cases. The options that enforce its usage are hidden, slaveDelay, and buildIndexes, because we never want a member with any of these three options to be made primary. We will look at these options soon.
Some more possible use cases, where we never want a replica set member to become a primary, are as follows:
The hardware of a member cannot cope with the write and read requests it would receive as a primary, and the only reason the member is in the set is to replicate the data.
We have a multi-data center setup, where one replica set instance is placed in another data center to distribute the data geographically for disaster recovery purposes. Ideally, the network latency between the application server and the database should be minimal for optimum performance. This can be achieved if both servers (the application server and the database server) are in the same data center. If we don’t change the priority of the replica set instance in the other data center, it is equally eligible to be chosen as a primary, and the application’s performance suffers if it gets chosen. In such scenarios, we can set the priority of the server in the second data center to 0. The administrator then has to perform a manual cutover to fail over to the other data center, should an emergency arise.
In both these scenarios, we can also make the respective members hidden so that the application client doesn’t have a view of these members in the first place.
Just as we set the priority to 0 to prevent a member from becoming the primary, we can also bias the election towards one member being the primary whenever it is available. We do this by setting its priority to a value greater than 1, because the default value of the priority field is 1.
Suppose that, for budget reasons, one of the members stores its data on SSDs and the remaining members use spinning disks. We would ideally want the member with SSDs to be the primary whenever it is up and running, and another member to become primary only when it is unavailable. In such scenarios, we can set the priority of the member running on SSDs to a value greater than 1. The exact value doesn’t really matter as long as it is greater than that of the rest; setting it to 1.5 or 2 makes no difference as long as the priority of the other members is lower.
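The preference described above can be modeled as "the highest-priority healthy member wins, and priority 0 never wins". The sketch below is a deliberate simplification. It ignores data freshness and the election protocol itself. The member dictionaries and host names are hypothetical.

```python
def preferred_primary(members):
    """Return the host that priority alone would favor, or None."""
    eligible = [m for m in members if m["up"] and m.get("priority", 1) > 0]
    if not eligible:
        return None
    # Ties go to the first member listed here; a real set could elect any of them.
    return max(eligible, key=lambda m: m.get("priority", 1))["host"]


members = [
    {"host": "ssd-node:27000", "priority": 2, "up": True},
    {"host": "disk-node:27001", "up": True},               # default priority 1
    {"host": "dr-node:27002", "priority": 0, "up": True},  # other data center
]
print(preferred_primary(members))  # ssd-node:27000
```

If the SSD node goes down, the same function favors `disk-node:27001`. The priority-0 member is never chosen, even when it is the only one left.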
Hidden, votes, slave delayed, and build index configurations
The term hidden for a replica set node is meant from the point of view of an application client connected to the replica set, not of an administrator. For an administrator, it is equally important to monitor hidden members, and so their state appears in the rs.status() response. Hidden members participate in elections too, just like all other members.
Though the votes option is not a recommended solution to any problem, there is an interesting behavior worth mentioning. Suppose you have a three-member replica set. With each instance having one vote by default, we have a total of three votes in the replica set. For a replica set to allow writes, a majority of voting members should be up. However, the majority is calculated not from the number of members that are up but from the total number of votes. Let us see how.
By default, with one vote each, if one of the members is down, we have two out of a total of three votes available, and the replica set continues to operate. However, if one member has its number of votes set to 2, we now have a total of four votes (1 + 1 + 2) in the replica set. If this member goes down, even though it is a secondary, the primary will automatically step down and the replica set will be left with no primary, so writes are not allowed. This happens because two out of four possible votes are now gone and a majority of the votes is no longer available. If the member with two votes is the primary and it goes down, again no majority can be formed, as a maximum of only two votes out of four are available, and no primary will be elected. As a rule of thumb, if you are tempted to use the votes option, think again: other options, such as priority and arbiterOnly, can very well address the same use cases.
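The vote arithmetic in the previous paragraph can be checked with a few lines of Python. This is a minimal sketch with hypothetical member names A, B, and C. The majority is computed over the total number of votes, not the number of members.

```python
def majority_available(votes, members_up):
    """votes maps member name -> number of votes; members_up is a set of names."""
    total = sum(votes.values())
    available = sum(n for name, n in votes.items() if name in members_up)
    return available > total / 2


default_votes = {"A": 1, "B": 1, "C": 1}
weighted_votes = {"A": 1, "B": 1, "C": 2}  # C carries two votes

print(majority_available(default_votes, {"A", "B"}))   # True: 2 of 3
print(majority_available(weighted_votes, {"A", "B"}))  # False: 2 of 4
```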
From Version 2.6 of MongoDB, the votes option is deprecated, and the following message gets printed in the logs:
[rsMgr] WARNING: Having more than 1 vote on a single replica set member is
[rsMgr] deprecated, as it causes issues with majority write concern. For
[rsMgr] more information, see http://dochub.mongodb.org/core/replica-set-votes-deprecated
Thus, it is recommended not to use this option and to prefer an alternative configuration option; in some future version of MongoDB, it might not even be supported.
For the slaveDelay option, the most common use case is to keep the data in one member lagging behind the primary by the provided number of seconds. That copy can then be used to restore data if some unforeseen error happens, say, a human erroneously updating some data. Remember, the longer the time delay, the more time we get to recover, but at the cost of possibly stale data.
Finally, let’s look at the buildIndexes option. This option is useful when a replica set member runs on hardware that is not of production standard and the cost of maintaining indexes on it is not worth it. You may choose to set this option for members on which no queries are executed. Obviously, if you set this option, such members can never become primary, and thus the priority option must be set to 0.
There’s more…
You can achieve some interesting things using tags in replica sets. This will be discussed in a later recipe, after we learn about tags in the Building tagged replica sets recipe.
Stepping down as a primary instance from the replica set
There are times when, for maintenance activity during business hours, we need to take a server out of the replica set, perform the maintenance, and put it back in the replica set. If the server to be worked upon is the primary, it first needs to step down from the primary position. This triggers a re-election, and we must ensure that the server doesn’t get re-elected for a minimum given time frame. Once the stepdown operation succeeds and the server becomes a secondary, we can take it out of the replica set, perform the maintenance activity, and put it back in the replica set.
Getting ready
Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about replica set basics. Set up a simple three-node replica set on your computer as mentioned in the recipe.
How to do it…
Assuming that we have a replica set up and running at this point, perform the following steps:
1. Execute the following command from the shell connected to one of the replica set members and see which instance is currently the primary:
> rs.status()
2. Connect to that primary instance from the Mongo shell and execute the following command on the shell:
> rs.stepDown()
3. The shell should reconnect, and you should see that the instance you are connected to, which was initially the primary, has now become a secondary. Execute the following command from the shell to see that a new primary has been elected:
> rs.status()
4. You may now connect to the new primary, modify the replica set configuration, and go ahead with the administration of the servers.
How it works…
The steps we saw in the previous section are pretty simple, but there are a couple of interesting things worth looking at.
The rs.stepDown() method was called without any parameter. The function can in fact take a numeric value: the number of seconds for which the instance that stepped down won’t participate in elections and won’t become a primary. The default value is 60 seconds.
Another interesting thing to try out is this: what if the instance that was asked to step down has a higher priority than the other instances? It turns out that priority doesn’t matter when you step down. The instance that stepped down will not become primary, no matter what, for the provided number of seconds. However, if the priority of that instance is higher than the others, then once the step-down period elapses, an election will happen, and the instance with the higher priority will become primary again.
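The interplay between rs.stepDown() and priority can be sketched as a small toy model. `StepDownMember` is a hypothetical class, not part of any driver. Times are plain seconds passed in explicitly so that the behavior is deterministic. The only facts it encodes come from this recipe: the 60-second default, and the rule that priority is ignored while the step-down window is open.

```python
class StepDownMember:
    """Toy model of a member's eligibility around rs.stepDown()."""

    def __init__(self, name, priority=1.0):
        self.name = name
        self.priority = priority
        self.frozen_until = 0.0  # time before which it may not become primary

    def step_down(self, now, secs=60):
        # rs.stepDown() with no argument uses 60 seconds.
        self.frozen_until = now + secs

    def can_become_primary(self, now):
        # A high priority does not help while the step-down window is open.
        return self.priority > 0 and now >= self.frozen_until


m = StepDownMember("Amol-PC:27000", priority=2)
m.step_down(now=1000)
print(m.can_become_primary(1030), m.can_become_primary(1060))  # False True
```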
Exploring the local database of a replica set
In this recipe, we will explore the local database from a replica set’s perspective. The local database may contain collections that are not specific to replica sets, but we will focus only on the replica-set-specific collections and take a look at what’s in them and what they mean.
Getting ready
Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about replica set basics. Go ahead and set up a simple three-node replica set on your computer, as mentioned in the recipe.
How to do it…
1. With the replica set up and running, we need to open a shell connected to the primary. You may connect to any one member at random, execute rs.status(), and then determine the primary.
2. With the shell opened, first switch to the local database and then view the collections in the local database as follows:
> use local
switched to db local
> show collections
3. You should find a collection called me. Querying this collection should show us a document that contains the hostname of the server to which we are currently connected:
> db.me.findOne()
There will be two fields, hostname and _id. Take note of the _id field; it is important.
4. We will now query the slaves collection as follows:
> db.slaves.find().pretty()
5. Take note of the fields present in these documents.
6. The next collection to look at is replset.minvalid. You will have to connect to a secondary member from the shell to execute the following query. Switch to the local database first as follows:
> use local
switched to db local
> db.replset.minvalid.find()
This collection contains just a single document with a key ts, whose value is the timestamp up to which the secondary we are connected to is synchronized. Note down this time.
7. From the shell on the primary, insert a document in any collection. We will use the test database. Execute the following commands from the shell of the primary member:
> use test
switched to db test
> db.replTest.insert({i: 1})
8. Query the secondary again as follows:
> db.replset.minvalid.find()
We see that the time in the ts field has now advanced to the time at which this write was replicated from the primary to the secondary. With a slave-delayed node, you will see this time getting updated only after the delay period has elapsed.
9. Finally, we will look at the system.replset collection. This collection is where the replica set configuration is stored. Execute the following command:
> db.system.replset.find().pretty()
Actually, when we execute rs.conf(), the following query gets executed:
> db.getSisterDB("local").system.replset.findOne()
How it works…
The local database is a special database that holds replication and instance-specific details. It is a nonreplicated database. Try creating a collection of your own in the local database and inserting some data in it; it will not be replicated to the secondary nodes.
This database gives us a view of the data stored by Mongo for internal use. As an administrator, it is good to know about these collections and the type of data in them.
Most of the collections are pretty straightforward. We will take a closer look at the slaves collection. Let’s take a look at the following example:
{
"_id":ObjectId("52f138169da4944dff694e26"),
"config":{
"_id":1,
"host":"Amol-PC:27001"
},
"ns":"local.oplog.rs",
"syncedTo":Timestamp(1391928970,1)
}
This collection contains a document for every secondary member that has synced from this member. The _id field here is not a randomly chosen ID. It has the same value as the _id field of the document in the me collection of the respective secondary member node. From the shell of the secondary, execute the db.me.findOne() query in the local database, and you should see that the _id field there matches the _id field of the document in the slaves collection.
The config document we see identifies the host of the secondary instance being referred to. Note that, apart from _id and host, the other configuration options of the replica set member are not present in this document. Finally, the syncedTo time tells us up to what time the secondary instance is synced with the primary. We saw that the replset.minvalid collection on the secondary gives the time up to which it is synced with the primary. The syncedTo value on the primary will be the same as the value in replset.minvalid on the respective secondary.
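A Timestamp(seconds, increment) value such as the syncedTo field above is easy to interpret by hand. Its first component is seconds since the Unix epoch, and the second is an ordinal within that second. The sketch below assumes you have copied syncedTo and the primary’s latest oplog ts as plain (seconds, increment) pairs. The "latest" value here is hypothetical. In PyMongo, the same two parts are available as the `time` and `inc` attributes of `bson.timestamp.Timestamp`.

```python
from datetime import datetime, timezone


def to_utc(ts):
    """Convert a (seconds, increment) pair, as in Timestamp(s, i), to a datetime."""
    seconds, _increment = ts
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def lag_seconds(primary_last_ts, secondary_synced_to):
    """Whole seconds by which the secondary trails the primary's latest entry."""
    return primary_last_ts[0] - secondary_synced_to[0]


synced_to = (1391928970, 1)       # from the slaves document shown above
primary_latest = (1391928990, 3)  # hypothetical latest oplog ts on the primary
print(to_utc(synced_to))                       # 2014-02-09 06:56:10+00:00
print(lag_seconds(primary_latest, synced_to))  # 20
```

Because the pairs are ordered by seconds first and increment second, ordinary Python tuple comparison also orders them correctly.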
Understanding and analyzing oplogs
The oplog is a special collection and forms the backbone of MongoDB replication. When any write operation or configuration change is done on the replica set’s primary, it is written to the oplog on the primary. All the secondary members then tail this collection to get the changes to be replicated. Tailing is analogous to the tail -f command in Unix and can only be done on a special type of collection called a capped collection. Capped collections are fixed-size collections that maintain insertion order, just like a queue. When the collection’s allocated space becomes full, the oldest data is overwritten. If you are not familiar with capped collections and tailable cursors, refer to the Creating and tailing capped collection cursors in MongoDB recipe in Chapter 5, Advanced Operations, for more details.
The oplog is a capped collection in the nonreplicated database called local. In the previous recipe, we saw what the local database is and what collections are present in it. We didn’t discuss the oplog there, as it demands a lot more explanation and needs a dedicated recipe to do it justice.
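The capped, insertion-ordered behavior described above can be mimicked with `collections.deque` and a `maxlen`. This is only an analogy. A real capped collection is bounded by bytes (the oplog size), not by a document count. The hypothetical `can_resume` helper also simplifies how a secondary locates its position. It does illustrate why a secondary that falls too far behind can no longer catch up from the oplog.

```python
from collections import deque

# Model the oplog as a bounded FIFO; 5 slots keeps the demo small.
oplog = deque(maxlen=5)
for ts in range(1, 9):
    oplog.append({"ts": ts, "op": "i", "o": {"i": ts}})  # oldest entries drop off


def can_resume(oplog, last_applied_ts):
    """A secondary can keep tailing only if its last applied entry is still
    present; otherwise it has fallen off the end and must resync."""
    return any(entry["ts"] == last_applied_ts for entry in oplog)


print([entry["ts"] for entry in oplog])            # [4, 5, 6, 7, 8]
print(can_resume(oplog, 6), can_resume(oplog, 2))  # True False
```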
Getting ready
Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about replica set basics. Go ahead and set up a simple three-node replica set on your computer as mentioned in the recipe. Then start the Mongo shell and connect to the primary member of the replica set.
How to do it…
1. After connecting to the primary from the shell, execute the following commands to get the timestamp of the last operation present in the oplog. We are interested in the operations after this time.
> use test
> local = db.getSisterDB('local')
> var cutoff = local.oplog.rs.find().sort({ts: -1}).limit(1).next().ts
2. Execute the following command from the shell. Keep the output in the shell or copy it somewhere; we will analyze it later.
> local.system.namespaces.findOne({name: 'local.oplog.rs'})
3. Insert 10 documents as follows:
> for (i = 0; i < 10; i++) db.oplogTest.insert({'i': i})
4. Execute the following update operation to set a string value for all documents with a value of i greater than 5, which are 6, 7, 8, and 9 in our case. It is a multi-update operation:
> db.oplogTest.update({i: {$gt: 5}}, {$set: {val: 'str'}}, false, true)
5. Now create the index as follows:
> db.oplogTest.ensureIndex({i: 1}, {background: 1})
6. Execute the following query on the oplog:
> local.oplog.rs.find({ts: {$gt: cutoff}}).pretty()
How it works…
For those familiar with messaging and its terminology, the oplog can be seen as a topic with one producer, the primary instance, and multiple consumers, the secondary instances. The primary instance writes to the oplog everything that needs to be replicated. Thus, all create, update, and delete operations, as well as any reconfigurations of the replica set, are written to the oplog. The secondary instances tail the collection to get the documents written by the primary, continuously reading the contents as they are added, much like the Unix tail command with the -f option. If a secondary has slaveDelay configured, it will not read oplog documents whose timestamp is later than the current time minus the slaveDelay.
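The consumer side, including the slaveDelay rule, can be sketched with a simple filter over oplog entries. The `ts` values here are plain seconds rather than real Timestamp values. A secondary actually uses a tailable cursor instead of repeatedly filtering a list; recent PyMongo versions expose that cursor type as `pymongo.CursorType.TAILABLE_AWAIT`. The function name is hypothetical.

```python
def entries_to_apply(oplog, last_applied_ts, now, slave_delay=0):
    """Entries a secondary would read next: newer than what it has already
    applied and, for a delayed member, at least slave_delay seconds old."""
    cutoff = now - slave_delay
    return [e for e in oplog if last_applied_ts < e["ts"] <= cutoff]


oplog = [{"ts": 100, "op": "i"}, {"ts": 200, "op": "u"}, {"ts": 300, "op": "i"}]
print([e["ts"] for e in entries_to_apply(oplog, 100, now=350)])                   # [200, 300]
print([e["ts"] for e in entries_to_apply(oplog, 100, now=350, slave_delay=100)])  # [200]
```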
We started by saving a reference to the local database in the variable called local and identified a cutoff time, which we use to query the oplog for all the operations performed in this recipe.
Executing a query on the system.namespaces collection in the local database shows us that the oplog is a capped collection with a fixed size. For performance reasons, capped collections are allocated contiguous space on the filesystem, and this space is preallocated. The size allocated by the server depends on the OS and CPU architecture. When starting the server, the oplogSize option (in megabytes) can be provided to specify the size of the oplog. The defaults are generally good enough for most cases; however, for development purposes, one may choose to override this value with a smaller one. Because the oplog is a capped collection, its space must be preallocated on the disk. This preallocation not only takes time when the replica set is first initialized, but also takes up a fixed amount of disk space. For development, we generally start multiple MongoDB processes of the same replica set on the same machine and want them up and running as quickly as possible with minimal resource usage. Also, a small oplog can be kept entirely in memory. For all these reasons, it is advisable to start local development instances with a small oplog size.
We performed some operations: we inserted 10 documents, updated four documents using a multi-update operation, and created an index. If we query the oplog for entries after the cutoff we computed earlier, we see 10 documents in it, one for each insert. Each of them looks as follows:
{
"ts":Timestamp(1392402144,1),
"h":NumberLong("-4661965417977826137"),
"v":2,
"op":"i",
"ns":"test.oplogTest",
"o":{
"_id":ObjectId("52fe5edfd473d2f623718f51"),
"i":0
}
}
In the previous example, we first look at three fields, namely op, ns, and o. They hold the operation, the fully qualified name of the collection into which the data is being inserted, and the actual object to be inserted, respectively. The operation i stands for insert. Note that the value of o, which is the document to be inserted, contains the _id field that was generated on the primary. We should see 10 such documents, one for each insert. What is interesting is what happens on a multi-update operation. The primary puts four documents in the oplog, one for each document affected by the update. In this case, the op value is u, for update, and the query used to match the document (held in the o2 field) is not the one we gave to the update function. Instead, it is a query that uniquely identifies the document by its _id field. As an index is already in place on the _id field (created automatically for each collection), finding the document to be updated is not expensive. The value of the o field is the same as the update document we passed to the update function from the shell. The sample oplog document for the update is as follows:
{
"ts":Timestamp(1392402620,1),
"h":NumberLong("-7543933489976433166"),
"v":2,
"op":"u",
"ns":"test.oplogTest",
"o2":{
"_id":ObjectId("52fe5edfd473d2f623718f57")
},
"o":{
"$set":{
"val":"str"
}
}
}
The update in the oplog is the same as the one we provided, because the $set operation is idempotent, which means it can safely be applied any number of times.
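Why idempotent entries make replay safe can be seen by applying the same entries twice to an in-memory store. The sketch below assumes an integer _id in place of an ObjectId. It handles only the i and $set-style u operations shown above, and it treats a replayed insert as an overwrite of the same _id, which is a simplification. `apply_entry` is a hypothetical helper, not MongoDB’s replication code.

```python
import copy


def apply_entry(store, entry):
    """Apply one oplog entry to store, a dict mapping _id -> document."""
    if entry["op"] == "i":
        doc = dict(entry["o"])
        store[doc["_id"]] = doc  # replaying an insert overwrites the same _id
    elif entry["op"] == "u":
        target = store[entry["o2"]["_id"]]  # o2 locates the document by _id
        for name, value in entry["o"].get("$set", {}).items():
            target[name] = value
    else:
        raise ValueError("op not handled in this sketch: " + entry["op"])


entries = [
    {"op": "i", "ns": "test.oplogTest", "o": {"_id": 7, "i": 6}},
    {"op": "u", "ns": "test.oplogTest", "o2": {"_id": 7}, "o": {"$set": {"val": "str"}}},
]

store = {}
for e in entries:
    apply_entry(store, e)
once = copy.deepcopy(store)
for e in entries:  # replay the same entries a second time
    apply_entry(store, e)
print(store == once)  # True
```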
However, an update using the $inc operator is not idempotent. Let us execute the following update query:
> db.oplogTest.update({i: 9}, {$inc: {i: 1}})
In this case, the oplog will have the following value for o:
"o":{
"$set":{
"i":10
}
}
Mongo smartly records this non-idempotent operation in the oplog as an idempotent one, with i set to the value it is expected to have after the increment is applied once. Thus, it is safe to replay an oplog any number of times without corrupting the data.
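The rewrite just described can be illustrated in a few lines. The following is a simplified, hypothetical sketch that handles only $inc and $set. The server’s own rewriting covers many more operators and cases. The point is that the recorded $set carries the result of the increment, so replaying it is harmless.

```python
def to_idempotent(update, current_doc):
    """Rewrite $inc into the $set that would be recorded in the oplog."""
    rewritten = {"$set": dict(update.get("$set", {}))}
    for name, delta in update.get("$inc", {}).items():
        rewritten["$set"][name] = current_doc.get(name, 0) + delta
    return rewritten


doc = {"_id": 7, "i": 9}
entry = to_idempotent({"$inc": {"i": 1}}, doc)
print(entry)  # {'$set': {'i': 10}}

# Replaying the rewritten $set twice leaves i at 10; replaying $inc twice would give 11.
for _ in range(2):
    doc.update(entry["$set"])
print(doc["i"])  # 10
```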
Finally, we can see that index creation is recorded in the oplog as an insert operation into the system.indexes collection. However, there is something to remember about index creation up to Version 2.4 of MongoDB. An index, whether built in the foreground or background on the primary, is always built in the foreground on a secondary, and replication does not happen on that secondary for the duration of the build. For large collections, index creation can take hours, so the size of the oplog is very important: it must let the secondary catch up from the point where replication paused when the index build started. Since Version 2.6, an index build initiated in the background on the primary is also built in the background on secondary instances.
For more details on index creation on replica sets, visit http://docs.mongodb.org/master/tutorial/build-indexes-on-replica-sets/.
Building tagged replica sets
In the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, we saw how to set up a simple replica set and what the purpose of a replica set is. The Appendix, Concepts for Reference, also explains in detail what write concern is and why it is used. Write concerns, as we saw them there, offer a minimum level of guarantee for a given write operation. With the concept of tags combined with write concerns, however, we can define a variety of rules and conditions that must be satisfied before a write operation is deemed successful and a response is sent to the user.
Consider some common use cases:
An application wants a write operation to be propagated to at least one server in each of its data centers. This ensures that, in the event of a data center shutdown, the other data centers will have the data written by the application.
If there aren’t multiple data centers, at least one member of the replica set is kept on a different rack. For instance, if one rack’s power supply goes down, the replica set will still be available (not necessarily for writes), as at least one member is running on a different rack. In such scenarios, we would want a write to be propagated to at least two racks before responding to the client with a successful write.
A reporting application may query a group of secondary instances of a replica set to generate some reports regularly (such a secondary might be configured to never become primary). After each write, we want to ensure that the write operation is replicated to at least one reporting replica member before acknowledging it as successful.
The preceding are a few common use cases that cannot be addressed with the simple write concerns we have seen earlier. We need a different mechanism to cater to these requirements, and replica sets with tags are what we need.
The obvious next question is, what exactly are tags? Let us take the example of a blog. Various posts in the blog have different tags attached to them. These tags allow us to easily search, group, and relate posts. Tags are user-defined texts with some meaning attached to them. If we draw an analogy between a blog post and replica set members, then just as we attach tags to a post, we can attach tags to each replica set member. For example, in a multi-data center scenario with two replica set members in data center 1 (dc1) and one member in data center 2 (dc2), we can assign the following tags to the members. The names of the keys and the values assigned to the tags are arbitrary; they are chosen while designing the application.
You may even choose to assign tags such as the name of the administrator who set up the server, if you really find that useful for your use case.
Replica set member      Tag
Replica set member 1    {'datacentre': 'dc1', 'rack': 'rack-dc1-1'}
Replica set member 2    {'datacentre': 'dc1', 'rack': 'rack-dc1-2'}
Replica set member 3    {'datacentre': 'dc2', 'rack': 'rack-dc2-2'}
This is enough to lay the foundation of what replica set tags are. In this recipe, we will see how to assign tags to replica set members and, more importantly, how to use them to address some of the sample use cases we saw earlier.
Getting ready
Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, for the prerequisites and to know about replica set basics. Go ahead and set up a simple three-node replica set on your computer, as mentioned in the recipe. Open a shell and connect to the primary member of the replica set.
If you need to know about write concerns, refer to the overview of write concerns in the Appendix, Concepts for Reference.
For inserting into the database, we will use Python, as it gives us an interactive interface similar to the Mongo shell. Refer to the Installing PyMongo recipe in Chapter 3, Programming Language Drivers, for steps on how to install PyMongo. The Mongo shell would have been the ideal candidate for demonstrating the insert operations, but there are certain limitations around using the shell with our custom write concern. Technically, any programming language whose driver supports the write concerns used in this recipe will work fine for the insert operations.
How to do it…
1. With the replica set started, we will add tags to its members using the following commands, executed from the Mongo shell (the set is reconfigured in step 2):
> var conf = rs.conf()
> conf.members[0].tags = {'datacentre': 'dc1', 'rack': 'rack-dc1-1'}
> conf.members[1].tags = {'datacentre': 'dc1', 'rack': 'rack-dc1-2'}
> conf.members[2].priority = 0
> conf.members[2].tags = {'datacentre': 'dc2', 'rack': 'rack-dc2-1'}
2. With the replica set tags set (note that we have not yet reconfigured the replica set), we need to define some custom write concerns. First, we define one that ensures the data gets replicated to at least one server in each data center. Execute the following commands in the Mongo shell again:
> conf.settings = {'getLastErrorModes': {'MultiDC': {datacentre: 2}}}
> rs.reconfig(conf)
3. Start the Python shell and execute the following commands:
>>> import pymongo
>>> client = pymongo.MongoReplicaSetClient('localhost:27000,localhost:27001',
...                                          replicaSet='replSetTest')
>>> db = client.test
4. We will now execute the following insert query:
>>> db.multiDCTest.insert({'i': 1}, w='MultiDC', wtimeout=5000)
5. The preceding insert query goes through successfully, and the ObjectId of the inserted document is printed out. You may query the collection from either the Mongo shell or the Python shell to confirm this.
6. As our primary is one of the servers in data center 1, we will now stop the server listening on port 27002, which is the one with priority 0 that is tagged as being in the other data center.
7. Once the server is stopped (you may confirm this using the rs.status() helper function from the Mongo shell), execute the following insert query again. This time, the insert should throw a timeout error:
>>> db.multiDCTest.insert({'i': 2}, w='MultiDC', wtimeout=5000)
8. Restart the stopped MongoDB server.
9. Similarly, we can achieve rack awareness by ensuring that a write propagates to at least two racks (in any data center). We do this by defining a new write concern from the Mongo shell as follows:
{'MultiRack': {rack: 2}}
10. Add this write concern to getLastErrorModes in the settings value of the conf object, which will then be as follows. Once it is set, reconfigure the replica set again using rs.reconfig(conf) from the Mongo shell:
{
'getLastErrorModes':{
'MultiDC':{datacentre:2},
'MultiRack':{rack:2}
}
}
We saw WriteConcern used with replica set tags to achieve functionality such as data center and rack awareness. Let us now see how to use replica set tags with read operations.
11. We will use replica set tags with read preference. Let us reconfigure the set by adding one more tag to mark a secondary member that will be used for hourly stats reporting.
12. Execute the following steps to reconfigure the set from the Mongo shell:
> var conf = rs.conf()
> conf.members[2].tags.type = 'reports'
> rs.reconfig(conf)
13. This adds an additional tag called type, with the value reports, to the same member that has priority 0 and is in the other data center.
14. We now go back to the Python shell and execute the following commands:
>>> curs = db.multiDCTest.find(read_preference=pymongo.ReadPreference.SECONDARY,
...                            tag_sets=[{'type': 'reports'}])
>>> curs.next()
15. The preceding execution should show us one document from the collection (as we inserted data into this test collection in the previous steps).
16. Stop the instance that we tagged for reporting, that is, the server listening for connections on port 27002, and execute the following commands in the Python shell again:
>>> curs = db.multiDCTest.find(read_preference=pymongo.ReadPreference.SECONDARY,
...                            tag_sets=[{'type': 'reports'}])
>>> curs.next()
This time around, the execution should fail, stating that no secondary was found with the required tag sets.
Howitworks…Inthisrecipe,wedidalotofoperationsontaggedreplicasetsandsawhowtheycanaffectwriteoperationsusingWriteConcernandreadoperationsusingReadPreference.Letuslookattheminsomedetailnow.
WriteConcernintaggedreplicasetsWesetupareplicasetthatwasupandrunning,whichwereconfiguredtoaddtags.Wetaggedthefirsttwoserversindatacenter1andindifferentracks(withtheserversrunningandlisteningtoports27000and27001forclientconnections),andthethirdoneindatacenter2(withtheserverlisteningtoport27002forclientconnections).Wealsoensuredthatthememberindatacenter2doesn’tbecomeaprimarybysettingitspriorityto0.
Ourfirstobjectiveistoensurethatwriteoperationstothereplicasetgetreplicatedtoatleastonememberinthetwodatacenters.Toensurethis,wedefineawriteconcernasfollows:
{'MultiDC':{datacentre:2}}
Here, we first define the name of the write concern as MultiDC. The value, which is a JSON object, has one key with the name datacentre, which is the same as the key used for the tag we attached to the replica set members, and the value is the number 2, which is interpreted as the number of distinct values of the given tag that should acknowledge the write before it is deemed successful.
For instance, in our case, when the write comes to server 1 in data center 1, the number of distinct values of the datacentre tag is 1. If the write operation gets replicated to the second server, the number still stays 1, as the value of its datacentre tag is the same as that of the first member. It is only when the third server acknowledges the write operation that the write satisfies the defined condition of replicating the write to two distinct values of the datacentre tag in the replica set. Note that the value can only be a number; something such as {datacentre: 'dc1'} is invalid, and an error will be thrown while reconfiguring the replica set.
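The counting rule just described can be sketched as a small, server-free Python simulation. The member ports come from the recipe, but the tag values (dc1, dc2, rack1, rack2) and the helper name mode_satisfied are assumptions made for illustration; the sketch mirrors the rule as described here and is not MongoDB's actual implementation.

```python
from typing import Dict, Iterable

# Hypothetical member tags: the ports are from the recipe, the tag values are assumed.
MEMBER_TAGS: Dict[int, Dict[str, str]] = {
    27000: {"datacentre": "dc1", "rack": "rack1"},
    27001: {"datacentre": "dc1", "rack": "rack2"},
    27002: {"datacentre": "dc2", "rack": "rack1"},
}


def mode_satisfied(mode: Dict[str, int], acked: Iterable[int],
                   member_tags: Dict[int, Dict[str, str]] = MEMBER_TAGS) -> bool:
    """Return True when, for every tag key in the mode, the members that
    acknowledged the write carry at least the required number of distinct
    values for that key."""
    acked = list(acked)
    for tag_key, required in mode.items():
        distinct_values = {member_tags[m][tag_key]
                           for m in acked if tag_key in member_tags[m]}
        if len(distinct_values) < required:
            return False
    return True


multi_dc = {"datacentre": 2}
print(mode_satisfied(multi_dc, [27000]))          # False: only dc1 has acknowledged
print(mode_satisfied(multi_dc, [27000, 27001]))   # False: still only dc1
print(mode_satisfied(multi_dc, [27000, 27002]))   # True: dc1 and dc2
```

The same function evaluates a rack-based mode such as {"rack": 2}; with these assumed tags, acknowledgements from 27000 and 27001 would satisfy it, while 27000 and 27002 would not.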
However, we need to register this write concern with the server. This is done in the final step of the configuration by setting the getLastErrorModes field inside the settings document of the replica set configuration. The value of getLastErrorModes is a JSON document with all the possible write concerns defined in it. We later define one more write concern for writes propagated to at least two racks. This is conceptually in line with the MultiDC write concern and thus, we will not be discussing it in detail here. After setting all the required tags and settings, we reconfigure the replica set for the changes to take effect.
Once the set was reconfigured, we performed some write operations using the MultiDC write concern. When two members in two distinct data centers are available, the write goes through successfully. However, when the server in the second data center goes down, the write operation times out and throws an exception to the client initiating the write. This demonstrates that the write operation succeeds or fails as we intended.
We just saw how these custom tags can be used to address some interesting use cases that the product does not support out of the box, as far as write operations are concerned. Similar to write operations, read operations can take full advantage of these tags to address some use cases, such as reading from a fixed set of secondary members that are tagged with a particular value.
ReadPreference in tagged replica sets
We added another custom tag annotating a member to be used for reporting purposes. We then fired a query operation with the read preference set to query a secondary and provided the tag sets that a member must match before it is considered a candidate for the read operation. Remember that when using primary as the read preference, we cannot use tags, and that is the reason we explicitly set the value of read_preference to SECONDARY.
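How tag sets narrow down the candidate members can be sketched as a Python simulation. The member list, states, and the function name eligible_secondaries are hypothetical; the rule modelled follows MongoDB's read preference documentation: tag sets are tried in order, the first one that matches at least one secondary wins, and an empty tag set {} matches any secondary.

```python
from typing import Dict, List


def eligible_secondaries(members: List[Dict], tag_sets: List[Dict[str, str]]) -> List[int]:
    """Return the ports of secondaries matching the first tag set that matches
    anything; an empty list means no eligible member (the driver would raise)."""
    secondaries = [m for m in members if m["state"] == "SECONDARY"]
    for tag_set in tag_sets:
        # A member matches when it carries every key/value pair of the tag set;
        # all() over an empty tag set is True, so {} matches any secondary.
        matching = [m["port"] for m in secondaries
                    if all(m["tags"].get(k) == v for k, v in tag_set.items())]
        if matching:
            return matching
    return []


members = [
    {"port": 27000, "state": "PRIMARY", "tags": {"datacentre": "dc1"}},
    {"port": 27001, "state": "SECONDARY", "tags": {"datacentre": "dc1"}},
    {"port": 27002, "state": "SECONDARY", "tags": {"datacentre": "dc2", "type": "reports"}},
]
print(eligible_secondaries(members, [{"type": "reports"}]))       # [27002]

members[2]["state"] = "DOWN"   # simulate stopping the reporting member
print(eligible_secondaries(members, [{"type": "reports"}]))       # []
print(eligible_secondaries(members, [{"type": "reports"}, {}]))   # [27001]
```

The empty result after the reporting member is stopped corresponds to the "no secondary found" failure seen in the recipe; appending {} as a last tag set is one way to fall back to any secondary instead of failing.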
Configuring the default shard for nonsharded collections
In the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, we set up a simple two-shard server. In the Connecting to a shard from the Mongo shell and performing operations recipe in Chapter 1, Installing and Starting the MongoDB Server, we added data to a person collection that was sharded. However, for any collection that is not sharded, all the documents end up on one shard, called the primary shard. This situation is acceptable for small databases with a relatively small number of collections. However, if the database size increases and, at the same time, the number of unsharded collections increases, we end up overloading a particular shard (the primary shard for a database) with a lot of data from these unsharded collections. All query operations on such unsharded collections, as well as those on sharded collections whose chunk ranges reside on this server instance, will be directed to this shard. In such a scenario, we can change the primary shard of a database to some other instance so that the unsharded collections of different databases are spread across different instances. In this recipe, we will see how to view this primary shard and change it to some other server whenever needed.
Getting ready
Refer to the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, to set up and start a sharded environment. From the shell, connect to the started mongos process. Also, assuming that the two shard servers are listening to ports 27000 and 27001, connect from the shell to these two processes. So we have a total of three shells open, one connected to the mongos process and two to the individual shards.
We are using the test database for this recipe, and sharding has to be enabled on this database. If it's not, then you need to execute the following commands on the shell connected to the mongos process:
mongos> use test
mongos> sh.enableSharding('test')
How to do it…
1. From the shell connected to the mongos process, execute the following two commands:
mongos> db.testCol.insert({i: 1})
mongos> sh.status()
2. In the databases section, look for the test database and take note of its primary. Suppose that the following is a part (showing only the part under databases) of the output of sh.status():
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0000" }
3. The second document under databases shows us that the test database is enabled for sharding (because partitioned is true) and that its primary shard is shard0000.
4. The primary shard, which is shard0000 in our case, is the mongod process listening to port 27000. Open the shell connected to this process and execute the following query:
> db.testCol.find()
5. Now connect to the other mongod process, listening to port 27001, and execute the following query again:
> db.testCol.find()
6. Note that the data is found only on the primary shard and not on any other shard.
7. Execute the following commands from the shell connected to the mongos process:
mongos> use admin
mongos> db.runCommand({movePrimary: 'test', to: 'shard0001'})
8. Execute the following command again from the Mongo shell connected to the mongos process:
mongos> sh.status()
9. From the shells connected to the mongod processes running on ports 27000 and 27001, execute the following query:
> db.testCol.find()
How it works…
We started a sharded setup and connected to it through the mongos process. We began by inserting a document in the testCol collection of the test database; sharding is enabled for the database, but not for this collection. In such cases, the data lies on a shard called the primary shard. Do not mistake this for the primary of a replica set. This is a shard (which itself can be a replica set), and it is the shard that, by default, holds the data of every collection in the database for which sharding is not enabled.
When we add data to a nonsharded collection, it is seen only on the shard that is the primary. Executing sh.status() tells us the primary shard. To change the primary, we need to execute a command against the admin database from the shell connected to the mongos process. The command is as follows:
db.runCommand({movePrimary: '<database whose primary shard is to be changed>', to: '<target shard>'})
Once the primary shard is changed, all existing data in the database's nonsharded collections is migrated to the new primary, and all subsequent writes to nonsharded collections go to this shard.
Use this command with caution, as it will migrate all the unsharded collections to the new primary, which may take time for big collections.
Manually splitting and migrating chunks
Though MongoDB does a good job by default of splitting and migrating chunks across shards to maintain the balance, under some circumstances, such as a small number of documents or a relatively large number of small documents, where the automatic balancer doesn't split the collection, an administrator might want to split and migrate the chunks manually. In this recipe, we will see how to split and migrate the collection manually across shards. Again, for this recipe, we will set up a simple shard as we saw in Chapter 1, Installing and Starting the MongoDB Server.
Getting ready
Refer to the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, to set up and start a sharded environment. It is preferable to start with a clean environment without any data in it. From the shell, connect to the started mongos process.
How to do it…
1. Connect to the mongos process from the Mongo shell and enable sharding on the test database and the splitAndMoveTest collection as follows:
> sh.enableSharding('test')
> sh.shardCollection('test.splitAndMoveTest', {_id: 1}, false)
2. Let us load the data in the collection as follows:
> for(i = 1; i <= 10000; i++) db.splitAndMoveTest.insert({_id: i})
3. Once the data is loaded, execute the following command:
> db.splitAndMoveTest.find().explain()
Note the number of documents on each of the two shards in the plan. In the result of the explain plan, look at the two documents under the shards key; within each of these two documents, the field to look for is n.
4. Execute the following commands to see the splits of the collection:
> config = db.getSisterDB('config')
> config.chunks.find({ns: 'test.splitAndMoveTest'}).pretty()
5. Split the chunk into two at 5000, as follows:
> sh.splitAt('test.splitAndMoveTest', {_id: 5000})
6. Splitting the chunk doesn't migrate it to the second server. See exactly what happens with the chunks by executing the following query again:
> config.chunks.find({ns: 'test.splitAndMoveTest'}).pretty()
7. We will now move the second chunk to the second shard:
> sh.moveChunk('test.splitAndMoveTest', {_id: 5001}, 'shard0001')
8. Execute the following query again and confirm the migration:
> config.chunks.find({ns: 'test.splitAndMoveTest'}).pretty()
9. Alternatively, the following explain plan will show a split of about 50-50 percent:
> db.splitAndMoveTest.find().explain()
How it works…
We simulate a small data load by adding monotonically increasing numbers and discover, by viewing the query plan, that the numbers are not split evenly across the two shards. This is not a problem, as a chunk needs to reach a particular size threshold, 64 MB by default, before it is split and the balancer decides to migrate chunks across the shards to maintain balance. This works well in practice because, in the real world, when the data size gets huge, we will see that eventually, over a period of time, the shards are well balanced.
However, under some circumstances, an administrator may decide to split and migrate the chunks, and it is possible to do this manually. The two helper functions sh.splitAt and sh.moveChunk are there to do this work. Let us look at their signatures and see what they do.
The sh.splitAt function takes two parameters. The first parameter is the namespace, which has the format <database>.<collection name>, and the second parameter is the query that acts as the split point to split the chunk into two, possibly uneven, portions, depending on where the given document is in the chunk. There is another method named sh.splitFind that will try to split the chunk into two equal portions.
However, splitting doesn't mean the chunk moves to another shard. It just breaks one big chunk into two, but the data stays on the same shard. It is an inexpensive operation that involves updating the config DB.
The next operation we execute is to migrate the chunk to a different shard after we split it into two. The sh.moveChunk operation is used to do just that. This function takes three parameters. The first one is again the namespace of the collection, which has the format <database>.<collection name>; the second parameter is a query for a document whose chunk will be migrated; and the third parameter is the destination shard.
Once the migration is done, the query's plan shows us that the data is split across the two shards.
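The split and move steps can be mimicked by a small Python model of the chunk metadata. It assumes the collection starts as a single chunk from MinKey to MaxKey on shard0000 (a real cluster may already have several chunks), models MinKey and MaxKey with infinities, and uses hypothetical helper names. It shows that splitting only rewrites range metadata, while moving changes which shard owns a range.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List

MIN_KEY = float("-inf")  # stands in for MinKey
MAX_KEY = float("inf")   # stands in for MaxKey


@dataclass(frozen=True)
class Chunk:
    min: float   # inclusive lower bound of the shard key range
    max: float   # exclusive upper bound
    shard: str


def find_chunk(chunks: List[Chunk], key: float) -> int:
    """Index of the chunk whose [min, max) range contains key."""
    for i, chunk in enumerate(chunks):
        if chunk.min <= key < chunk.max:
            return i
    raise KeyError(key)


def split_at(chunks: List[Chunk], key: float) -> List[Chunk]:
    """Split the chunk containing key into [min, key) and [key, max).
    Only metadata changes; both halves stay on the same shard."""
    i = find_chunk(chunks, key)
    c = chunks[i]
    if key == c.min:          # key is already a chunk boundary
        return list(chunks)
    return chunks[:i] + [replace(c, max=key), replace(c, min=key)] + chunks[i + 1:]


def move_chunk(chunks: List[Chunk], key: float, to_shard: str) -> List[Chunk]:
    """Reassign the chunk containing key to another shard."""
    i = find_chunk(chunks, key)
    return chunks[:i] + [replace(chunks[i], shard=to_shard)] + chunks[i + 1:]


def docs_per_shard(chunks: List[Chunk], keys: Iterable[float]) -> Dict[str, int]:
    """Count how many of the given shard key values each shard owns."""
    counts: Dict[str, int] = {}
    for k in keys:
        shard = chunks[find_chunk(chunks, k)].shard
        counts[shard] = counts.get(shard, 0) + 1
    return counts


chunks = [Chunk(MIN_KEY, MAX_KEY, "shard0000")]
chunks = split_at(chunks, 5000)                  # like sh.splitAt(..., {_id: 5000})
chunks = move_chunk(chunks, 5001, "shard0001")   # like sh.moveChunk(..., {_id: 5001}, 'shard0001')
print(docs_per_shard(chunks, range(1, 10001)))   # {'shard0000': 4999, 'shard0001': 5001}
```

Because the split point becomes the inclusive lower bound of the second chunk, both {_id: 5000} and {_id: 5001} identify the chunk that gets moved, which is why the result is close to the 50-50 split seen in the explain plan.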
Performing domain-driven sharding using tags
The Starting a simple sharded environment of two shards and Connecting to a shard from the Mongo shell and performing operations recipes in Chapter 1, Installing and Starting the MongoDB Server, explained how to start a simple two-server shard and then insert data in a collection after choosing a shard key. The data that gets sharded is divided in a more technical way: Mongo keeps each data chunk to a manageable size by splitting it into multiple chunks and migrating the chunks across shards to keep the chunk distribution even. However, what if we want the sharding to be more domain-oriented? Suppose we have a database for storing postal addresses and we shard based on postal codes, where we know the postal code range of a city. What we can do is tag the shard servers with the city name as the tag, add the shard key range (postal codes), and associate this range with the tag.
This way, we can state which servers can contain the postal addresses of which cities. For instance, we know that for Mumbai, being the most populous city, the number of addresses would be huge, and thus we add two shards for Mumbai. On the other hand, one shard should be enough to cope with the volumes of Pune, so for now we tag just one shard. In this recipe, we will see how to achieve this use case using tag-aware sharding. If the description is confusing, don't worry; we will see how to implement what we just discussed.
Getting ready
Refer to the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start a simple shard. However, for this recipe, we will add an additional shard, so we will now start three MongoDB servers listening to ports 27000, 27001, and 27002. Again, it is recommended to start off with a clean database. For the purpose of this recipe, we will be using the userAddress collection to store the data.
How to do it…
1. Assuming that we have three shards up and running, let us execute the following commands:
mongos> sh.addShardTag('shard0000', 'Mumbai')
mongos> sh.addShardTag('shard0001', 'Mumbai')
mongos> sh.addShardTag('shard0002', 'Pune')
2. With the tags defined, let us define the range of pincodes that will map to each tag, as follows:
mongos> sh.addTagRange('test.userAddress', {pincode: 400001}, {pincode: 400999}, 'Mumbai')
mongos> sh.addTagRange('test.userAddress', {pincode: 411001}, {pincode: 411999}, 'Pune')
3. Enable sharding for the test database and the userAddress collection as follows:
mongos> sh.enableSharding('test')
mongos> sh.shardCollection('test.userAddress', {pincode: 1})
4. Insert the following documents in the userAddress collection:
mongos> db.userAddress.insert({_id: 1, name: 'Varad', city: 'Pune', pincode: 411001})
mongos> db.userAddress.insert({_id: 2, name: 'Rajesh', city: 'Mumbai', pincode: 400067})
mongos> db.userAddress.insert({_id: 3, name: 'Ashish', city: 'Mumbai', pincode: 400101})
5. Execute the following explain plans:
mongos> db.userAddress.find({city: 'Pune'}).explain()
mongos> db.userAddress.find({city: 'Mumbai'}).explain()
How it works…
Suppose we want to partition the data in a shard in a domain-driven way; we can use tag-aware sharding. It is an excellent mechanism that lets us tag the shards and then split the data range across the shards identified by the tags. We don't really have to bother about the actual machines, or their addresses, hosting the shards. Tags act as a good abstraction: we can tag a shard with multiple tags, and one tag can be applied to multiple shards.
In our case, we have three shards, and we apply tags to each of them using the sh.addShardTag method. This method takes the shard ID, which we can see in the output of sh.status() under the "shards" key. The sh.addShardTag method can be called repeatedly to keep adding tags to a shard. Similarly, there is a helper method, sh.removeShardTag, to remove the assignment of a tag from a shard. Both these methods take two parameters: the first one is the shard ID and the second one is the tag to add or remove.
Once the tagging is done, we assign ranges of the shard key values to the tags. The sh.addTagRange method is used to do that. It accepts four parameters: the first one is the namespace (the fully qualified name of the collection), the second and third parameters are the start and end values of the range for this shard key, and the fourth parameter is the tag name of the shards hosting the range being added. For example, the call sh.addTagRange('test.userAddress', {pincode: 400001}, {pincode: 400999}, 'Mumbai') says we are adding the shard key range from 400001 to 400999 for the test.userAddress collection and that this range will be stored on the shards tagged as Mumbai. Note that the start value is inclusive and the end value is exclusive, so a document with pincode 400999 itself falls outside this range.
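A server-free Python sketch of the lookup that tag ranges enable: given a pincode, which shards are allowed to hold it. SHARD_TAGS and TAG_RANGES restate the recipe's commands, while the function name shards_for_pincode is hypothetical; the balancer's real work (moving chunks so that they land on matching shards) is not modelled.

```python
from typing import Dict, List, Set, Tuple

# Tag assignments and ranges from the recipe; each range is [min, max).
SHARD_TAGS: Dict[str, Set[str]] = {
    "shard0000": {"Mumbai"},
    "shard0001": {"Mumbai"},
    "shard0002": {"Pune"},
}
TAG_RANGES: List[Tuple[int, int, str]] = [
    (400001, 400999, "Mumbai"),
    (411001, 411999, "Pune"),
]


def shards_for_pincode(pincode: int) -> List[str]:
    """Shards allowed to hold a document with this pincode."""
    for low, high, tag in TAG_RANGES:
        if low <= pincode < high:          # lower bound inclusive, upper exclusive
            return sorted(s for s, tags in SHARD_TAGS.items() if tag in tags)
    # Keys outside every tagged range are not constrained by tags.
    return sorted(SHARD_TAGS)


for pin in (411001, 400067, 400101, 400999):
    print(pin, shards_for_pincode(pin))
```

The last value, 400999, shows the exclusive upper bound in action: it falls outside the Mumbai range, so tags place no restriction on where it lives.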
Once the tagging and the adding of tag ranges are done, we enable sharding on the database and collection and add data to it for Mumbai and Pune with the respective pincodes. We then query and explain the plan to see that the data does indeed reside on the shards we tagged for the cities of Pune and Mumbai. We can also add new shards to this sharded setup and tag the new shards accordingly. The balancer will then balance the data based on the tags. For instance, if the addresses in Pune increase, thus overloading a shard, we can add a new shard with the tag Pune. The postal addresses for Pune will then be sharded across these two server instances tagged for the city of Pune.
Exploring the config database in a sharded setup
The config database is the backbone of a sharded setup in Mongo. It stores all the metadata of the shard setup and has a dedicated mongod process running for it. When a mongos process is started, we provide it with the config server's URL. In this recipe, we will take a look at some collections in the config database and dive deep into their content and significance.
Getting ready
We will need a sharded setup for this recipe. Refer to the Starting a simple sharded environment of two shards recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start a simple shard. Additionally, connect to the mongos process from a shell.
How to do it…
1. From the console connected to the mongos process, switch to the config database and view all the collections as follows:
mongos> use config
mongos> show collections
2. From the list of all collections, we will visit a few. We start with the databases collection, which keeps track of all the databases in this sharded cluster. Execute the following command from the shell:
mongos> db.databases.find()
The content of the result is pretty straightforward. The value of the _id field is the name of the database. The value of the partitioned field tells us whether sharding is enabled for the database or not (true indicates that it is enabled), and the primary field gives the primary shard where the data of nonsharded collections resides.
3. The next collection we will visit is collections. Execute the following command from the shell:
mongos> db.collections.find().pretty()
This collection, unlike the databases collection we saw earlier, contains only those collections for which we have enabled sharding. The _id field gives the namespace of the collection in the <database>.<collection name> format, the key field gives the shard key, and the unique field indicates whether the shard key is unique or not. These three fields correspond to the three parameters of the sh.shardCollection function, in that very order.
4. Next, we look at the chunks collection. Execute the following command on the shell. If the database was clean when we started this recipe, we won't have a lot of data in it:
mongos> db.chunks.find().pretty()
5. We then look at the tags collection. Execute the following query:
mongos> db.tags.find().pretty()
6. Let us query the mongos collection as follows. This is a simple collection giving the list of all the mongos instances connected to the cluster, with details such as the hostname and port on which each mongos instance is running (which together form the _id field), the version, and figures such as how long, in seconds, the process has been up and running.
mongos> db.mongos.find().pretty()
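7. Finally, we look at the version collection. Execute the following query (note that it differs from the other queries we executed, because db.version already refers to the shell's db.version() function, so the collection has to be accessed through getCollection):
mongos> db.getCollection('version').findOne()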
How it works…
We saw the collections and databases collections when we queried them; they are pretty simple. Let us look at the collection called chunks. The following is a sample document from this collection:
{
  "_id" : "test.userAddress-pincode_400001.0",
  "lastmod" : Timestamp(1, 3),
  "lastmodEpoch" : ObjectId("53026514c902396300fd4812"),
  "ns" : "test.userAddress",
  "min" : {
    "pincode" : 400001
  },
  "max" : {
    "pincode" : 411001
  },
  "shard" : "shard0000"
}
The fields of interest are ns (the namespace of the collection), min (the inclusive lower bound of the chunk's shard key range), max (the exclusive upper bound of that range), and shard (the shard on which this chunk lies). The chunk size is 64 MB by default. This can be seen in the settings collection: execute db.settings.find() from the shell and look at the value of the field value, which is the size of the chunk in MB. Chunks are restricted to this small size to ease the migration process across shards if needed. When the size of a chunk exceeds this threshold, the MongoDB server finds a suitable point in the existing chunk to break it into two and adds a new entry in the chunks collection. This operation is called splitting and is inexpensive, as the data stays where it is; it is just logically split into multiple chunks. The balancer in Mongo tries to keep the chunks balanced across shards, and the moment it sees some imbalance, it migrates these chunks to a different shard, which is expensive and also depends largely on the network bandwidth. If we execute sh.status(), the implementation actually queries the collections we saw earlier and prints the formatted result.
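Since min and max bound each chunk's range, finding the shard for a given shard key value is a range lookup over the chunks documents. The Python sketch below performs that lookup with bisect over documents shaped like config.chunks entries. MinKey and MaxKey are modelled as infinities, only the middle chunk comes from the sample above (the other two are hypothetical), and this illustrates the idea rather than how mongos caches and uses this metadata internally.

```python
import bisect
from typing import Any, Dict, List

MIN_KEY, MAX_KEY = float("-inf"), float("inf")   # stand-ins for MinKey/MaxKey

# Documents shaped like config.chunks entries; the middle one mirrors the sample.
chunks: List[Dict[str, Any]] = [
    {"ns": "test.userAddress", "min": {"pincode": MIN_KEY}, "max": {"pincode": 400001}, "shard": "shard0001"},
    {"ns": "test.userAddress", "min": {"pincode": 400001}, "max": {"pincode": 411001}, "shard": "shard0000"},
    {"ns": "test.userAddress", "min": {"pincode": 411001}, "max": {"pincode": MAX_KEY}, "shard": "shard0002"},
]


def shard_for(chunks: List[Dict[str, Any]], ns: str, field: str, value: Any) -> str:
    """Find the chunk whose [min, max) range holds value and return its shard."""
    own = sorted((c for c in chunks if c["ns"] == ns), key=lambda c: c["min"][field])
    lower_bounds = [c["min"][field] for c in own]
    i = bisect.bisect_right(lower_bounds, value) - 1   # last chunk with min <= value
    if i < 0 or not value < own[i]["max"][field]:
        raise LookupError(f"no chunk of {ns} covers {field}={value!r}")
    return own[i]["shard"]


print(shard_for(chunks, "test.userAddress", "pincode", 400067))
print(shard_for(chunks, "test.userAddress", "pincode", 411001))
```

The second lookup lands on the third chunk because a chunk's max is exclusive: the value 411001 belongs to the chunk whose min is 411001.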
Chapter 5. Advanced Operations
In this chapter, we will cover the following recipes:
Atomic find and modify operations
Implementing atomic counters in MongoDB
Implementing server-side scripts
Creating and tailing capped collection cursors in MongoDB
Converting a normal collection to a capped collection
Storing binary data in MongoDB
Storing large data in Mongo using GridFS
Storing data to GridFS from a Java client
Storing data to GridFS from a Python client
Implementing triggers in MongoDB using oplog
Executing flat plane (2D) geospatial queries in Mongo using geospatial indexes
Spherical indexes and GeoJSON-compliant data in MongoDB
Implementing a full-text search in MongoDB
Integrating MongoDB with Elasticsearch for a full-text search
Introduction
In Chapter 2, Command-line Operations and Indexes, we saw how to perform basic operations from the shell to query, update, and insert documents. We also explored different types of indexes and index creation. In this chapter, we go ahead and see some of the advanced features of Mongo, such as GridFS, geospatial indexes, and full-text search. Other recipes we will see include an introduction to capped collections and their uses, and implementing server-side scripts in MongoDB.
Atomic find and modify operations
In Chapter 2, Command-line Operations and Indexes, we had some recipes that explained various CRUD operations that we perform in MongoDB. There was one concept that we didn't cover: atomically finding and modifying documents. Modify consists of both update and delete operations. In this recipe, we will see find and modify operations in some detail and, in the next recipe, Implementing atomic counters in MongoDB, we will put them to use in implementing counters.
Getting ready
Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of MongoDB. That is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server.
How to do it…
We will test findAndModify on a document in the atomicOperationsTest collection as follows:
1. Execute the following commands from the Mongo shell:
> db.atomicOperationsTest.drop()
> db.atomicOperationsTest.insert({i: 1})
2. Execute the following commands from the Mongo shell and observe the output:
> db.atomicOperationsTest.findAndModify({
    query: {i: 1},
    update: {$set: {text: 'TestString'}},
    new: false
  }
)
3. We will execute another one, this time with slightly different parameters; observe the output this time around:
> db.atomicOperationsTest.findAndModify({
    query: {i: 1},
    update: {$set: {text: 'UpdatedString'}},
    fields: {i: 1, text: 1, _id: 0},
    new: true
  }
)
4. We will execute another update, this time one that will upsert the document, as follows:
> db.atomicOperationsTest.findAndModify({
    query: {i: 2},
    update: {$set: {text: 'TestString'}},
    fields: {i: 1, text: 1, _id: 0},
    upsert: true,
    new: true
  }
)
5. Now query the collection once, as follows, and see the documents present:
> db.atomicOperationsTest.find().pretty()
6. We will finally execute the delete operation as follows:
> db.atomicOperationsTest.findAndModify({
    query: {i: 2},
    remove: true,
    fields: {i: 1, text: 1, _id: 0},
    new: false
  }
)
How it works…
If we perform the find and update operations independently, by first finding the document and then updating it in MongoDB, the results might not be as expected, as there might be an interleaving update between the find and the update operations that changes the document state. In some specific use cases, such as implementing atomic counters, this is not acceptable, and thus we need a way to atomically find, update, and return a document. The returned value is either the document before the update is applied or the one after the update is applied, and this is decided by the invoking client.
Now that we have executed the steps in the preceding section, let us see what we actually did and what all the fields in the JSON document passed as a parameter to the findAndModify operation mean. Starting with step 2, we gave the findAndModify function a document containing the fields query, update, and new. These are used to specify the query that will be used to find the document, the update that will be applied to it, and a Boolean value that specifies whether the document returned by the operation is the one after the update is applied or before it was applied. In this case, the value of the new flag is false, so the document returned is the one before the update is applied.
In step 3, we added a new field called fields to the parameter document; it is used to select a limited set of fields from the resulting document that is returned. Also, the value of the new field is true, which indicates that we want the updated document, that is, the one after the update operation is executed and not the one before the update.
In step 4, the parameter contained a new field called upsert, which upserts (update + insert) the document. That is, if a document matching the given query is found, it is updated; otherwise, a new one is created and the update is applied to it. If the document didn't exist and an upsert happens, setting the new parameter to false will return null. This is because nothing was present before the update operation was executed.
Finally, in step 6, the parameter didn't have the update field but had the remove field with the value true, indicating that the document is to be removed. Also, the value of the new field was false, which means that we expect the document that got deleted to be returned.
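To make the new, upsert, remove, and fields semantics concrete, here is a Python model of findAndModify over an in-memory list. It is a sketch under stated assumptions: the class and method names are hypothetical, a threading.Lock stands in for the server's per-document atomicity, only equality queries and the $set and $inc operators are handled, and no _id values are generated.

```python
import copy
import threading
from typing import Any, Dict, List, Optional


class InMemoryCollection:
    """Tiny stand-in for a collection; a single lock makes each call atomic."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []
        self._lock = threading.Lock()

    def find_and_modify(self, query: Dict[str, Any], update: Optional[Dict] = None,
                        remove: bool = False, upsert: bool = False, new: bool = False,
                        fields: Optional[Dict[str, int]] = None) -> Optional[Dict]:
        with self._lock:  # find + modify + return happen as one step
            doc = next((d for d in self.docs
                        if all(d.get(k) == v for k, v in query.items())), None)
            if remove:
                if doc is not None:
                    self.docs.remove(doc)
                return self._project(doc, fields)      # the deleted document
            before = copy.deepcopy(doc)
            if doc is None:
                if not upsert:
                    return None
                doc = dict(query)                      # upsert: start from the query fields
                self.docs.append(doc)
            update = update or {}
            for key, value in update.get("$set", {}).items():
                doc[key] = value
            for key, value in update.get("$inc", {}).items():
                doc[key] = doc.get(key, 0) + value
            # new=False returns the pre-update state, which is None after an upsert
            return self._project(doc if new else before, fields)

    @staticmethod
    def _project(doc: Optional[Dict], fields: Optional[Dict[str, int]]) -> Optional[Dict]:
        """Return a copy of doc limited to the fields flagged 1 (None stays None)."""
        if doc is None:
            return None
        if fields is None:
            return copy.deepcopy(doc)
        return {k: copy.deepcopy(doc[k]) for k, flag in fields.items() if flag and k in doc}


coll = InMemoryCollection()
coll.docs.append({"i": 1})
print(coll.find_and_modify({"i": 1}, {"$set": {"text": "TestString"}}, new=False))
print(coll.find_and_modify({"i": 1}, {"$set": {"text": "UpdatedString"}},
                           fields={"i": 1, "text": 1, "_id": 0}, new=True))
```

The first call prints the document as it was before the $set, and the second prints the projected document after the update, matching steps 2 and 3.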
See also
The Implementing atomic counters in MongoDB recipe, to see how findAndModify is used to implement an atomic counter in Mongo
Implementing atomic counters in MongoDB
Atomic counters are a necessity for a large number of use cases. Mongo doesn't have a built-in feature for atomic counters; nevertheless, they can be easily implemented using some of its cool offerings. In fact, implementing one takes merely a couple of lines of code. Refer to the previous recipe to know what atomic find and modify operations are in Mongo.
Getting ready
Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of Mongo. This is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server.
How to do it…
1. Execute the following piece of code from the Mongo shell:
> function getNextSequence(counterId) {
    return db.counters.findAndModify(
      {
        query: {_id: counterId},
        update: {$inc: {count: 1}},
        upsert: true,
        fields: {count: 1, _id: 0},
        new: true
      }
    ).count
  }
2. Now, from the shell, invoke the following commands:
> getNextSequence('PostsCounter')
> getNextSequence('PostsCounter')
> getNextSequence('ProfileCounter')
How it works…
The function is as simple as a findAndModify operation on a collection used to store all the counters. The counter ID is the _id field of the document stored, and the value of the counter is stored in the count field. The document passed to findAndModify contains the query that uniquely identifies the document storing the current count, which is a query on the _id field. The update operation is an $inc operation that increments the value of the count field by one, but what if the document doesn't exist? This will happen during the first invocation of the counter. To take care of this scenario, we set the upsert flag to true, which atomically either updates the document or creates one. The value, thus, will always start with one, and there is no way in this function to have a user-defined start number for the sequence or a custom increment step. To address such requirements, we will have to explicitly add a document with the initialized values to the counters collection. Finally, we are interested in the state of the counter after the value is incremented. Hence, we set the value of the new field to true.
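The reason the upsert-plus-$inc combination hands out unique values is that the increment and the read-back happen as one atomic step. The Python sketch below models that with a lock around an in-memory dictionary; CountersCollection and inc_and_return are hypothetical names, and the lock stands in for the single-document atomicity the server provides.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


class CountersCollection:
    """In-memory stand-in for the counters collection."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict] = {}
        self._lock = threading.Lock()

    def inc_and_return(self, counter_id: str) -> int:
        # Mirrors findAndModify({query: {_id: id}, update: {$inc: {count: 1}},
        #                        upsert: true, new: true}): create-or-increment
        # and read-back happen under one lock, so no two callers see the same value.
        with self._lock:
            doc = self._docs.setdefault(counter_id, {"_id": counter_id, "count": 0})
            doc["count"] += 1
            return doc["count"]


counters = CountersCollection()


def get_next_sequence(counter_id: str) -> int:
    return counters.inc_and_return(counter_id)


print(get_next_sequence("PostsCounter"))    # 1
print(get_next_sequence("PostsCounter"))    # 2
print(get_next_sequence("ProfileCounter"))  # 1

# 1,000 concurrent callers still receive 1..1000, each value exactly once.
with ThreadPoolExecutor(max_workers=8) as pool:
    values = list(pool.map(lambda _: get_next_sequence("Hits"), range(1000)))
print(sorted(values) == list(range(1, 1001)))  # True
```

Splitting the same logic into a separate read and a separate write, without the lock, is exactly the interleaving problem described in the previous recipe.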
On invoking this method three times, as we did, we should see the following in the counters collection. Simply execute the following query:
> db.counters.find()
{ "_id" : "PostsCounter", "count" : 2 }
{ "_id" : "ProfileCounter", "count" : 1 }
Using this small function, we have now implemented atomic counters in Mongo.
We can store such common code on the Mongo server, where it will be available for execution in other functions.
Implementing server-side scripts
In this recipe, we will see how to write server-stored JavaScript, which is similar to stored procedures in relational databases. This is a common use case, where other pieces of code require access to these common functions and we keep them in one central place. The function for demo purposes is simple; it adds two numbers. There are two parts to this recipe. First, we'll see how to load the scripts from the collections into the client-side JavaScript shell, and then we will see how to execute these functions on the server.
Note
The documentation specifically mentions that it is not recommended to use server-side scripts. Security is one concern if the data is not properly audited, and hence we need to be careful about which functions are defined. Since Mongo 2.4, the server-side JavaScript engine is V8, which can execute multiple threads in parallel, as opposed to the engine used prior to Version 2.4 of Mongo, which executes only one thread at a time.
Getting ready
Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of Mongo. This is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server.
How to do it…
1. Create a new function called add and save it to the db.system.js collection as follows. The current database should be test:
> use test
> db.system.js.save({_id: 'add', value: function(num1, num2) {return num1 + num2}})
2. Now that this function is defined, load all the functions as follows:
> db.loadServerScripts()
3. Invoke add and see if it works:
> add(1, 2)
4. We will now use the add function and execute it on the server side instead. Execute the following commands from the shell:
> use test
> db.eval('return add(1, 2)')
5. Execute the following commands:
> use test1
> db.eval('return add(1, 2)')
How it works…
The system.js collection is a plain old collection, just like any other collection. We add a new server-side JavaScript function using the save function on this collection. The save function is just a convenience function that inserts the document if it is not present or updates an existing one. The objective is to add a new document to this collection, which you may also do using insert or upsert.
The secret lies in the loadServerScripts method. The method executes the following line of code:
this.system.js.find().forEach(function(u) {eval(u._id + " = " + u.value);});
This evaluates JavaScript using the eval function: for each document present in the system.js collection, it assigns the function defined in the value attribute of the document to a variable named after the value of the document's _id field.
For example, if the {_id: 'add', value: function(num1, num2) {return num1 + num2}} document is present in the system.js collection, the function given in the value field of the document will be assigned to a variable named add in the current shell. The name add comes from the _id field of the document.
These scripts do not really execute on the server, but their definition is stored on the server in a collection. The loadServerScripts method just instantiates some variables in the current shell and makes those functions available for invocation. It is the JavaScript interpreter of the shell that executes these functions, not the server. The system.js collection is defined in the scope of the database, but once loaded, these are JavaScript functions defined in the shell, and hence the functions are available throughout the scope of the shell, irrespective of the database currently active.
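The loadServerScripts mechanism can be mirrored in Python to highlight that the evaluation happens on the client. This is an analogy, not MongoDB code: SYSTEM_JS and load_server_scripts are hypothetical names, and Python source strings stand in for the stored JavaScript functions.

```python
from typing import Dict, List

# Stand-in for the documents of a system.js collection. MongoDB stores
# JavaScript functions; here the values are Python source strings.
SYSTEM_JS: List[Dict[str, str]] = [
    {"_id": "add", "value": "lambda num1, num2: num1 + num2"},
]


def load_server_scripts(namespace: Dict[str, object],
                        system_js: List[Dict[str, str]] = SYSTEM_JS) -> None:
    """Bind each stored function to a name equal to its _id, like
    db.loadServerScripts(): the code is fetched as data and then evaluated
    locally, in the client, not on the server."""
    for doc in system_js:
        # eval() runs arbitrary code, which is why stored scripts must be trusted.
        namespace[doc["_id"]] = eval(doc["value"], {})


shell_scope: Dict[str, object] = {}
load_server_scripts(shell_scope)
print(shell_scope["add"](1, 2))   # 3
```

As with the shell version, once the names are bound, they stay available in that scope regardless of which database the stored definitions came from.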
As far as security is concerned, if the shell is connected to a server with security enabled, the user invoking loadServerScripts must have privileges to read the collections in the database. For more details on enabling security and the various roles a user can have, refer to the Setting up users in MongoDB recipe in Chapter 4, Administration. As we saw earlier, the loadServerScripts function reads data from the system.js collection and thus, if the user doesn't have privileges to read from the collection, the function invocation will fail. Apart from that, the user invoking the loaded functions from the shell needs the appropriate privileges for whatever those functions do. For instance, if a function inserts into or updates a collection, the user should have read and write privileges on that particular collection.
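Executing scripts on the server is perhaps what one would expect a server-side script to be, as opposed to executing them in the connected shell. In this case, the functions are evaluated on the server's JavaScript engine, and the security checks are more stringent, as long-running functions can hold locks, with detrimental effects on performance. The wrapper to invoke the execution of JavaScript code on the server side is the db.eval function, which accepts the code to evaluate on the server side along with the parameters, if any. Because the system.js collection is scoped to a database, the db.eval call in step 5, executed against the test1 database, cannot find the add function, which is stored only in the test database.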
Before evaluating the function, db.eval takes a global write lock. This can be skipped if the nolock parameter is used. For instance, the preceding add function can be invoked as follows, instead of calling db.eval, and will achieve the same results. Additionally, we provide the nolock field to instruct the server not to acquire the global lock before evaluating the function. If the function performs any read/write operations on a collection, it will acquire locks as usual; this field doesn't affect that behavior.
> db.runCommand({eval: function(num1, num2) {return num1 + num2}, args: [1, 2], nolock: true})
If security is enabled on the server, the invoking user needs to have four roles, namely, userAdminAnyDatabase, dbAdminAnyDatabase, readWriteAnyDatabase, and clusterAdmin (on the admin database), to successfully invoke the db.eval function.
Programming languages also provide a way to invoke such server-side scripts using the eval function. For instance, in the Java API, the com.mongodb.DB class has the eval method to invoke server-side JavaScript code. Such server-side executions are highly useful when we want to avoid unnecessary network traffic for the data and send only the result to the clients. However, too much logic on the database server can quickly make things difficult to maintain and can badly affect the performance of the server.
Creating and tailing capped collection cursors in MongoDB
Capped collections are fixed-size collections, and they act like queues. New documents are added at the end of the collection; if the space allocated to the collection becomes full, the oldest entries are removed to make room. They provide fast access to these limited-size collections, even without the use of an index. They are naturally sorted by the order of insertion, and any retrieval needed on them ordered by time can be done using the $natural sort order. The following diagram gives a pictorial representation of a capped collection whose size is enough to hold up to three documents of equal size (which is too small for any practical use, but good for illustration purposes). As we see in the diagram, the collection is similar to a circular queue, where the oldest document is replaced by the newly added document, should the collection become full:
Tailable cursors are a special type of cursor that tails the collection just as the tail -f command in Unix does. These cursors iterate through the collection like normal cursors do but, additionally, wait for data to become available in the collection when none is available. We will see capped collections and tailable cursors in detail in this recipe.
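The behaviour of a tailable cursor, which keeps its place and returns only new documents instead of being exhausted at the end, can be modelled in a few lines of Python. The classes below are hypothetical and run without a server; the position counter plays the role of the collection's insertion order, and an empty result from poll() stands in for the short wait that awaitData adds.

```python
from typing import Dict, List, Tuple


class AppendOnlyLog:
    """Documents kept in insertion (natural) order with a position counter."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, Dict]] = []
        self._next_pos = 0

    def insert(self, doc: Dict) -> None:
        self.entries.append((self._next_pos, doc))
        self._next_pos += 1


class TailableCursor:
    """Remembers the last position it returned and, on each poll, returns only
    the documents inserted after it."""

    def __init__(self, log: AppendOnlyLog) -> None:
        self._log = log
        self._last_pos = -1

    def poll(self) -> List[Dict]:
        batch = [(pos, doc) for pos, doc in self._log.entries if pos > self._last_pos]
        if batch:
            self._last_pos = batch[-1][0]
        # An empty batch stands in for awaitData: a real cursor would block briefly.
        return [doc for _, doc in batch]


log = AppendOnlyLog()
for i in range(1, 4):
    log.insert({"i": i, "val": "TestCapped"})
cursor = TailableCursor(log)
print([d["i"] for d in cursor.poll()])   # [1, 2, 3]
print(cursor.poll())                      # []
log.insert({"i": 4, "val": "TestCapped"})
print([d["i"] for d in cursor.poll()])   # [4]
```

Because positions only ever increase, the cursor keeps working even if old entries are evicted from the front, which is the situation in a capped collection.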
Getting ready
Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of Mongo. This is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server.
How to do it…
There are two parts to this recipe. In the first part, we will create a capped collection called testCapped and try performing some basic operations on it. In the second part, we will create a tailable cursor on the capped collection we created.
1. First, drop the testCapped collection if it already exists:
> db.testCapped.drop()
2. Now create a capped collection as follows (note that the size given here is the number of bytes allocated for the collection, not the number of documents it contains):
> db.createCollection('testCapped', {capped: true, size: 100})
3. We will now insert 100 documents in the capped collection, as follows:
> for(i = 1; i <= 100; i++) {
db.testCapped.insert({'i': i, val: 'Test capped'})
}
4. Now query the collection as follows:
> db.testCapped.find()
5. Try to remove the data from the collection, as follows:
> db.testCapped.remove({})
You should get an error after executing the previous command.
6. We will now create and demonstrate a tailable cursor. It is recommended that you type/copy the following pieces of code into a text editor and keep them handy for execution.
7. To insert data in the collection, we will execute the following fragment of code. Execute it in the shell as follows (note that this execution will take quite some time):
> for(i = 101; i < 500; i++) {
sleep(1000)
db.testCapped.insert({'i': i, val: 'Test Capped'})
}
8. To tail a capped collection, we execute the following piece of code:
> var cursor = db.testCapped.find().addOption(DBQuery.Option.tailable).addOption(DBQuery.Option.awaitData)
while(cursor.hasNext()) {
var next = cursor.next()
print('i: ' + next.i + ', value: ' + next.val)
}
9. Open another shell and connect to the running mongod process. This will be the second shell opened and connected to the server. Copy and paste the code given in step 8 into this shell and execute it.
10. Observe how the inserted records are shown as they are inserted into the capped collection.
How it works…
We created a capped collection explicitly using the createCollection function. Apart from converting an existing collection (covered in the next recipe), this is the only way to create a capped collection. There are two parameters to the createCollection function. The first one is the name of the collection. The second parameter is a JSON document that contains two fields, capped and size, which indicate whether the collection is capped and the size of the collection in bytes, respectively. An additional field, max, can be provided to specify the maximum number of documents in the collection. The size field is required even if the max field is specified. We then inserted and queried the documents. When we tried to remove the documents from the collection, we saw an error to the effect that removal is not permitted from a capped collection. Documents are deleted only when new documents are added and there isn’t space available to accommodate them.
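The eviction rule governed by size and max can be modeled in a few lines of Python. This is a minimal in-memory sketch, not how MongoDB implements capped collections. The byte size of each document is passed in by the caller as an assumed stand-in for its real BSON size. The class name CappedCollectionSketch and the error text are hypothetical.

```python
from collections import deque


class CappedCollectionSketch:
    """In-memory model of a capped collection: a FIFO bounded by bytes and, optionally, by count."""

    def __init__(self, size, max_docs=None):
        self.size = size          # byte budget, like the 'size' option of createCollection
        self.max_docs = max_docs  # optional document limit, like the 'max' option
        self.used = 0             # bytes currently occupied
        self.docs = deque()       # (doc, doc_size) pairs in insertion (natural) order

    def insert(self, doc, doc_size):
        if doc_size > self.size:
            raise ValueError("document is larger than the whole capped collection")
        # Evict the oldest documents until the new one fits within both limits.
        while self.docs and (
            self.used + doc_size > self.size
            or (self.max_docs is not None and len(self.docs) >= self.max_docs)
        ):
            _, old_size = self.docs.popleft()
            self.used -= old_size
        self.docs.append((doc, doc_size))
        self.used += doc_size

    def remove(self, query):
        # Explicit deletes are rejected, as in step 5 (message text is illustrative).
        raise RuntimeError("cannot remove from a capped collection")

    def find(self):
        # Natural order is insertion order, oldest surviving document first.
        return [doc for doc, _ in self.docs]


capped = CappedCollectionSketch(size=100)
for i in range(1, 101):
    capped.insert({'i': i, 'val': 'Test capped'}, doc_size=40)  # assumed 40 bytes per document
print([doc['i'] for doc in capped.find()])  # [99, 100]
```

With an assumed 40 bytes per document, only two documents fit in 100 bytes, so only the last two inserts survive. This matches the circular-queue picture described earlier.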
Next, let us look at the tailable cursor we created. We started two shells: one of them inserts documents with an interval of 1 second between subsequent insertions. In the second shell, we created a cursor, iterated through it, and printed the documents we got from the cursor onto the shell. The additional options we added to the cursor make the difference, though. Two options were added, DBQuery.Option.tailable and DBQuery.Option.awaitData. The former makes the cursor tailable rather than normal: the last position is marked, and we can resume where we left off. The latter makes the cursor wait for more data for some time when no data is available and we reach the end of the cursor, rather than returning immediately. The awaitData option can be used with tailable cursors only. The combination of these two options gives us a feel similar to the tail -f command on Unix. For a list of the different available options, visit http://docs.mongodb.org/manual/reference/method/cursor.addOption/.
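The two cursor options can be illustrated with a small Python model. The reader remembers the position of the last document it returned (tailable). At the end of the data it waits and checks again instead of finishing (awaitData). This is only a sketch under stated assumptions. A real tailable cursor is kept on the server, and here polling with time.sleep stands in for the server-side wait. The class name and the poll_interval and max_wait parameters are hypothetical.

```python
import time


class TailableCursorSketch:
    """Follows an append-only list the way `tail -f` follows a file."""

    def __init__(self, source, poll_interval=0.05):
        self.source = source              # shared list that a producer keeps appending to
        self.position = 0                 # index of the next document not yet returned
        self.poll_interval = poll_interval

    def next_batch(self, max_wait=1.0):
        """Return unseen documents, waiting up to max_wait seconds for new ones (awaitData)."""
        deadline = time.monotonic() + max_wait
        while self.position >= len(self.source):
            if time.monotonic() >= deadline:
                return []                 # nothing arrived; the cursor stays usable
            time.sleep(self.poll_interval)
        batch = self.source[self.position:]
        self.position += len(batch)       # remember where we left off (tailable)
        return batch


docs = [{'i': 1}, {'i': 2}]
cursor = TailableCursorSketch(docs)
print(cursor.next_batch())               # [{'i': 1}, {'i': 2}]
docs.append({'i': 3})                    # a producer inserts a new document
print(cursor.next_batch())               # [{'i': 3}], resumed from the last position
```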
Converting a normal collection to a capped collection
In this recipe, we will demonstrate the process of converting a normal collection to a capped collection.
Getting ready
Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of Mongo. This is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server.
How to do it…
1. Execute the following command to ensure that you are in the test database:
> use test
2. Create a normal collection as follows. We will be adding 100 documents to it. Type/copy the following code into the Mongo shell and execute it:
> for(i = 1; i <= 100; i++) {
db.normalCollection.insert({'i': i, val: 'Some Text Content'})
}
3. Query the collection, as follows, to confirm that it contains the data:
> db.normalCollection.find()
4. Now query the system.namespaces collection as follows, and note the result document:
> db.system.namespaces.find({name: 'test.normalCollection'})
5. Execute the following command to convert the collection into a capped collection:
> db.runCommand({convertToCapped: 'normalCollection', size: 100})
6. Query the collection to take a look at the data:
> db.normalCollection.find()
7. Query the system.namespaces collection, as follows, and note the result document:
> db.system.namespaces.find({name: 'test.normalCollection'})
How it works…
We created a normal collection with 100 documents and then converted it to a capped collection with a size of 100 bytes. The command passes the following JSON document to the runCommand function:
{convertToCapped: <name of normal collection>, size: <size in bytes of the capped collection>}
This command creates a capped collection with the mentioned size and loads the documents from the normal collection, in their natural order, into the target capped collection. If the size of the capped collection reaches the mentioned limit, the old documents are removed in FIFO order, making space for new documents. Once this is done, the created capped collection is renamed to the name of the original collection. Executing a find query on the capped collection confirms that not all of the 100 documents originally present in the normal collection are present in the capped collection. A query on the system.namespaces collection, before and after the execution of the convertToCapped command, shows the change in the collection attributes. Note that this operation holds an exclusive lock on the database, blocking all other read and write operations on it for the duration of the conversion. Also, any indexes present on the original collection are not recreated on the capped collection upon conversion.
There’s more…
The oplog is an important capped collection used for replication in MongoDB. For more information on replication and oplogs, refer to the Understanding and analyzing oplogs recipe in Chapter 4, Administration. In the Implementing triggers in MongoDB using oplog recipe, we will use this oplog to implement a feature similar to the after insert/update/delete triggers of a relational database.
Storing binary data in MongoDB
So far, we have seen how to store text values, dates, and numeric fields in a document. At times, binary content also needs to be stored in the database. Consider cases where users upload their photographs or scanned copies of documents that need to be stored in the database. In relational databases, the BLOB data type is the most commonly used type to address these requirements. MongoDB also supports storing binary content in a document in a collection. The catch is that the total size of the document shouldn’t exceed 16MB, which is the upper limit of the document size at the time of writing this book. In this recipe, we will store a small image file in a MongoDB document and also retrieve it later. If the content you wish to store in MongoDB collections is greater than 16MB, MongoDB offers an out-of-the-box solution called GridFS. We will see how to use GridFS in the Storing large data in MongoDB using GridFS recipe later in this chapter.
Getting ready
Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of MongoDB. Also, the program that writes binary content to the document is written in Java. For more details on Java drivers, refer to the following recipes in Chapter 3, Programming Language Drivers:
Executing query and insert operations using a Java client
Executing update and delete operations using a Java client
Aggregation in Mongo using a Java client
MapReduce in Mongo using a Java client
Open a Mongo shell and connect to the local MongoDB instance listening to port 27017. For this recipe, we will be using the project mongo-cookbook-bindata. This project is available in the source code bundle downloadable from the book’s website. The folder needs to be extracted on the local filesystem. Open a command-line shell and go to the root of the extracted project. It should be the directory where the pom.xml file is found.
How to do it…
1. On the operating system shell, with the pom.xml file of the mongo-cookbook-bindata project present in the current directory, execute the following command:
$ mvn clean compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.BinaryDataTest
2. Observe the output; the execution should be successful.
3. Switch to the Mongo shell, connected to the local instance, and execute the following query:
> db.binaryDataTest.findOne()
4. Scroll through the document and take note of the fields in the document.
How it works…
If we scroll through the large document printed out, we see that the fields are fileName, size, and data. The first two fields, of type string and number respectively, were populated on document creation and hold the name of the file we provided and its size in bytes. The data field is of the BSON binary type (BinData), and the shell displays its content encoded in base64 format.
Inserting this document takes very little effort from the application’s perspective. The following lines of code show how we populated the DBObject that we added to the collection:
DBObject doc = new BasicDBObject("_id", 1);
doc.put("fileName", resourceName);
doc.put("size", imageBytes.length);
doc.put("data", imageBytes);
As we see, two fields, namely, fileName and size, are used to store the name and the size of the file, and are of type string and number respectively. The field data is added to the DBObject as a byte array. It automatically gets stored as the BSON type BinData in the document.
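For comparison, here is a hedged Python sketch of the same document shape. It only builds the dictionary and does not talk to a server. It assumes PyMongo on Python 3, which encodes a bytes value as BSON binary (BinData subtype 0) on insert. That is an assumption about your driver, not something this recipe’s Java project does. The helpers build_binary_doc and shell_view and the file name sample.png are hypothetical. shell_view approximates how the mongo shell renders a generic binary field, which is why the data appeared base64-encoded.

```python
import base64


def build_binary_doc(file_name, content, doc_id=1):
    """Mirror the Java snippet: fileName, size and the raw bytes in one document."""
    return {
        "_id": doc_id,
        "fileName": file_name,   # string
        "size": len(content),    # number of bytes
        "data": content,         # bytes; a Python 3 driver stores this as BSON BinData
    }


def shell_view(content):
    """Approximate how the mongo shell displays a generic (subtype 0) binary field."""
    return 'BinData(0,"%s")' % base64.b64encode(content).decode("ascii")


doc = build_binary_doc("sample.png", b"\x89PNG\r\n")
print(doc["size"])               # 6
print(shell_view(b"hello"))      # BinData(0,"aGVsbG8=")
```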
What we saw in this recipe is straightforward, as the document size is less than 16MB, which is the maximum document size in MongoDB as of writing this book. If the size of the stored files exceeds this value, we have to resort to solutions such as GridFS, explained in the next recipe.
Storing large data in MongoDB using GridFS
A document in MongoDB can be at most 16MB in size, but does that mean we cannot store data that is more than 16MB in size? There are cases where you prefer to store videos and audio files in a database rather than in the filesystem, for a number of advantages. A few of them are storing metadata along with the files, accessing a file from an intermediate location, and replicating the contents for high availability if replication is enabled on the MongoDB server instances. GridFS is the way to address such use cases in MongoDB. We will also see how GridFS manages large content that exceeds 16MB, and we will analyze the collections it uses to store the content behind the scenes. For test purposes, we will not use data exceeding 16MB but something smaller, to see GridFS in action.
Getting ready
Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, and start a single instance of Mongo. This is the only prerequisite for this recipe. Start a Mongo shell and connect to the started server. Additionally, we will use the mongofiles utility to store data in GridFS from the command line.
How to do it…
1. Download the code bundle of the book from the book’s website and save the image file named glimpse_of_universe-wide.jpg from it to your local drive. As a matter of fact, you may choose any other large file and use the appropriate file name in the commands we execute. For the sake of the example, the image is saved in the Home directory. We will split our steps into three parts: uploading and inspecting the file, getting it back, and deleting it.
2. With the server up and running, execute the following command from the operating system’s shell, with the Home directory as the current directory. There are two arguments here. The first one is the name of the file on the local filesystem, and the second one is the name that will be attached to the uploaded content in MongoDB.
$ mongofiles put -l glimpse_of_universe-wide.jpg universe.jpg
3. Let us now query the collections to see how this content is actually stored behind the scenes. With the shell open, execute the following two queries. Make sure that the second query excludes the data field:
> db.fs.files.findOne({filename: 'universe.jpg'})
> db.fs.chunks.find({}, {data: 0})
4. Now that we have put a file into GridFS from the operating system’s local filesystem, we will see how to get the file back to the local filesystem. Execute the following command from the operating system shell:
$ mongofiles get -l UploadedImage.jpg universe.jpg
5. Finally, we will delete the file we uploaded. From the operating system shell, execute the following command:
$ mongofiles delete universe.jpg
6. Confirm the deletion by running the following queries again:
> db.fs.files.findOne({filename: 'universe.jpg'})
> db.fs.chunks.find({}, {data: 0})
How it works…
The MongoDB distribution comes with an out-of-the-box tool called mongofiles that lets us upload large content to the MongoDB server, where it gets stored using the GridFS specification. GridFS is not a separate product. It is a standard specification, followed by the different MongoDB drivers, for storing data larger than 16MB, which is the maximum document size. It can even be used for files smaller than 16MB, as we did in our recipe, but there isn’t really a good reason to do that. Nothing stops us from implementing our own way of storing these large files, but following the standard is preferred because all drivers support it. The drivers do the heavy lifting of splitting the big file into small chunks and reassembling the chunks when needed.
We took the image downloaded from the book’s website and uploaded it to MongoDB using mongofiles. The command to do that is put, and the -l option gives the name of the file on the local drive that we want to upload. Finally, universe.jpg is the name under which we want the file to be stored on GridFS.
On successful execution, we should see the following output:
connected to: 127.0.0.1
added file: { _id: ObjectId('5310d531d1e91f93635588fe'), filename: "universe.jpg", chunkSize: 262144, uploadDate: new Date(1393612082137), md5: "d894ec31b8c5addd0c02060971ea05ca", length: 2711259 }
done!
This gives us some details of the upload, namely, the unique _id of the uploaded file, the name of the file, the chunk size (the size of each chunk the big file is broken into, which by default is 256KB), the date of upload, the MD5 checksum of the uploaded content, and the total length of the upload. The checksum can be computed beforehand and compared after the upload to check whether the uploaded content was corrupted.
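The checksum comparison described above can be done with Python’s standard hashlib module. This minimal sketch assumes you already have the md5 value from the mongofiles output or the fs.files document as a hex string. The function names are hypothetical.

```python
import hashlib


def md5_of_file(path, block_size=1024 * 1024):
    """Compute the hex MD5 digest of a local file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def upload_is_intact(path, stored_md5):
    """Compare the local digest with the md5 value reported for the GridFS upload."""
    return md5_of_file(path) == stored_md5.lower()
```

For example, upload_is_intact('glimpse_of_universe-wide.jpg', 'd894ec31b8c5addd0c02060971ea05ca') should return True only if your local copy is byte-for-byte the file that was uploaded in this recipe.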
We executed the following query from the Mongo shell in the test database:
> db.fs.files.findOne({filename: 'universe.jpg'})
We see that the output of the put command of mongofiles is the same as the document returned by this query on the fs.files collection. This is the collection where the details of every uploaded file are stored when data is added to GridFS. There will be one document per upload. Applications can later modify this document to add their own custom metadata along with the standard details added by MongoDB. For example, if the document is for an image upload, we can add details such as the name of the photographer, the location where the image was taken, when it was taken, and tags for the individuals in the image.
The file content itself is stored in the fs.chunks collection. Let us execute the following query:
> db.fs.chunks.find({}, {data: 0})
We have deliberately left the data field out of the selected result. Let us look at the structure of the result document:
{
_id: <Unique identifier of type ObjectId representing this chunk>,
files_id: <ObjectId of the document in fs.files for the file whose chunk this document represents>,
n: <The chunk sequence number, starting with 0; useful for knowing the order of the chunks>,
data: <BSON binary content of the data uploaded for the file>
}
For the file we uploaded, we have 11 chunks of at most 256KB each. When a file is requested, the fs.chunks collection is searched by files_id, which comes from the _id field of the fs.files collection, and by the n field, which is the chunk’s sequence number. A unique index on these two fields is created when this collection is first created, on the first GridFS upload. It allows fast retrieval of the chunks for a file ID, sorted by the chunk’s sequence number.
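The chunk layout described above can be reproduced with a few lines of Python. This sketch only builds fs.chunks-style documents (files_id, n, data) in memory. It uses the 256KB default chunk size seen in this recipe and does not talk to a server. The helper names and the "example-id" value are hypothetical. For the 2,711,259-byte image, chunk_count gives the same 11 chunks that fs.chunks showed.

```python
DEFAULT_CHUNK_SIZE = 256 * 1024  # 262144 bytes, the chunkSize reported by mongofiles


def chunk_count(length, chunk_size=DEFAULT_CHUNK_SIZE):
    """Number of chunk documents needed for a file of `length` bytes (ceiling division)."""
    return -(-length // chunk_size)


def make_chunks(files_id, content, chunk_size=DEFAULT_CHUNK_SIZE):
    """Split content into fs.chunks-style documents numbered from n = 0."""
    return [
        {"files_id": files_id, "n": n, "data": content[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(content), chunk_size))
    ]


print(chunk_count(2711259))              # 11, as for universe.jpg
chunks = make_chunks("example-id", b"x" * 600000)
print([len(c["data"]) for c in chunks])  # [262144, 262144, 75712]
```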
Similar to put, the get command is used to retrieve files from GridFS and put them on the local filesystem. The -l option again gives a local file name; here, it is the name of the file that will be saved on the local filesystem. The final parameter of the get command is the name of the file as stored on GridFS, which is the value of the filename field in the fs.files collection. Finally, the delete command of mongofiles simply removes the entries of the file from the fs.files and fs.chunks collections. The name of the file given for deletion is again the value of the filename field in the fs.files collection.
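Reading a file back, as get does, amounts to fetching the chunks for one files_id, ordering them by n, and concatenating their data. Below is a hedged in-memory sketch with chunk documents as plain dictionaries. A real client would query fs.chunks and rely on the files_id/n index for ordering. The function name and sample values are hypothetical.

```python
def reassemble(chunks, files_id, expected_length=None):
    """Rebuild a file's bytes from fs.chunks-style dicts belonging to files_id."""
    own = sorted((c for c in chunks if c["files_id"] == files_id), key=lambda c: c["n"])
    # Sequence numbers must run 0, 1, 2, ... without gaps, otherwise a chunk is missing.
    for expected_n, chunk in enumerate(own):
        if chunk["n"] != expected_n:
            raise ValueError("missing chunk n=%d" % expected_n)
    content = b"".join(c["data"] for c in own)
    if expected_length is not None and len(content) != expected_length:
        raise ValueError("length mismatch: %d != %d" % (len(content), expected_length))
    return content


stored = [
    {"files_id": "a", "n": 1, "data": b"world"},
    {"files_id": "b", "n": 0, "data": b"other"},
    {"files_id": "a", "n": 0, "data": b"hello "},
]
print(reassemble(stored, "a", expected_length=11))  # b'hello world'
```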
There’s more…
One important use case for GridFS is generated content, such as large reports on static data that doesn’t change too often and is expensive to generate. Instead of running such reports all the time, they can be run once and stored until a change in the static data is detected. At that point, the stored report is deleted and regenerated on the next request for the data. The filesystem may not always be available for the application to write files to, in which case this is a good alternative. There are also cases where one is interested in only an intermediate chunk of the stored data, in which case only the chunk containing the required data needs to be accessed. You also get some nice features; for instance, the MD5 checksum of the data is stored automatically and is available for use by the application.
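Every chunk except the last holds exactly chunkSize bytes. So the chunk that contains a given byte offset can be found with integer division. Below is a sketch of that arithmetic using the 262144-byte chunk size from this recipe. The function names are hypothetical. With the resulting chunk numbers, a client could query fs.chunks for {files_id: <id>, n: <chunk number>} instead of reading the whole file.

```python
def locate(offset, length, chunk_size=262144):
    """Return (chunk number n, position inside that chunk) for a byte offset in the file."""
    if not 0 <= offset < length:
        raise IndexError("offset outside the file")
    return divmod(offset, chunk_size)


def chunks_for_range(start, end, length, chunk_size=262144):
    """Chunk numbers needed to read bytes [start, end) of the file."""
    if end <= start:
        return []
    first, _ = locate(start, length, chunk_size)
    last, _ = locate(end - 1, length, chunk_size)
    return list(range(first, last + 1))


print(locate(1000000, 2711259))                   # (3, 213568)
print(chunks_for_range(500000, 800000, 2711259))  # [1, 2, 3]
```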
Now that we have seen what GridFS is, let us look at some scenarios where using GridFS might not be a very good idea. Accessing content from MongoDB through GridFS will not perform the same as reading it directly from the filesystem; direct filesystem access will be faster. A proof of concept (POC) for the system being developed is recommended to measure the performance and see whether it is within acceptable limits, since the tradeoff in performance might be worth the benefits we get. Also, if your application server is fronted with a CDN, you might not actually need a lot of I/O for the static data stored in GridFS. As GridFS stores the data in multiple documents across collections, atomically updating them is not possible. If we know the content is less than 16MB, which is the case for a lot of user-generated content and small uploaded files, we may skip GridFS altogether and store the content in one document, as BSON supports storing binary content in a document. For more details, refer to the Storing binary data in MongoDB recipe.
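The size-based part of this decision can be written as a small helper. This is only a sketch. The 16MB figure is the document limit quoted in this chapter. The headroom reserved for the other fields of the document is an assumed value, and the names are hypothetical. Performance, CDN, and atomicity considerations from the preceding paragraph are not captured by it.

```python
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024   # 16MB document limit quoted in the text
METADATA_HEADROOM = 16 * 1024               # assumed room for _id, fileName, size, etc.


def storage_strategy(content_length):
    """Return 'embed' if the bytes fit in one document with their metadata, else 'gridfs'."""
    if content_length < 0:
        raise ValueError("content_length must be non-negative")
    if content_length + METADATA_HEADROOM <= MAX_BSON_DOCUMENT_SIZE:
        return "embed"
    return "gridfs"


print(storage_strategy(2711259))            # embed: the 2.7MB image used in this chapter
print(storage_strategy(50 * 1024 * 1024))   # gridfs
```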
We will rarely use the mongofiles utility to store, retrieve, and delete data in GridFS. Though it may occasionally be used, most of the time we will perform these operations from an application. Thus, in the next couple of recipes, we will see how to connect to GridFS to store, retrieve, and delete files using Java and Python clients.
Though this is only loosely related to MongoDB, OpenStack is an Infrastructure as a Service (IaaS) platform that offers a variety of services for computing, storage, networking, and so on. Its image storage service, called Glance, supports a lot of persistent stores for the images, and one of them is MongoDB’s GridFS. You can find more information on how to configure Glance to use GridFS at http://docs.openstack.org/trunk/config-reference/content/ch_configuring-openstack-image-service.html.
Storing data to GridFS from a Java client
In the previous recipe, we saw how to store data to GridFS using mongofiles, the command-line utility that comes with MongoDB for managing large data files. To get an idea of what GridFS is and which collections are used behind the scenes to store the data, refer to the previous recipe. In this recipe, we will look at storing data to GridFS using a Java client. The program will be a highly scaled-down version of the mongofiles utility. It focuses only on how to store, retrieve, and delete data, rather than trying to provide as many options as mongofiles does.
Getting ready
Refer to the Connecting to a single node from a Java client recipe from Chapter 1, Installing and Starting the MongoDB Server, for all the necessary setup for this recipe. If you are interested in more details on Java drivers, refer to the following recipes in Chapter 3, Programming Language Drivers:
Executing query and insert operations using a Java client
Executing update and delete operations using a Java client
Aggregation in Mongo using a Java client
MapReduce in Mongo using a Java client
Open a Mongo shell and connect to the local mongod instance listening to port 27017. For this recipe, we will be using the project mongo-cookbook-gridfs. This project is available in the source code bundle downloadable from the book’s website. The folder needs to be extracted on the local filesystem. Open a terminal on your operating system and go to the root of the extracted project. It should be the directory where the pom.xml file is found. Also, save the glimpse_of_universe-wide.jpg file on the local filesystem, just as in the previous recipe. This file can be found in the downloadable code bundle from the book’s website.
How to do it…
1. We are assuming that the GridFS collections are clean and no data has been uploaded previously. If there is nothing crucial in the database, you may execute the following queries to clear the collections. Do exercise caution before dropping the collections, though:
> use test
> db.fs.chunks.drop()
> db.fs.files.drop()
2. Open an operating system shell and execute the following command:
$ mvn clean compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.GridFSTests -Dexec.args="put ~/glimpse_of_universe-wide.jpg universe.jpg"
Note
The file I needed to upload was placed in the Home directory. You may instead give the path of your own image file after the put command. Note that if the path contains spaces, the whole path needs to be within single quotes.
3. If the preceding command runs successfully, you should see the following output:
Successfully written to universe.jpg, details are:
Upload Identifier: 5314c05e1c52e2f520201698
Length: 2711259
MD5 hash: d894ec31b8c5addd0c02060971ea05ca
Chunk Side in bytes: 262144
Total Number Of Chunks: 11
4. Once the preceding execution is successful, which we can confirm from the console output, execute the following queries from the Mongo shell:
> db.fs.files.findOne({filename: 'universe.jpg'})
> db.fs.chunks.find({}, {data: 0})
5. Now we will get the file from GridFS to the local filesystem. Execute the following command to perform this operation. Ensure that the directory we are writing to, given by the local path right after get, is writable by the user.
$ mvn clean compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.GridFSTests -Dexec.args="get '~/universe.jpg' universe.jpg"
6. Confirm that the file is present on the local filesystem at the mentioned location. We should see the following console output, indicating a successful write operation:
Connected successfully..
Successfully written 2711259 bytes to ~/universe.jpg
7. Finally, we will delete the file from GridFS as follows:
$ mvn clean compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.GridFSTests -Dexec.args="delete universe.jpg"
8. On successful deletion, we should see the following output on the console:
Connected successfully..
Removed file with name 'universe.jpg' from GridFS
How it works…
The com.packtpub.mongo.cookbook.GridFSTests class accepts three types of operations: put, get, and delete. These upload a file to GridFS, get contents from GridFS to the local filesystem, and delete files from GridFS, respectively.
The class accepts up to three parameters. The first one is the operation, with the valid values get, put, and delete. The second parameter is relevant for the get and put operations. It is the name of the file on the local filesystem that the downloaded content is written to, or from which the content is sourced for upload. The third parameter is the name of the file in GridFS, which is not necessarily the same as the name on the local filesystem. For delete, however, only the file name on GridFS is needed.
Let us look at some important snippets of code from the class that are specific to GridFS.
Open the com.packtpub.mongo.cookbook.GridFSTests class in your favorite IDE and look for the handlePut, handleGet, and handleDelete methods. These are the methods where all the logic is. We will start with the handlePut method, which uploads the contents of the file from the local filesystem to GridFS.
Irrespective of which operation we perform, we will create an instance of the com.mongodb.gridfs.GridFS class. In our case, we instantiated it as follows:
GridFS gfs = new GridFS(client.getDB("test"));
The constructor of this class takes the com.mongodb.DB instance of the database in which we wish to create the GridFS collections, fs.chunks and fs.files, that will store the uploaded content. Once the GridFS instance is created, we invoke the createFile method on it. This method accepts two parameters. The first one is an InputStream, which sources the bytes of the content to be uploaded, and the second one is the name of the file that will be saved on GridFS. However, this method doesn’t create the file on GridFS; it returns an instance of com.mongodb.gridfs.GridFSInputFile. The upload happens only when we call the save method on this returned object. There are a few overloaded variants of the createFile method. For more details, refer to the Javadoc of the com.mongodb.gridfs.GridFS class.
Our next method is handleGet, which gets the contents of a file saved on GridFS to the local filesystem. Similar to the com.mongodb.DBCollection class, the com.mongodb.gridfs.GridFS class has the find and findOne methods to search. However, besides a DBObject query, find and findOne in GridFS also accept the file name or the ObjectId value of the document to search for in the fs.files collection. The return value is also different: it is not a DBCursor. findOne returns an instance of com.mongodb.gridfs.GridFSDBFile, and the find variants that take a file name or a query return a List of such instances. The GridFSDBFile class has various methods that let us get an InputStream of the bytes of the file present on GridFS. Other methods are writeTo, which writes to the given file or OutputStream, and getLength, which gives the number of bytes in the file. For details, refer to the Javadoc of the com.mongodb.gridfs.GridFSDBFile class.
Finally, we look at the handleDelete method, which is used to delete files on GridFS and is the simplest of the lot. The method to call on the GridFS object is remove, which accepts a string argument: the name of the file to delete on the server. The return type of this method is void. So, if the given name does not match any file on GridFS, the method neither returns a value nor throws an exception.
Storing data to GridFS from a Python client
In the Storing large data in MongoDB using GridFS recipe, we saw what GridFS is and how it can be used to store large files in MongoDB. In the previous recipe, we saw how to use the GridFS API from a Java client. In this recipe, we will see how to store image data in MongoDB using GridFS from a Python program.
Getting ready
Refer to the Connecting to a single node from a Java client recipe from Chapter 1, Installing and Starting the MongoDB Server, for all the necessary setup for this recipe. If you are interested in more details on Python drivers, refer to the following recipes in Chapter 3, Programming Language Drivers:
Installing PyMongo
Executing query and insert operations using PyMongo
Executing update and delete operations using PyMongo
Download the glimpse_of_universe-wide.jpg image file from the downloadable code bundle, available on the book’s website, and save it to the local filesystem, as we did in the previous recipe.
How to do it…
1. Open a Python interpreter by typing the following command in the operating system shell (note that the current directory is the directory where the image file glimpse_of_universe-wide.jpg is placed):
$ python
2. Import the required packages as follows:
>>> import pymongo
>>> import gridfs
3. Once the Python shell is open, create a MongoClient and a database object for the test database as follows:
>>> client = pymongo.MongoClient('mongodb://localhost:27017')
>>> db = client.test
4. To clear the GridFS-related collections and start clean, execute the following queries, but only if nothing important is present in them:
>>> db.fs.files.drop()
>>> db.fs.chunks.drop()
5. Create the instance of GridFS as follows:
>>> fs = gridfs.GridFS(db)
6. Now, we will read the file and upload its contents to GridFS. First, open the file in binary mode to create the file object, as follows:
>>> file = open('glimpse_of_universe-wide.jpg', 'rb')
7. Now put the file into GridFS as follows:
>>> fs.put(file, filename='universe.jpg')
8. On successfully executing the preceding put command, we should see the ObjectId of the uploaded file. This is the same as the _id field of the fs.files document for this file.
9. Execute the following query from the Python shell. It should print a dict object with the details of the upload; verify its contents:
>>> db.fs.files.find_one()
10. Now, we will get the uploaded content and write it to a file on the local filesystem. Let us get the GridOut instance representing the object, to read the data out of GridFS, as follows:
>>> gout = fs.get_last_version('universe.jpg')
11. With this instance available, let us write the data to a file on the local filesystem. First, open a handle to the local file to write to, as follows:
>>> fout = open('universe.jpg', 'wb')
12. We will then write the contents to it as follows:
>>> fout.write(gout.read())
>>> fout.close()
>>> gout.close()
13. Now verify the file in the current directory on the local filesystem. A new file called universe.jpg should have been created, with the same number of bytes as the source file. Verify it by opening it in an image viewer.
How it works…
Let us look in detail at the steps we executed. In the Python shell, we import two packages, pymongo and gridfs, and instantiate pymongo.MongoClient and gridfs.GridFS. The constructor of the gridfs.GridFS class takes one argument, an instance of pymongo.database.Database.
We open a file in binary mode using the open function and pass the file object to the put method of GridFS. We also pass an additional argument, filename, which is the name under which the file is put into GridFS. The first parameter, in fact, need not be a file object; any object with a read method defined will do, and a bytes value works too.
Once the put operation succeeds, the return value is the ObjectId of the uploaded document in the fs.files collection. A query on fs.files can confirm that the file is uploaded. Verify that the size of the uploaded data matches the size of the file.
Our next objective is to get the file from GridFS onto the local filesystem. Intuitively, one would imagine that if the method to put a file in GridFS is put, then the method to get the file would be get. True, the method is indeed get; however, it retrieves files only by ObjectId, which is what the put method returned. So if you are OK with getting by ObjectId, get is the method for you. However, if you want to get by file name, the method to use is get_last_version, which accepts the name of the file that we uploaded. The return type of this method is gridfs.grid_file.GridOut. This class has the read method, which reads out all the bytes of the file uploaded to GridFS. We open a file called universe.jpg for writing in binary mode and write all the bytes read from the GridOut object.
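When the same file name is uploaded more than once, fs.files holds one document per upload. get_last_version returns the most recently uploaded one. PyMongo also offers get_version(filename, version), where 0 is the first upload and -1 the latest. The selection rule can be modeled in plain Python as follows. This is an in-memory illustration of that behavior, not PyMongo’s code, and pick_version and the sample documents are hypothetical.

```python
from datetime import datetime


def pick_version(files, filename, version=-1):
    """Select an fs.files-style document by filename and version index (0 = oldest, -1 = newest)."""
    matches = sorted((f for f in files if f["filename"] == filename),
                     key=lambda f: f["uploadDate"])
    if not matches:
        raise LookupError("no file named %r" % filename)
    return matches[version]


files = [
    {"_id": 1, "filename": "universe.jpg", "uploadDate": datetime(2014, 3, 1, 10, 0)},
    {"_id": 2, "filename": "universe.jpg", "uploadDate": datetime(2014, 3, 2, 9, 30)},
    {"_id": 3, "filename": "other.jpg", "uploadDate": datetime(2014, 3, 3, 8, 0)},
]
print(pick_version(files, "universe.jpg")["_id"])     # 2, the latest upload
print(pick_version(files, "universe.jpg", 0)["_id"])  # 1, the first upload
```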
Implementing triggers in MongoDB using oplog
A trigger in a relational database is code that gets invoked when an insert, update, or delete operation is executed on a table in the database. A trigger can be invoked either before or after the operation. Triggers are not implemented in MongoDB out of the box. If you need some sort of notification for your application whenever insert, update, or delete operations are executed, you are left to manage them yourself in the application. One approach is to have some sort of data access layer in the application that is the only place to query, insert, update, or delete documents in the collections. However, there are a few challenges with this approach. First, you need to explicitly code the logic to accommodate this requirement in the application, which may or may not be feasible. If the database is shared and multiple applications access it, things become even more difficult. Second, the access needs to be strictly regulated, and no other source of inserts, updates, and deletes should be permitted.
Alternatively, we can look at running some sort of logic in a layer close to the database. One way to track all write operations is to use the oplog. Note that read operations cannot be tracked using the oplog. In this recipe, we will write a small Java application that tails the oplog and gets all the insert, update, and delete operations happening on a Mongo instance. Although this program is implemented in Java, the same approach works equally well in any other programming language. The crux lies in the logic of the implementation; the platform can vary. Also, this works only if the mongod instance is started as part of a replica set, not as a standalone instance. Finally, this trigger-like functionality can be invoked only after the operation is performed, not before the data gets inserted, updated, or deleted in the collection.
Getting ready
Refer to the Starting multiple instances as part of a replica set recipe from Chapter 1, Installing and Starting the MongoDB Server, for all the necessary setup for this recipe. If you are interested in more details on Java drivers, refer to the Executing query and insert operations using a Java client and Executing update and delete operations using a Java client recipes in Chapter 3, Programming Language Drivers. The prerequisites of these two recipes are all we need for this recipe.
Refer to the Creating and tailing capped collection cursors in MongoDB recipe in this chapter to learn more about capped collections and tailable cursors if you need a refresher. Finally, though not mandatory, Chapter 4, Administration, explains the oplog in depth in the Understanding and analyzing oplogs recipe. This recipe will not explain the oplog in as much depth as Chapter 4, Administration, does. Open a shell and connect it to the primary of the replica set.
For this recipe, we will be using the project mongo-cookbook-oplogtrigger. This project is available in the source code bundle downloadable from the book’s website. The folder needs to be extracted on the local filesystem. Open a command-line shell and go to the root of the extracted project. It should be the directory where the pom.xml file is found. Also, the TriggerOperations.js file will be needed to trigger the operations in the database that we intend to capture.
How to do it…
1. Open an operating system shell and execute the following command:
$ mvn clean compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.OplogTrigger -Dexec.args="test.oplogTriggerTest"
2. With the Java program started, open the shell as follows. The TriggerOperations.js file should be present in the current directory, and the primary mongod instance should be listening to port 27000:
$ mongo --port 27000 TriggerOperations.js --shell
3. Once the shell is connected, execute the following function, which we loaded from the JavaScript file:
test:PRIMARY> triggerOperations()
4. Observe the output printed on the console where the Java program com.packtpub.mongo.cookbook.OplogTrigger is being executed using Maven.
How it works…
The functionality we implemented is pretty handy for a lot of use cases. Let us first look at what we did at a higher level. The Java program com.packtpub.mongo.cookbook.OplogTrigger acts as a trigger when new data is inserted, updated, or deleted in a collection in MongoDB. It implements this functionality using the oplog collection, which is the backbone of replication in MongoDB.
TheJavaScriptwehavejustactsasasourceofproducing,updating,anddeletingdatafromthecollection.There’snothingreallysignificanttowhatthisJavaScriptfunctiondoes,butitinsertssixdocumentsinacollection,updatesoneofthem,deletesoneofthem,insertsfourmoredocuments,andfinally,deletesallthedocuments.YoumaychoosetoopentheTriggerOperations.jsfileandtakealookathowitisimplemented.ThecollectiononwhichitperformsispresentinthetestdatabaseandiscalledoplogTriggerTest.
WhenweexecutetheJavaScriptfunction,weshouldseesomethinglikethefollowingoutputprintedontheconsole:
[INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) @ mongo-cookbook-oplogtriger <<<
[INFO]
[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ mongo-cookbook-oplogtriger ---
Connected successfully..
Starting tailing oplog…
Operation is Insert ObjectId is 5321c4c2357845b165d42a5f
Operation is Insert ObjectId is 5321c4c2357845b165d42a60
Operation is Insert ObjectId is 5321c4c2357845b165d42a61
Operation is Insert ObjectId is 5321c4c2357845b165d42a62
Operation is Insert ObjectId is 5321c4c2357845b165d42a63
Operation is Insert ObjectId is 5321c4c2357845b165d42a64
Operation is Update ObjectId is 5321c4c2357845b165d42a60
Operation is Delete ObjectId is 5321c4c2357845b165d42a61
Operation is Insert ObjectId is 5321c4c2357845b165d42a65
Operation is Insert ObjectId is 5321c4c2357845b165d42a66
Operation is Insert ObjectId is 5321c4c2357845b165d42a67
Operation is Insert ObjectId is 5321c4c2357845b165d42a68
Operation is Delete ObjectId is 5321c4c2357845b165d42a5f
Operation is Delete ObjectId is 5321c4c2357845b165d42a62
Operation is Delete ObjectId is 5321c4c2357845b165d42a63
Operation is Delete ObjectId is 5321c4c2357845b165d42a64
Operation is Delete ObjectId is 5321c4c2357845b165d42a60
Operation is Delete ObjectId is 5321c4c2357845b165d42a65
Operation is Delete ObjectId is 5321c4c2357845b165d42a66
Operation is Delete ObjectId is 5321c4c2357845b165d42a67
Operation is Delete ObjectId is 5321c4c2357845b165d42a68
The Maven program runs continuously and never terminates, as the Java program doesn’t terminate. You may hit Ctrl + C to stop the execution.
Let us analyze the Java program, which is where the meat of the content is. The first assumption is that, for this program to work, a replica set must be set up, as we will use Mongo’s oplog collection. The Java program creates a connection to the primary of the replica set, connects to the local database, and gets the oplog.rs collection. Then, all it does is find the last, or nearly the last, timestamp in the oplog. This is done not just to prevent the whole oplog from being replayed on startup, but also to mark a point towards the end of the oplog. The following is the code to find this timestamp value:
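// Newest oplog entry first (reverse natural order), one document only
DBCursor cursor = collection.find().sort(new BasicDBObject("$natural", -1)).limit(1);
// Current time in seconds, used as a fallback if the oplog is empty
int current = (int) (System.currentTimeMillis() / 1000);
// The entry's "ts" field, or a timestamp for "now"
return cursor.hasNext() ? (BSONTimestamp) cursor.next().get("ts")
        : new BSONTimestamp(current, 1);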
The oplog is sorted in reverse natural order to find the time in its last document. As the oplog follows the FIFO pattern, sorting it in descending natural order is equivalent to sorting by the timestamp in descending order.
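To make the "newest entry, or now as a fallback" logic concrete without a running replica set, here is a minimal sketch in plain JavaScript (runnable in Node.js). It is an in-memory model, not the driver API: the oplog is assumed to be an array in natural (insertion) order, and each timestamp is modelled as an object with t (seconds) and i (an ordinal), mirroring the two parts of a BSONTimestamp. The name startTimestamp is hypothetical.

```javascript
// Hypothetical in-memory model: each entry has ts = { t: seconds, i: ordinal }.
// The array is in natural (insertion) order, so the newest entry is last.
function startTimestamp(oplogInNaturalOrder, nowMillis) {
  // Equivalent of find().sort({$natural: -1}).limit(1): newest entry first.
  const newestFirst = oplogInNaturalOrder.slice().reverse();
  if (newestFirst.length > 0) {
    return newestFirst[0].ts;
  }
  // Empty oplog: current time in seconds with ordinal 1,
  // like new BSONTimestamp(current, 1) in the Java snippet.
  return { t: Math.floor(nowMillis / 1000), i: 1 };
}

// Usage
const sampleOplog = [
  { ts: { t: 1394722000, i: 1 }, op: "i" },
  { ts: { t: 1394722000, i: 2 }, op: "u" }
];
startTimestamp(sampleOplog, Date.now()); // { t: 1394722000, i: 2 }
startTimestamp([], 1394722123456);       // { t: 1394722123, i: 1 }
```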
Once we have found the timestamp as described, we query the oplog collection as usual, but with two additional options, as follows:
DBCursor cursor = collection.find(QueryBuilder.start("ts")
        .greaterThan(lastreadTimestamp).get())
        .addOption(Bytes.QUERYOPTION_TAILABLE)
        .addOption(Bytes.QUERYOPTION_AWAITDATA);
The query finds all documents with a timestamp greater than a particular timestamp and adds two options, Bytes.QUERYOPTION_TAILABLE and Bytes.QUERYOPTION_AWAITDATA. The latter option can only be added when the former option is added. With these options, the cursor not only queries and returns the data, but also waits for some time for more data when the execution reaches the end of the cursor. Eventually, when no data arrives, it terminates.
During every iteration, we also store the last seen timestamp. This is used when the cursor closes because no more data is available and we query again to get a new tailable cursor instance. This time, the query uses the timestamp that we stored in the previous iteration, when the last document was seen. This process continues indefinitely, and we basically tail the collection just as we tail a file in Unix using the tail command.
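The resume-from-last-seen loop can be sketched the same way. Again, this is a hypothetical in-memory model: compareTs and tailOnce are illustrative names, and each call to tailOnce stands in for one tailable cursor that returns only entries with a ts greater than the remembered one.

```javascript
// Hypothetical in-memory model: each entry has ts = { t: seconds, i: ordinal }.
function compareTs(a, b) {
  return a.t !== b.t ? a.t - b.t : a.i - b.i; // seconds first, then ordinal
}

// One "cursor": visit entries newer than lastTs in natural order and return
// the last timestamp seen, so that the next cursor can resume from there.
function tailOnce(oplog, lastTs, onEntry) {
  let last = lastTs;
  for (const entry of oplog) {
    if (compareTs(entry.ts, last) > 0) {
      onEntry(entry);
      last = entry.ts;
    }
  }
  return last;
}

// Usage: the second cursor only sees the entry appended after the first one closed.
const log = [{ ts: { t: 10, i: 1 }, op: "i" }, { ts: { t: 10, i: 2 }, op: "u" }];
const seen = [];
let resumeFrom = tailOnce(log, { t: 10, i: 1 }, entry => seen.push(entry.op)); // seen: ["u"]
log.push({ ts: { t: 11, i: 1 }, op: "d" });
resumeFrom = tailOnce(log, resumeFrom, entry => seen.push(entry.op)); // seen: ["u", "d"]
```

In a real loop, the gap between two tailOnce calls is where a short sleep could go before the next cursor is opened.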
The oplog document contains a field called op for the operation, whose values are i, u, and d for insert, update, and delete, respectively. For an insert, the field o contains the inserted document, including its _id; for a delete, it contains the _id of the deleted document. For an update, the field o2 contains the _id of the updated document. All we do is check for these conditions and print out the operation and the ID of the document inserted, deleted, or updated.
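The dispatch on op, o, and o2 fits in a few lines. This is a sketch, not the book's Java code: describeOplogEntry is a hypothetical name, ObjectIds are plain strings here, and filtering on the entry's ns field (which holds database.collection) is an assumption about how the program uses its test.oplogTriggerTest argument.

```javascript
// Readable names for the op codes the trigger reports.
const OP_NAMES = { i: "Insert", u: "Update", d: "Delete" };

// Returns the line the trigger would print, or null for entries it ignores:
// other namespaces, or op codes such as "c" (command) and "n" (no-op).
function describeOplogEntry(entry, namespace) {
  const name = OP_NAMES[entry.op];
  if (!name || entry.ns !== namespace) {
    return null;
  }
  // Updates carry the target _id in o2; inserts and deletes carry it in o.
  const idHolder = entry.op === "u" ? entry.o2 : entry.o;
  return "Operation is " + name + " ObjectId is " + String(idHolder._id);
}
```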
Let’s look at the things we need to be careful about. Obviously, deleted documents will no longer be available in the collection, so their _id would not really be useful if you intend to query. Also, be careful when selecting a document by the ID we get after an update, as some other operation later in the oplog might already have updated the same document further while our application’s tailable cursor is yet to reach that point. This is common in high-volume systems. We have a similar problem with inserts as well: the document we query using the provided ID might already have been updated or deleted. Applications using this logic to track these operations must be aware of these cases.
Alternatively, take a look at the oplog entry itself, which contains more details, such as the document inserted or the update statement executed. Updates in the oplog collection are idempotent, which means they can be applied any number of times without unintended side effects. For instance, if the actual update was to increment a value by one, the update in the oplog collection will use the $set operator with the expected final value. This way, the same update can be applied multiple times. The logic you use would then have to be more sophisticated to handle such scenarios.
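A small simulation shows why recording the final value makes replays safe. applyUpdate is a hypothetical helper that understands only $inc and $set; the $set form of the oplog entry follows the description above, and the exact shape of real oplog update entries varies between server versions.

```javascript
// Hypothetical helper: applies only $inc and $set, returning a new object.
function applyUpdate(doc, update) {
  const result = Object.assign({}, doc);
  for (const [field, delta] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + delta;
  }
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  return result;
}

const original = { _id: 1, count: 1 };
const clientUpdate = { $inc: { count: 1 } }; // what the application sent
const oplogUpdate = { $set: { count: 2 } };  // the final value, as the oplog records it

// Replaying the $inc twice drifts; replaying the $set twice does not.
const replayedInc = applyUpdate(applyUpdate(original, clientUpdate), clientUpdate); // count: 3
const replayedSet = applyUpdate(applyUpdate(original, oplogUpdate), oplogUpdate);   // count: 2
```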
Also, failovers are not handled here; handling them is necessary for production systems. The infinite loop, on the other hand, opens a new cursor as soon as the previous one terminates. A sleep could be introduced before the oplog is queried again, to avoid overwhelming the server with queries. Note that the program given here is not production-quality code, but just a simple demo of the technique that many other systems use to get notified about insertions, deletions, and updates to collections in MongoDB.
MongoDB didn’t have a text search feature until version 2.4, and prior to that, all full-text search was handled using external search engines such as Solr or Elasticsearch. Even now, at the time of writing, though the text search feature in MongoDB is production-ready, many would still use a dedicated external search indexer. It won’t be surprising if a decision is taken to use an external full-text search tool instead of leveraging MongoDB’s inbuilt one. In the case of Elasticsearch, the abstraction that flows the data into the indexes is known as a river. The MongoDB river in Elasticsearch, which adds data to the indexes as and when it gets added to the collections in Mongo, is built on the same logic as the simple Java program we saw.
Executing flat plane (2D) geospatial queries in Mongo using geospatial indexes
In this recipe, we will see what geospatial queries are and then see how to apply these queries on flat planes. We will put them to use in a test map application.
Geospatial queries can be executed on data on which geospatial indexes are created. There are two types of geospatial indexes. The first one, called 2D indexes, is the simpler of the two. It assumes that the data is given as x, y coordinates. The second one, called spherical (2dsphere) indexes, is relatively more complicated. In this recipe, we will explore 2D indexes and execute some queries on 2D data. The data on which we are going to work is a 25 x 25 grid with some coordinates representing bus stops, restaurants, hospitals, and gardens.
Getting ready
Refer to the Connecting to a single node from a Java client recipe from Chapter 1, Installing and Starting the MongoDB Server, for all the necessary setup for this recipe. Download the data file named 2dMapLegacyData.json and keep it ready to import on the local filesystem. Open a Mongo shell connecting to the local MongoDB instance.
How to do it…
1. Execute the following command from the operating system shell to import the data into the collection. The 2dMapLegacyData.json file is present in the current directory.
$ mongoimport -c areaMap -d test --drop 2dMapLegacyData.json
2. If we see something like the following output on the screen, we can confirm that the import has gone through successfully:
connected to: 127.0.0.1
Mon Mar 17 23:58:27.880 dropping: test.areaMap
Mon Mar 17 23:58:27.932 check 9 26
Mon Mar 17 23:58:27.934 imported 26 objects
3. After the successful import, verify the collection and its contents from the opened Mongo shell by executing the following query:
> db.areaMap.find()
This should give you a feel of the data in the collection.
4. The next step is to create a 2D geospatial index on this data. Execute the following command to create a 2D index:
> db.areaMap.ensureIndex({co: '2d'})
5. With the index created, we will now try to find the nearest restaurant from the place where an individual is standing. Assuming the person is not fussy about the type of cuisine and is standing at location (12, 8), as shown in the preceding screenshot, let us execute the following query. We are interested in just the three nearest places:
> db.areaMap.find({co: {$near: [12, 8]}, type: 'R'}).limit(3)
6. This should give us three results, starting with the nearest restaurant, with the subsequent ones in increasing order of distance. If we look at the image given earlier, the results agree with what we would expect.
7. Let us add more options to the query. The individual has to walk and, thus, wants the results restricted to places within a particular distance. Let us rewrite the query with the following modification:
> db.areaMap.find({co: {$near: [12, 8], $maxDistance: 4}, type: 'R'})
8. Observe the number of results retrieved this time around.
How it works…
Let us now go through what we did. Before we continue, let us define what exactly we mean by the distance between two points. Suppose that on a Cartesian plane we have two points, (x1, y1) and (x2, y2); the distance between them is computed using the following formula:
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
For example, if the two points are (2, 10) and (12, 3), the distance is as follows:
$$d = \sqrt{(12 - 2)^2 + (3 - 10)^2} = \sqrt{100 + 49} = \sqrt{149} \approx 12.21$$
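The same Euclidean distance can be checked in a couple of lines of plain JavaScript. planarDistance is an illustrative name, and points are [x, y] arrays, as in the legacy co field.

```javascript
// Euclidean distance between two [x, y] points on a flat plane.
function planarDistance(p, q) {
  const dx = q[0] - p[0];
  const dy = q[1] - p[1];
  return Math.sqrt(dx * dx + dy * dy);
}

planarDistance([2, 10], [12, 3]); // Math.sqrt(149), roughly 12.21
planarDistance([0, 0], [3, 4]);   // 5
```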
Now that we know how distances are calculated behind the scenes by MongoDB, let us see what we did, right from step 1.
We started by importing the data into an areaMap collection in the test database and created an index with db.areaMap.ensureIndex({co: '2d'}). The index is created on the co field of the document, and the value is the special value 2d, which denotes that this is a special type of index called a 2D geospatial index. In other cases, we usually give this value as 1 or -1, denoting the order of the index.
There are two types of geospatial indexes: 2D indexes and spherical indexes. A 2D index is commonly used for planes that span a small area and do not involve spherical surfaces. It could be something such as a map of a building, a locality, or even a small city, where the curvature of the earth over that portion of land is not really significant. However, once the span of the map increases to cover the globe, 2D indexes will give inaccurate values, as the curvature of the earth needs to be considered in the calculations. In such cases, we go for spherical indexes, which we will discuss soon.
Once the 2D index is created, we can use it to query the collection and find some points near the queried point, as follows:
> db.areaMap.find({co: {$near: [12, 8]}, type: 'R'}).limit(3)
We query for documents that are of type R, that is, the documents for “restaurants”, and that are closest to the point (12, 8). The results returned by this query will be in increasing order of distance from the point in question, (12, 8) in this case. The limit just restricts the result to the top three documents. We may also provide $maxDistance in the query, which restricts the results to documents at a distance less than or equal to the provided value. We queried for locations not more than four units away, as follows:
> db.areaMap.find({co: {$near: [12, 8], $maxDistance: 4}, type: 'R'})
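To see what the limit and $maxDistance variants do to the result set, the following sketch reproduces the shape of those two queries over an in-memory array. It is only a model: nearQuery is hypothetical, the points are made up rather than taken from 2dMapLegacyData.json, and distances are the flat-plane distances a 2d index works with.

```javascript
function planarDistance(p, q) {
  const dx = q[0] - p[0];
  const dy = q[1] - p[1];
  return Math.sqrt(dx * dx + dy * dy);
}

// Mimics the $near queries above: filter by type, compute distances,
// optionally cut off at maxDistance, sort nearest first, then apply the limit.
function nearQuery(docs, point, options) {
  const opts = options || {};
  return docs
    .filter(doc => opts.type === undefined || doc.type === opts.type)
    .map(doc => ({ doc: doc, dis: planarDistance(point, doc.co) }))
    .filter(r => opts.maxDistance === undefined || r.dis <= opts.maxDistance)
    .sort((a, b) => a.dis - b.dis)
    .slice(0, opts.limit === undefined ? docs.length : opts.limit)
    .map(r => r.doc.name);
}

// Made-up points for illustration only.
const places = [
  { name: "Diner", type: "R", co: [13, 10] },
  { name: "Cafe", type: "R", co: [9, 6] },
  { name: "Bistro", type: "R", co: [3, 7] },
  { name: "Stop 7", type: "B", co: [12, 9] }
];

nearQuery(places, [12, 8], { type: "R", limit: 3 });       // ["Diner", "Cafe", "Bistro"]
nearQuery(places, [12, 8], { type: "R", maxDistance: 4 }); // ["Diner", "Cafe"]
```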
Spherical indexes and GeoJSON-compliant data in MongoDB
Before we continue with this recipe, we need to look at the previous recipe to get an understanding of what geospatial indexes are in MongoDB and how to use 2D indexes. What we have done so far is import JSON documents in a nonstandard format into a MongoDB collection, create geospatial indexes, and query them. This approach works perfectly fine and, in fact, was the only option available until MongoDB 2.4. Version 2.4 of MongoDB supports an additional way to store, index, and query the documents in collections. There is a standard way to represent geospatial data, particularly meant for geodata exchange in JSON, and the GeoJSON specification describes it in detail at http://geojson.org/geojson-spec.html. We can now store the data in this format.
There are various geographical figure types supported by this specification. However, for our use case, we will be using the type Point. First, let us compare how the document we imported earlier looks in the nonstandard format with how it looks in the GeoJSON format:
Nonstandard way
{"_id": 1, "name": "White Street", "type": "B", co: [4, 23]}
GeoJSON format
{"_id": 1, "name": "White Street", "type": "B", co: {type: 'Point', coordinates: [4, 23]}}
The GeoJSON format looks more complicated than the nonstandard format, and for our particular case, I do agree. However, when representing polygons and lines, the nonstandard format might have to store multiple documents, whereas in GeoJSON a shape can be stored in a single document just by changing the value of the type field. Refer to the specification for more details.
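Converting a legacy document into the GeoJSON shape is mechanical, as this sketch shows. legacyToGeoJSON is a hypothetical helper; note that for real-world data, GeoJSON positions are ordered [longitude, latitude].

```javascript
// Hypothetical helper: turns a legacy co: [x, y] field into a GeoJSON Point.
// All other fields are copied unchanged; the input document is not modified.
function legacyToGeoJSON(doc) {
  return Object.assign({}, doc, {
    co: { type: "Point", coordinates: [doc.co[0], doc.co[1]] }
  });
}

const legacy = { _id: 1, name: "White Street", type: "B", co: [4, 23] };
const geo = legacyToGeoJSON(legacy);
// { _id: 1, name: "White Street", type: "B", co: { type: "Point", coordinates: [4, 23] } }

// A whole shape also fits in one value by changing the type, e.g. a closed polygon ring:
const block = { type: "Polygon", coordinates: [[[3, 9], [3, 24], [6, 24], [6, 9], [3, 9]]] };
```

In the Mongo shell, a helper like this could be applied to each document returned by db.areaMap.find() to copy the data into another collection, though this recipe simply imports a ready-made GeoJSON file instead.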
Getting ready
The prerequisites for this recipe are the same as those for the previous recipe, except that the files to be imported will be 2dMapGeoJSONData.json and countries.geo.json. Download these files from the book’s website and keep them on the local filesystem to import them later.
Note
Special thanks to Johan Sundström for sharing the world data. The GeoJSON for the world is taken from https://github.com/johan/world.geo.json. The file is massaged to enable importing and index creation in Mongo. Version 2.4 doesn’t support MultiPolygon and thus, all MultiPolygon shapes are omitted. This shortcoming seems to be fixed in version 2.6, though.
How to do it…
1. Import the GeoJSON-compatible data into new collections as follows. The first file contains 26 documents, similar to what we imported last time around, except that they are formatted using the GeoJSON format; the second contains the world data.
$ mongoimport -c areaMapGeoJSON -d test --drop 2dMapGeoJSONData.json
$ mongoimport -c worldMap -d test --drop countries.geo.json
2. Create geospatial indexes on these collections as follows:
> db.areaMapGeoJSON.ensureIndex({"co": "2dsphere"})
> db.worldMap.ensureIndex({geometry: '2dsphere'})
3. We will first query the areaMapGeoJSON collection as follows:
> db.areaMapGeoJSON.find(
  {co: {
      $near: {
        $geometry: {
          type: 'Point',
          coordinates: [12, 8]
        }
      }
    },
    type: 'R'
  }).limit(3)
4. Next, we will try to find all the restaurants that fall within the square drawn between the points (0, 0), (0, 11), (11, 11), and (11, 0). Refer to the previous screenshot to get a clear visual of the points and the results to expect.
5. Write the following query and observe the results:
> db.areaMapGeoJSON.find(
  {co: {
      $geoIntersects: {
        $geometry: {
          type: 'Polygon',
          coordinates: [[[0, 0], [0, 11], [11, 11], [11, 0], [0, 0]]]
        }
      }
    },
    type: 'R'
  })
6. Check whether the result contains the three restaurants at coordinates (2, 6), (10, 5), and (10, 1), as expected.
7. Next, we will try to find all the matching objects that lie completely within another enclosing polygon. Suppose we want to find the bus stops that lie within a given rectangular block; such use cases can be addressed using the $geoWithin operator, and the query to achieve it is as follows:
> db.areaMapGeoJSON.find(
  {co: {
      $geoWithin: {
        $geometry: {
          type: 'Polygon',
          coordinates: [[[3, 9], [3, 24], [6, 24], [6, 9], [3, 9]]]
        }
      }
    },
    type: 'B'
  }
)
8. Verify the results; we should have three bus stops in the result. Refer to the preceding screenshot to get the expected results of the query.
9. When we execute the $near query from step 3, it just prints the documents in ascending order of distance. However, we don’t see the actual distance in the result. Let us execute the same query as in step 3 and additionally get the calculated distances, as follows:
> db.runCommand({
    geoNear: "areaMapGeoJSON",
    near: [12, 8],
    spherical: true,
    limit: 3,
    query: {type: 'R'}
  }
)
10. The preceding query returns one document with an array in a field called results, which contains the matching documents and the calculated distances. The result also contains some additional stats giving the maximum distance, the average of the distances in the result, the total number of documents scanned, and the time taken in milliseconds.
11. We will finally query the worldMap collection to find which country the provided coordinates lie in. Execute the following query from the Mongo shell:
> db.worldMap.find(
  {geometry: {
      $geoIntersects: {
        $geometry: {
          type: 'Point',
          coordinates: [7, 52]
        }
      }
    }
  },
  {properties: 1, _id: 0}
)
12. The possible operations we can perform with the worldMap collection are numerous, and it is not practically possible to cover all of them in this recipe. I would encourage you to play around with this collection and try out different use cases.
How it works…
Starting from MongoDB version 2.4, the standard way of storing geospatial data in JSON, GeoJSON, is also supported. Note that the legacy approach we saw is still supported. However, if you are starting fresh, it is recommended that you go ahead with the GeoJSON approach, for the following reasons:
- It is a standard, and anybody aware of the specification will easily be able to understand the structure of the document
- It makes storing complex shapes, polygons, and multilines easy
- It also lets us easily query for the intersection of shapes, using $geoIntersects and other new operators
For using GeoJSON-compatible documents, we import the JSON documents in the 2dMapGeoJSONData.json file into the areaMapGeoJSON collection and create the index as follows:
> db.areaMapGeoJSON.ensureIndex({"co": "2dsphere"})
The collection has data similar to what we imported into the areaMap collection in the previous recipe, but with a different structure that is compatible with the GeoJSON format. The index type used here is 2dsphere and not 2d. The 2dsphere type of index takes the spherical surface into account in its calculations. Note that the field co, on which we are creating the geospatial index, is not an array of coordinates but a GeoJSON-compatible document itself.
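To see why a spherical index must treat distances differently, compare two pairs of points that are equally far apart on a flat grid. This sketch uses the haversine formula with [longitude, latitude] in degrees; it illustrates spherical geometry in general and is not necessarily the computation MongoDB performs internally. sphericalDistanceRadians is a hypothetical name; multiplying its result by an Earth radius (for example, roughly 6371 km) gives a distance.

```javascript
// Great-circle angle (radians) between two [longitude, latitude] points given in degrees,
// using the haversine formula.
function sphericalDistanceRadians(p, q) {
  const toRad = deg => (deg * Math.PI) / 180;
  const lat1 = toRad(p[1]);
  const lat2 = toRad(q[1]);
  const s1 = Math.sin((lat2 - lat1) / 2);
  const s2 = Math.sin(toRad(q[0] - p[0]) / 2);
  const a = s1 * s1 + Math.cos(lat1) * Math.cos(lat2) * s2 * s2;
  return 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// On a flat grid both pairs are 10 units apart; on a sphere they are not.
sphericalDistanceRadians([0, 0], [10, 0]);   // 10 degrees of arc, about 0.1745 rad
sphericalDistanceRadians([0, 60], [10, 60]); // about 0.0872 rad, roughly half as far
```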
We query where the value of the $near operator is not an array of coordinates, as we did in the previous recipe, but a document with the $geometry key, whose value is a GeoJSON-compatible document for a point with the coordinates. The results, irrespective of the query we use, are identical. Refer to step 3 in this recipe and step 5 in the previous recipe to see the difference in the query. The approach using GeoJSON looks more complicated, but it has some advantages, which we will see soon.
It is important to note that we cannot mix the two approaches. Try executing the GeoJSON-format query we just executed on the areaMap collection, and note that though we do not get any error, the results are not correct.
We used the $geoIntersects operator in step 5 of this recipe. This is only possible when the documents are stored in the GeoJSON format in the database. The query simply finds all the points (in our case) that intersect any shape we create. We create a polygon using the GeoJSON format, as follows:
{
  type: 'Polygon',
  coordinates: [[[0, 0], [0, 11], [11, 11], [11, 0], [0, 0]]]
}
The coordinates are for the square, giving the four corners in a clockwise direction, with the last coordinate the same as the first to close the ring. The query executed is the same as the $near query, apart from the fact that the $near operator is replaced by $geoIntersects, and the value of the $geometry field is the GeoJSON document of the polygon with which we wish to find the intersecting points in the areaMapGeoJSON collection. If we look at the results obtained and at the preceding screenshot, they are indeed what we expected.
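For points, the intersection test above boils down to a point-in-polygon check. The following planar ray-casting sketch is a stand-in, not MongoDB's algorithm: a 2dsphere index works with spherical geometry, so results can differ near long polygon edges, although for points well inside a small polygon like this square they agree. isClosedRing and pointInRing are hypothetical names; for points, the same check roughly describes what $geoWithin returns as well.

```javascript
// A GeoJSON linear ring must be closed: at least four positions, last equal to first.
function isClosedRing(ring) {
  const first = ring[0];
  const last = ring[ring.length - 1];
  return ring.length >= 4 && first[0] === last[0] && first[1] === last[1];
}

// Planar ray casting: toggle "inside" each time a ray going right from the point
// crosses a ring edge. Points lying exactly on an edge are not handled consistently.
function pointInRing(point, ring) {
  const [x, y] = point;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const straddles = (yi > y) !== (yj > y);
    if (straddles && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi) {
      inside = !inside;
    }
  }
  return inside;
}

const square = [[0, 0], [0, 11], [11, 11], [11, 0], [0, 0]];
isClosedRing(square);                                          // true
[[2, 6], [10, 5], [10, 1]].every(p => pointInRing(p, square)); // true
pointInRing([12, 8], square);                                  // false
```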
The $geoWithin operator (http://docs.mongodb.org/manual/reference/operator/query/geoWithin/) is pretty handy when we want to find the points within a polygon, or even polygons within another polygon. Note that only shapes completely inside the given polygon will be returned. Suppose that, just like our worldMap collection, we have a cities collection with the cities’ coordinates specified in a similar manner. We can then use the polygon of a country to query all the polygons in the cities collection that lie entirely within it, thus giving us the country’s cities. Obviously, an easier and faster way would be to store the country code in the city document. Alternatively, if some data is missing in the cities collection and the country is not present, one point anywhere within the city’s polygon (as a city lies entirely in one country) can be used to query the worldMap collection for its country, as we demonstrated in step 11.
A combination of what we saw earlier can be put to good use to compute the distances between two points or even to execute geometric operations.
Some functionalities, such as getting the centroid of a polygon, or even the area of a polygon stored as GeoJSON in the collection, are not supported out of the box; it would help to have some utility functions that compute these from the given coordinates. These features are commonly required, so perhaps future releases will support operations that developers currently have to implement themselves. Also, there is no straightforward way to find out, if two polygons overlap, at which coordinates they overlap, what the area of overlap is, and so on. The $geoIntersects operator we saw does tell us which polygons intersect with the given polygon, point, or line, but not where or by how much.
Though unrelated to Mongo, the GeoJSON format doesn’t have support for circles; hence, storing circles in Mongo using the GeoJSON format is not possible. For more details on geospatial operators, visit http://docs.mongodb.org/manual/reference/operator/query-geospatial/.
Implementing a full-text search in MongoDB
Many of us (I won’t be wrong if I say all of us) use Google every day to search content on the Web. To cut a long story short, the text that we provide in the text box on Google’s page is used to search the pages on the Web that it has indexed. The search results are then returned to us in an order determined by Google’s PageRank algorithm. We might want to have similar functionality in our database that lets us search for some text content and gives the corresponding search results. Note that this text search is not the same as finding text as part of a sentence, which can easily be done using regex. It goes way beyond that and can be used to get results that contain the same word, sound similar, or have a similar base word; we can even return a synonym in the actual sentence.
Text indexes, introduced in MongoDB version 2.4, let us create a text index on a particular field of the document and enable text search on the words in it. In this recipe, we will import some documents and create text indexes on them, which we later query to retrieve the results.
Getting ready
A simple single node is all we need for the test. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, for how to start the server. However, do not start the server yet; an additional flag needs to be provided during startup to enable text search. Download the BlogEntries.json file from the book’s website and keep it on your local drive, ready to be imported.
How to do it…
1. Start the MongoDB server listening on port 27017 as follows:
$ mongod --dbpath /data/mongo/db --smallfiles --oplogSize 50 --setParameter textSearchEnabled=true
As version 2.4 is used, we need to explicitly enable text search using textSearchEnabled. For version 2.6 and above, this command-line option can be skipped.
2. Once the server is started, we will create the test data in a collection as follows. With the BlogEntries.json file placed in the current directory, we will create a userBlog collection using mongoimport:
$ mongoimport -d test -c userBlog BlogEntries.json --drop
3. Now connect to the Mongo process from the Mongo shell by typing the following command in the operating system shell:
$ mongo
4. Once connected, get a feel of the documents in the userBlog collection as follows:
> db.userBlog.findOne()
5. The blog_text field is of interest, and this is the one on which we will be creating a text search index.
6. Create a text index on the blog_text field of the document as follows:
> db.userBlog.ensureIndex({'blog_text': 'text'})
7. Now execute the following search on the collection from the Mongo shell. This is the only way to perform a text search in version 2.4. In version 2.6, though it works fine, it is deprecated.
> db.runCommand({'text': 'userBlog', search: 'plot zoo'})
8. Look at the results obtained.
9. Execute another search as follows:
> db.runCommand({'text': 'userBlog', search: "Zoo -plot"})
How it works…
Let us now see how it all works. Text search is done using what is called a reverse (or inverted) index. In simple terms, this is a mechanism where the sentences are broken into words and then these individual words point back to the documents that they belong to. The process is not straightforward, though; let us see what happens in this process step by step, at a high level:
1. Consider the sentence, “I played cricket yesterday”. The first step is to break this sentence into tokens: [“I”, “played”, “cricket”, “yesterday”].
2. Next, the stop words are removed from the broken-down sentence, and we are left with a subset of these tokens. Stop words are very common words that are eliminated, as it makes no sense to index them; they can potentially affect the accuracy of the search when used in the search query. In this case, we will be left with the words [“played”, “cricket”, “yesterday”]. Stop words are language-specific and will be different for different languages.
3. Finally, these words are stemmed to their base words. In this case, it will be [“play”, “cricket”, “yesterday”]. Stemming is the process of reducing a word to its root. For instance, the words play, playing, played, and plays have the same root word, play. There are a lot of algorithms and frameworks for stemming a word to its root form. For more information on stemming and the algorithms used for this purpose, visit http://en.wikipedia.org/wiki/Stemming. Similar to eliminating the stop words, the stemming algorithm is language-dependent. The examples given here were for the English language.
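The three steps can be mimicked in a few lines to make the pipeline concrete. This is a toy: the stop-word list is a tiny hypothetical sample, and naiveStem only strips a few suffixes, which is far cruder than a real stemmer such as those in the Snowball family; neither reproduces MongoDB's own language tables.

```javascript
// Hypothetical, tiny stop-word sample; real engines use much longer per-language lists.
const STOP_WORDS = new Set(["i", "a", "an", "the", "and", "or", "is", "to", "of", "in"]);

// Step 1: break the text into lowercase tokens.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(token => token.length > 0);
}

// Step 3 stand-in: naive suffix stripping, for illustration only.
function naiveStem(word) {
  for (const suffix of ["ing", "ed", "s"]) {
    if (word.endsWith(suffix) && word.length > suffix.length + 2) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Tokenize (step 1), drop stop words (step 2), then stem (step 3).
function analyze(text) {
  return tokenize(text).filter(token => !STOP_WORDS.has(token)).map(naiveStem);
}

analyze("I played cricket yesterday"); // ["play", "cricket", "yesterday"]
["play", "playing", "played", "plays"].map(naiveStem); // ["play", "play", "play", "play"]
```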
If we look at the index creation process, it is as follows:
> db.userBlog.ensureIndex({'blog_text': 'text'})
The key given in the JSON argument is the name of the field on which the text index is to be created, and the value will always be text, denoting that the index to be created is a text index. Once the index is created, at a high level, the preceding three steps are executed on the content of the indexed field in each document, and a reverse index is created. You may also choose to create a text index on more than one field. Suppose we had two fields, blog_text1 and blog_text2; we can create the index as {'blog_text1': 'text', 'blog_text2': 'text'}. The value {'$**': 'text'} creates an index on all the fields of the document that contain string content.
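A reverse index itself is just a map from each term to the documents containing it. The sketch below builds one over made-up documents (not BlogEntries.json) and answers an OR query such as plot zoo; buildReverseIndex and lookupAny are hypothetical names, and stop-word removal and stemming are omitted for brevity.

```javascript
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(token => token.length > 0);
}

// Reverse (inverted) index: term -> set of _id values whose indexed fields contain it.
function buildReverseIndex(docs, fields) {
  const index = new Map();
  for (const doc of docs) {
    for (const field of fields) {
      if (typeof doc[field] !== "string") continue;
      for (const term of tokenize(doc[field])) {
        if (!index.has(term)) index.set(term, new Set());
        index.get(term).add(doc._id);
      }
    }
  }
  return index;
}

// OR semantics: _id values of documents containing any of the search terms.
function lookupAny(index, search) {
  const ids = new Set();
  for (const term of tokenize(search)) {
    for (const id of index.get(term) || []) ids.add(id);
  }
  return Array.from(ids).sort((a, b) => a - b);
}

// Made-up documents for illustration.
const posts = [
  { _id: 1, blog_text: "A day at the zoo" },
  { _id: 2, blog_text: "The plot of this novel" },
  { _id: 3, blog_text: "Cooking pasta" }
];
const reverseIndex = buildReverseIndex(posts, ["blog_text"]);
lookupAny(reverseIndex, "plot zoo"); // [1, 2]
```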
Finally, we executed the search operation by invoking the following command:
> db.runCommand({'text': 'userBlog', search: 'plot zoo'})
The preceding command runs the text search on the userBlog collection, and the search string used is plot zoo. This searches for the value plot or zoo in the text, in any order. If we look at the results, we see that we have two matched documents, and these documents are ordered by score. The score tells us how relevant the searched document is; the higher the score, the more relevant the document. In our case, one of the documents had both the words plot and zoo in it and thus got a higher score than the other document, as we see in the following output. The result also contains a short summary of the total number of documents scanned to get the results, the number of results, and the total time taken to search.
{
  "queryDebugString": "bought|zoo||||||",
  "language": "english",
  "results": [
    {
      "score": 2.6353665865384617,
      "obj": {
        "_id": 5,
        ...
      }
    },
    {
      "score": 0.5263157894736842,
      "obj": {
        "_id": 6,
        ...
      }
    }
  ],
  "stats": {
    "nscanned": 3,
    "nscannedObjects": 0,
    "n": 2,
    "nfound": 2,
    "timeMicros": 119
  },
  "ok": 1
}
In version 2.6, the recommended way to query for the same result is as follows:
> db.userBlog.find({$text: {$search: 'plot zoo'}})
Note that if we compare the order of the results with the previous execution using runCommand, we see that here the results are given in ascending order of score. Also, the score, which was available in the previous run using runCommand, is not available in this result. To get the scores in the result, we need to modify the query a bit, as follows:
> db.userBlog.find({$text: {$search: 'plot zoo'}}, {score: {$meta: "textScore"}})
Now we have an additional document provided to the find method that asks for the score calculated for the text match. The results are still not sorted in descending order of score. Let us see how to sort the results by score:
> db.userBlog.find({$text: {$search: 'plot zoo'}}, {score: {$meta: "textScore"}}).sort({score: {$meta: "textScore"}})
As we can see, the query is the same as before. It’s just the additional sort function we added that sorts the results in descending order of score.
When the search was executed as {'text': 'userBlog', search: "Zoo -plot"}, it searched for all the documents that contain the word zoo but don’t contain the word plot. Thus, we get only one result. The - sign denotes negation and leaves the documents containing that word out of the search result. However, do not expect to find all the documents without the word plot by just giving -plot in the search.
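The include/exclude behaviour of the - prefix can be modelled as follows. parseSearch and matchesSearch are hypothetical helpers that handle single words only (no phrases or stemming); the last usage line shows why a search made only of negated terms returns nothing.

```javascript
// Split a search string into positive terms and terms negated with a leading "-".
function parseSearch(search) {
  const include = [];
  const exclude = [];
  for (const token of search.trim().toLowerCase().split(/\s+/)) {
    if (token.startsWith("-") && token.length > 1) exclude.push(token.slice(1));
    else if (token.length > 0) include.push(token);
  }
  return { include: include, exclude: exclude };
}

// A document matches if it has at least one positive term and none of the negated ones.
function matchesSearch(text, parsed) {
  const words = new Set(text.toLowerCase().split(/[^a-z0-9]+/));
  const hasPositive = parsed.include.some(w => words.has(w));
  const hasNegated = parsed.exclude.some(w => words.has(w));
  return hasPositive && !hasNegated;
}

const zooNotPlot = parseSearch("Zoo -plot");          // { include: ["zoo"], exclude: ["plot"] }
matchesSearch("A zoo trip", zooNotPlot);              // true
matchesSearch("The zoo had a plot", zooNotPlot);      // false
matchesSearch("Cooking pasta", parseSearch("-plot")); // false: no positive term to match
```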
If we look at the contents returned as a result of the search, they contain the matched documents in their entirety. If we are not interested in full documents, but only a few fields of a document, we can use projection to get the desired fields. For instance, use the following query:
> db.runCommand({'text': 'userBlog', search: "zoo plot", project: {_id: 1}})
This finds the same documents in the userBlog collection containing the words zoo or plot, but the results will contain only the _id field of each matching document.
If multiple fields are used to create an index, then we may have different weights for different fields in the document. For instance, suppose blog_text1 and blog_text2 are the two fields on which we create an index, and we want blog_text1 to be given a higher weight than blog_text2; we create the index as follows:
> db.collection.ensureIndex(
  {
    blog_text1: "text",
    blog_text2: "text"
  },
  {
    weights: {
      blog_text1: 2,
      blog_text2: 1
    },
    name: "MyCustomIndexName"
  }
)
This gives the content in blog_text1 twice as much weight as that in blog_text2. Thus, if a word is found in two documents, but is present in the blog_text1 field of the first document and in the blog_text2 field of the second, then the score of the first document will be higher than that of the second. Note that we have also provided the name of the index, MyCustomIndexName, using the name field.
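The effect of weights can be illustrated with a deliberately simplified score: the sum over fields of the field weight times the number of occurrences of the term. MongoDB's actual text score formula is more involved; weightedScore is a hypothetical helper used only to show that the same word counts for more in a heavier field.

```javascript
// Simplified weighted score: sum over fields of (weight x occurrences of the term).
function weightedScore(doc, term, weights) {
  let score = 0;
  for (const field of Object.keys(weights)) {
    const words = String(doc[field] || "").toLowerCase().split(/[^a-z0-9]+/);
    score += weights[field] * words.filter(w => w === term).length;
  }
  return score;
}

const weights = { blog_text1: 2, blog_text2: 1 };
const first = { blog_text1: "Zoo visit", blog_text2: "Sunny day" };
const second = { blog_text1: "Sunny day", blog_text2: "Zoo visit" };
weightedScore(first, "zoo", weights);  // 2
weightedScore(second, "zoo", weights); // 1
```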
We also see from the language key in the output that the language in this case is English. MongoDB supports various languages for text search. Languages are important when indexing the content, as they decide the stop words; the stemming of words is language-specific as well. Visit http://docs.mongodb.org/manual/reference/command/text/#text-search-languages for more details on the languages supported by Mongo for text search. So how do we choose the language while creating the index? By default, if nothing is provided, the index is created assuming that the language is English. However, if we know the language is French, we create the index as follows:
> db.userBlog.ensureIndex({'text': 'text'}, {'default_language': 'french'})
Suppose we had originally created the index using the French language; the getIndexes method will then return the following document:
[
  {
    "v": 1,
    "key": {
      "_id": 1
    },
    "ns": "test.userBlog",
    "name": "_id_"
  },
  {
    "v": 1,
    "key": {
      "_fts": "text",
      "_ftsx": 1
    },
    "ns": "test.userBlog",
    "name": "text_text",
    "default_language": "french",
    "weights": {
      "text": 1
    },
    "language_override": "language",
    "textIndexVersion": 1
  }
]
However, if the language differs on a per-document basis, which is pretty common in scenarios such as blogs, we have a way out. If we look at the preceding document, the value of the language_override field is language. This means that, on a per-document basis, we can store the language of the content in this field. In its absence, the default value is assumed; French in the preceding case. Thus, we can have:
{_id: 1, language: 'english', text: ….}  // Language is English
{_id: 2, language: 'german', text: ….}  // Language is German
{_id: 3, text: ….}  // Language is the default one; French in this case
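The rule for choosing a document's language can be written as a short lookup. effectiveLanguage is a hypothetical helper mirroring the behaviour described above: use the field named by language_override (language by default) when it is present, otherwise the index's default_language, which is English unless specified.

```javascript
// Picks the language used to index a document, following the rules described above.
function effectiveLanguage(doc, indexSpec) {
  const overrideField = indexSpec.language_override || "language";
  return doc[overrideField] || indexSpec.default_language || "english";
}

const textIndex = { default_language: "french", language_override: "language" };
effectiveLanguage({ _id: 1, language: "english" }, textIndex); // "english"
effectiveLanguage({ _id: 2, language: "german" }, textIndex);  // "german"
effectiveLanguage({ _id: 3 }, textIndex);                      // "french"
```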
There’s more…
To use MongoDB text search in production, you would need version 2.6; in version 2.4, MongoDB text search was in beta. Integrating MongoDB with other systems such as Solr and Elasticsearch is a wise choice to make for now, at least until the text search feature in Mongo matures. In the next recipe, we will see how to integrate Mongo with Elasticsearch using the Mongo connector.
See also
For more information on the $text operator, visit http://docs.mongodb.org/manual/reference/operator/query/text/
Integrating MongoDB with Elasticsearch for a full-text search
MongoDB has integrated text search features, as we saw in the previous recipe. However, there are multiple reasons why one would not use the Mongo text search feature and would fall back to conventional search engines such as Solr or Elasticsearch. The following are a few of the reasons:
- The text search feature is production-ready only in version 2.6. In version 2.4, it was introduced in beta, which is not suitable for production use cases.
- Products such as Solr and Elasticsearch are built on top of Lucene, which has proven itself in the search engine arena. Solr and Elasticsearch are pretty stable products too.
- You might already have expertise in products such as Solr and Elasticsearch and would like to use them as full-text search engines rather than MongoDB.
- Some particular feature that your application requires may be missing in MongoDB search.
Setting up a dedicated search engine does need additional effort to integrate it with a MongoDB instance. In this recipe, we will see how to integrate a MongoDB instance with the search engine Elasticsearch.
We will be using the Mongo connector for integration purposes. It is an open source project that is available at https://github.com/10gen-labs/mongo-connector.
GettingreadyRefertotheInstallingPyMongorecipeinChapter3,ProgrammingLanguageDrivers,toinstallandsetupPython.ThetoolpipisusedtogettheMongoconnector.However,ifyouareworkingontheWindowsplatform,thestepstoinstallpipwerenotmentionedearlier.Visithttps://sites.google.com/site/pydatalog/python/pip-for-windowstogetpipforWindows.
Theprerequisitesforstartingasingleinstanceareallweneedforthisrecipe.However,inthisrecipe,wewillstarttheserverasasinglenodereplicasetfordemonstrationpurpose.
DownloadtheBlogEntries.jsonfilefromthebook’swebsiteandkeepitonyourlocaldrive,readytobeimported.
DownloadElasticsearchforyourtargetplatformfromhttp://www.elasticsearch.org/overview/elkdownloads/.Extractthedownloadedarchive,andfromtheshell,gotothebindirectoryoftheextraction.
Wewillbegettingthemongo-connectorsourcefromgithub.comandrunningit.AGitclientisneededforthispurpose.DownloadandinstalltheGitclientonyourmachine.Visithttp://git-scm.com/downloadsandfollowtheinstructionstoinstallGitonyourtargetoperatingsystem.IfyouarenotcomfortableinstallingGitonyouroperatingsystem,thenthereisanalternativeavailablethatletsyoudownloadthesourceasanarchive.
Visithttps://github.com/10gen-labs/mongo-connector.Here,youwillgetanoptionthatletsyoudownloadthecurrentsourceasanarchive,whichwecanthenextractonourlocaldrive.Thefollowingscreenshotshowsthedownloadoptionavailableonthebottom-rightcornerofthescreen:
Note
Note that we can also install mongo-connector in a very easy way using pip, as follows:
pip install mongo-connector
However, the version in PyPI is very old and does not support many features. Using the latest version from the repository is therefore recommended.
Just like in the previous recipe, where we saw text search in Mongo, we will use the five documents in BlogEntries.json, downloaded earlier, to test our simple search.
How to do it…
1. At this point, it is assumed that Python, PyMongo, and pip for your operating system platform are installed. We will now get mongo-connector from the source. If you have already installed the Git client, execute the following steps on the operating system shell. If you have decided to download the repository as an archive, you may skip the git clone command and run the remaining commands from the extracted directory. Go to the directory where you would like to clone the connector repository, and execute the following commands:
$ git clone https://github.com/10gen-labs/mongo-connector.git
$ cd mongo-connector
$ python setup.py install
2. The preceding setup will also install the Elasticsearch client that will be used by this application.
3. We will now start a single Mongo instance, but as a replica set. From the operating system console, execute the following command:
$ mongod --dbpath /data/mongo/db --replSet textSearch --smallfiles --oplogSize 50
4. Start a Mongo shell and connect to the started instance as follows:
$ mongo
5. From the Mongo shell, initiate the replica set as follows:
> rs.initiate()
6. The replica set will be initiated in a few moments. Meanwhile, we can proceed to start the Elasticsearch server instance.
7. Go to the bin directory of the extracted Elasticsearch archive and execute the following command from the command line:
$ elasticsearch
8. We won’t be getting into Elasticsearch settings and will start it in the default mode.
9. Once started, enter http://localhost:9200/_nodes/process?pretty in the browser.
10. If we see a JSON document giving the process details, such as the following, we have successfully started Elasticsearch:
{
  "cluster_name": "elasticsearch",
  "nodes": {
    "p0gMLKzsT7CjwoPdrl-unA": {
      "name": "Zaladane",
      "transport_address": "inet[/192.168.2.3:9300]",
      "host": "Amol-PC",
      "ip": "192.168.2.3",
      "version": "1.0.1",
      "build": "5c03844",
      "http_address": "inet[/192.168.2.3:9200]",
      "process": {
        "refresh_interval": 1000,
        "id": 5628,
        "max_file_descriptors": -1,
        "mlockall": false
      }
    }
  }
}
11. Once the Elasticsearch server and the Mongo instance are up and running, and the necessary Python libraries are installed, we will start the connector. It will sync the data between the started Mongo instance and the Elasticsearch server.
For the sake of this test, we will be using the user_blog collection in the test database. The field on which we would like to have text search implemented is the blog_text field in the document.
12. Start the Mongo connector from the operating system shell as follows. The following command was executed with the Mongo connector’s directory as the current directory:
$ python mongo_connector/connector.py -m localhost:27017 -t http://localhost:9200 -n test.user_blog --fields blog_text -d mongo_connector/doc_managers/elastic_doc_manager.py
13. Import the BlogEntries.json file into the collection using the mongoimport utility as follows. Execute the command with the .json file present in the current directory:
$ mongoimport -d test -c user_blog BlogEntries.json --drop
14. Open a browser of your choice and enter http://localhost:9200/_search?q=blog_text:facebook in it.
15. You should see something like the following screenshot in the browser:
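If you prefer to script steps 9 and 14 instead of using a browser, the following is a minimal Python sketch that uses only the standard library. It assumes Elasticsearch is listening on localhost:9200, as in this recipe. The helper names `search_url`, `node_versions`, and `fetch_json` are hypothetical, and the sample JSON is a trimmed copy of the response shown in step 10.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ES_BASE = "http://localhost:9200"  # assumption: default Elasticsearch HTTP port


def search_url(field, term, base=ES_BASE):
    """Build a URI search request such as /_search?q=blog_text:facebook."""
    # urlencode percent-encodes the ':' separator; the server decodes it again.
    query = urlencode({"q": f"{field}:{term}"})
    return f"{base}/_search?{query}"


def node_versions(process_info):
    """Map node name -> version from a /_nodes/process response body."""
    return {node["name"]: node["version"] for node in process_info["nodes"].values()}


def fetch_json(url):
    """GET a URL and decode its JSON body (needs a running Elasticsearch)."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    sample = json.loads(
        '{"cluster_name": "elasticsearch", "nodes": {"p0gMLKzsT7CjwoPdrl-unA":'
        ' {"name": "Zaladane", "version": "1.0.1"}}}'
    )
    print(node_versions(sample))
    print(search_url("blog_text", "facebook"))
    # Against a live server, for example:
    # print(fetch_json(search_url("blog_text", "facebook")))
```

Only `fetch_json` touches the network, so the other helpers can be tried without a running server.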
How it works…
Basically, the Mongo connector tails the oplog to find new updates, which it publishes to another endpoint. We used Elasticsearch in our case, but it could even be Solr. You may choose to write a custom DocManager that plugs in with the connector. For more details, visit https://github.com/10gen-labs/mongo-connector/wiki. The Readme for https://github.com/10gen-labs/mongo-connector gives some detailed information as well.
We gave the connector the -m, -t, -n, --fields, and -d options. Their meanings are as follows:
Option    Description
-m    The URL of the MongoDB host to which the connector connects to get the data to be synchronized.
-t    The target URL of the system with which the data is to be synchronized; Elasticsearch in this case. The URL format will depend on the target system. Should you choose to implement your own DocManager, the format will be one that your DocManager understands.
-n    This is the namespace that we would like to keep synchronized with the external system. The connector will only look for changes in these namespaces while tailing the oplog for data. The values are separated by commas if more than one namespace is to be synchronized.
--fields    These are the fields from the document that will be sent to the external system. In our case, it doesn’t make sense to index the entire document and waste resources. It is recommended to add to the index just the fields on which you would like to add text search support. The identifier _id field and the namespace of the source are also present in the result, as we can see in the preceding screenshot. The _id field can then be used to query the target collection.
-d    This is the document manager to be used; in our case, we have used Elasticsearch’s document manager.
For more supported options, refer to the readme on the connector’s GitHub page.
Once the insert is executed on the MongoDB server, the connector detects the newly added documents in the collection of its interest, that is, user_blog. It then starts sending the data from the newly added documents to Elasticsearch to be indexed. To confirm the addition, we execute a query in the browser to view the results.
Elasticsearch will complain about index names with uppercase characters in them. The Mongo connector doesn’t take care of this. Thus, if the name of the collection is not in lowercase (for example, userBlog), indexing will fail.
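The connector’s flow can be illustrated with a small, self-contained Python simulation: tail the oplog, keep only the namespaces passed with -n, and send only the --fields to the target. This is not mongo-connector’s code or its DocManager API. The oplog entries are hand-written dictionaries that only mimic the `op`/`ns`/`o` shape of real oplog documents. `InMemoryDocManager` and `sync` are hypothetical names. Using the whole namespace as the index name is an assumption made here to show the lowercase restriction; the real connector’s mapping may differ.

```python
class InMemoryDocManager:
    """Hypothetical stand-in for a target system such as Elasticsearch."""

    def __init__(self):
        self.indexes = {}  # index name -> {_id: indexed document}

    def upsert(self, index, doc):
        # Mirror Elasticsearch's rule that index names must be lowercase.
        if index != index.lower():
            raise ValueError(f"invalid index name {index!r}: must be lowercase")
        self.indexes.setdefault(index, {})[doc["_id"]] = doc


def sync(oplog_entries, namespaces, fields, doc_manager):
    """Replay insert entries (op == "i") for the watched namespaces only."""
    watched = set(namespaces.split(","))  # like -n: comma-separated namespaces
    for entry in oplog_entries:
        if entry["op"] != "i" or entry["ns"] not in watched:
            continue  # this sketch ignores updates, deletes and other namespaces
        source = entry["o"]
        # Like --fields: keep only the requested fields, plus _id and the namespace.
        doc = {f: source[f] for f in fields if f in source}
        doc["_id"] = source["_id"]
        doc["ns"] = entry["ns"]
        # Assumption: the whole namespace is used as the index name.
        doc_manager.upsert(entry["ns"], doc)


oplog = [
    {"op": "i", "ns": "test.user_blog",
     "o": {"_id": 1, "blog_text": "Posted on facebook", "author": "amol"}},
    {"op": "i", "ns": "test.other", "o": {"_id": 2, "blog_text": "not watched"}},
]
manager = InMemoryDocManager()
sync(oplog, "test.user_blog", ["blog_text"], manager)
print(manager.indexes)
```

Running the same sketch against a namespace such as test.userBlog raises the ValueError, which is the failure mode described above.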
There’s more…
We have not done any additional configuration on Elasticsearch, as that was not the objective of this recipe. We were more interested in integrating MongoDB and Elasticsearch. Refer to the Elasticsearch documentation for more advanced configuration options. If integration with Elasticsearch is required, you can also use a concept in Elasticsearch called rivers. Rivers are Elasticsearch’s way to get data from another data source. For MongoDB, the code for a river can be found at https://github.com/richardwilly98/elasticsearch-river-mongodb/. The README.md in this repository has steps on how to set it up.
In this chapter, we explored a recipe named Implementing triggers in Mongo using oplog, on how to implement trigger-like functionality using Mongo. This connector and the MongoDB river for Elasticsearch rely on the same logic to get the data out of Mongo as and when it is needed.
See also
The Elasticsearch documentation at http://www.elasticsearch.org/guide/en/elasticsearch/reference/
Chapter 6. Monitoring and Backups
In this chapter, we will be taking a look at the following recipes:
Signing up for MMS and setting up the MMS monitoring agent
Managing users and groups on the MMS console
Monitoring MongoDB instances on MMS
Setting up monitoring alerts on MMS
Backing up and restoring data in Mongo using out-of-the-box tools
Configuring the MMS backup service
Managing backups in the MMS backup service
Introduction
Monitoring and backups are important aspects of any mission-critical software in production. Proactive monitoring lets us take action whenever an abnormal event occurs in the system that could compromise the data consistency, availability, or performance of the system. Without proactive monitoring, issues might come to light only after they have had a significant impact. We covered administration-related recipes in Chapter 4, Administration, and both monitoring and backup activities are part of administration. However, they demand a separate chapter, as the content to be covered is extensive. In this chapter, we will see how to monitor various parameters of your MongoDB cluster and set up alerts on them, using the MongoDB Monitoring Service (MMS). We will also look at some mechanisms to back up the data, using both the out-of-the-box tools and the MMS backup service.
Signing up for MMS and setting up the MMS monitoring agent
MMS is a cloud-based or on-premises service that enables you to monitor your MongoDB cluster. The on-premises version is available with the Enterprise subscription only. It gives administrators one central place to monitor the health of the server instances and of the boxes on which the instances are running. In this recipe, we will see what the software requirements are and how to set up MMS for Mongo.
Getting ready
We will be starting a single instance of mongod, which we will be using for the purpose of monitoring. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to start a MongoDB instance and connect to it from a Mongo shell. The monitoring agent, which sends the statistics of the Mongo instance to the monitoring service, uses Python and PyMongo. Refer to the Installing PyMongo recipe in Chapter 3, Programming Language Drivers, to learn how to install Python and PyMongo, the Python client of MongoDB.
How to do it…
1. If you don’t already have an MMS account, go to https://mms.mongodb.com/ and sign up for an account. On signing up and logging in, we should see the following page:
2. Click on the Get Started button under Monitoring.
3. Once we reach the Download Agent option in the menu, click on the appropriate OS platform to download the agent. After selecting the OS platform, follow the instructions given. Note down the API key too. For example, if the Windows platform is selected, we would see the following page:
4. Once the installation is complete, open the monitoring-agent.config file, which will be present in the configuration folder selected while installing the agent.
5. Look for the mmsApiKey key in the file and set its value to the API key that was noted down earlier in step 3.
6. To start the service manually on MS Windows, open services.msc by typing services.msc in the Run dialog (Windows + R). The service will be named MMS Monitoring Agent. On the web page, click on the Verify Agent button. If all goes well, the started agent will be verified and a success message will be shown.
7. The next step is to configure the host. This host is the one that is seen from the agent’s perspective, running on the organization’s or individual’s infrastructure. The following screenshot shows the screen used for adding a host. The hostname is the internal hostname, that is, the hostname on the client’s network. MMS on the cloud doesn’t need to reach out to the MongoDB processes. It is the agent that collects the data from these MongoDB processes and sends the data to the MMS service.
8. Once the host details are added, click on the Verify Host button. Once the verification is done, click on the Start Monitoring button.
We have successfully set up MMS and added one host to it, which will be monitored.
How it works…
In this recipe, we have set up the MMS agent and monitoring for a standalone MongoDB instance. The installation and setup process is pretty simple. We also added a standalone instance, and all was OK.
Suppose we have a replica set up and running, and its three members are listening on ports 27000, 27001, and 27002, respectively. (For more details on how to start a replica set, refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server.) Refer to step 7 in the How to do it… section, where we set up one standalone host. Select Replica Set in the drop-down for Host Type, and give a valid hostname of any member of the replica set as the internal hostname. In my case, Amol-PC and port 27001 were given, which is a secondary instance. All other instances will then be discovered automatically, and they will be visible under the hosts, as shown in the following screenshot:
We didn’t see what is to be done when security is enabled on the cluster, which is pretty common in production environments. If authentication is enabled, the MMS agent needs proper credentials to gather the statistics. The DB user whose username and password we give while adding a new host (step 7 of the How to do it… section) should have at least the clusterAdmin and readAnyDatabase roles.
There’s more…
In this recipe, we saw how to set up an MMS agent and create an account from the MMS console. However, as administrators, we can also add groups and users to the MMS console and grant various users privileges to perform various operations on different groups. In the next recipe, we will throw some light on user and group management in the MMS console.
Managing users and groups on the MMS console
In the previous recipe, we saw how to set up an MMS account and how to set up an MMS agent. In this recipe, we will throw some light on how to set up groups and user access to the MMS console.
Getting ready
Refer to the previous recipe for setting up the agent and the MMS account. This is the only prerequisite for this recipe.
How to do it…
1. Start by navigating to Administration | Users on the left-hand side of the screen, as shown in the following screenshot:
2. Here you can view the existing users and also add new users. On clicking the Add User button (circled in the top-right corner of the previous screenshot), you should see the following pop-up window, which allows you to add a new user:
The preceding screen will be used to add users. Take note of the various available roles.
3. Similarly, by navigating to Administration | My Groups, you can view existing groups and add new ones by clicking on the Add Group button. In the textbox, provide a name for the group. Remember that the group name must be available globally: it must be unique across all users of MMS, not just within your account.
When a new group is created, it will be visible in the drop-down of all groups in the upper-left corner, as shown in the following screenshot:
You can switch between the groups using this drop-down, which shows all the details and stats relevant to the selected group.
Note
Remember that a group, once created, cannot be deleted. So be careful while creating one.
How it works…
The tasks we completed in the recipe are pretty straightforward and don’t need a lot of explanation, except for one question: when and why do we add a group? We add one when we want to segregate our MongoDB instances by environment or application. There will be a different MMS agent running for each group. Creating a new group is necessary when we want separate monitoring groups for different environments of an application (development, QA, production, and so on), with different user privileges in each group. The same agent cannot be used for two different groups. As we saw in the previous recipe, we give the MMS agent an API key that is unique to the group while configuring it. To view the API key for a group, select the appropriate group from the drop-down at the top and go to Administration | Group Settings, as shown in the following screenshot. (If your user has access to only one group, the drop-down won’t be shown.) The group ID and the API key will both be shown at the top of the page.
Note that not all user roles will see this option. For example, users with read-only privileges can only personalize their profile, and most of the other options will not be visible.
Monitoring MongoDB instances on MMS
The previous recipes, Signing up for MMS and setting up the MMS monitoring agent and Managing users and groups on the MMS console, showed us how to set up an MMS account and agent, add hosts, and manage user access to the MMS console. The core objective of MMS is monitoring the host instances, which we have not yet discussed. In this recipe, we will perform some operations on the host that we added to MMS in the first recipe and monitor it from the MMS console.
Getting ready
Follow the recipe Signing up for MMS and setting up the MMS monitoring agent; that is pretty much all that is needed for this recipe. You may choose to have a standalone instance or a replica set; either way is fine. Also, open a Mongo shell and connect it to the instance (the primary, if it is a replica set).
How to do it…
1. Start by logging in to the MMS console and clicking on Deployment in the upper-left corner, and then again on the Deployment link in the submenu, as shown in the following screenshot:
2. Click on one of the hostnames shown to see a large variety of graphs with various statistics. In this recipe, we will analyze a majority of these.
3. Open the bundle downloaded for the book. In Chapter 4, Administration, we used a JavaScript file named KeepServerBusy.js to keep the server busy with some operations. We will be using the same script this time around.
4. In the operating system shell, execute the following command with the .js file in the current directory. The shell connects to the port of the primary, in my case port 27000:
$ mongo KeepServerBusy.js --port 27000 --quiet
5. Once started, keep it running and give it 5–10 minutes before you start monitoring the graphs on the MMS console.
How it works…
The Understanding the mongostat and mongotop utilities recipe in Chapter 4, Administration, demonstrated how these utilities can be used to get the current operations and resource utilization. That is a fairly basic and helpful way to monitor a particular instance. MMS, however, gives us one place to monitor the MongoDB instance with pretty easy-to-understand graphs. MMS also gives us historical stats, which mongostat and mongotop cannot give.
Before we go ahead with the analysis of the metrics, note that with MMS monitoring, the data is neither queried nor sent out over the public network. Only the statistics are sent, over a secure channel, by the agent. The source code for the agent is open source and is available for examination if needed. The mongod servers need not be accessible from the public network, as the cloud-based MMS service never communicates with the server instances directly; it is the MMS agent that communicates with the MMS service. Typically, one agent is enough to monitor several servers, unless you plan to segregate them into different groups. Also, it is recommended to run the agent on a dedicated machine or virtual machine and not share it with any of the mongod or mongos instances, unless you are monitoring a less crucial group of test instances.
Let us see some of these statistics on the console, starting with the memory-related ones. The following graph shows the resident, mapped, and virtual memory:
As seen in the previous graph, the resident memory for the data set is 82 MB, which is very low; this is the actual physical memory used by the mongod process. This current value is significantly below the free memory available. Generally, it will increase over time until it has used up a large chunk of the total available physical memory. The mongod server process takes care of this automatically. We can’t force it to use more memory, even if the memory is available on the machine it is running on.
The mapped memory, on the other hand, is about the total size of the database and is mapped by MongoDB. This size can be (and usually is) much higher than the physical memory available. Mapping lets the mongod process address the entire data set as if it were present in memory, even when it isn’t. MongoDB offloads the responsibility of mapping and loading data to and from the disk to the underlying operating system. Whenever a memory location is accessed that is not available in the RAM (that is, the resident memory), the operating system fetches the page into memory, evicting some page to make space for the new one if necessary.
What exactly is a memory-mapped file? Let us look at a super-scaled-down version. Suppose we have a file of 1 KB (1024 bytes) and the RAM is only 512 bytes; obviously, we cannot have the whole file in memory. However, you can ask the operating system to map this file to the available RAM in pages. Suppose a page is 128 bytes; then the whole file is eight pages (128 * 8 = 1024). However, the OS can load only four pages. Assume that it loaded the first four pages (up to 512 bytes) into memory. When we access byte number 200, it is found in memory, as it is on page 2. But what if we access byte 800, which is logically on page 7 and not loaded in memory? The OS takes one page out of memory and loads page 7, which contains byte number 800. MongoDB, as an application, sees everything as if it were loaded in memory and accessed by byte index. In reality it wasn’t, and the OS transparently did the work for us. Because the accessed page was not present in memory and had to be loaded from the disk, this is called a page fault.
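The scaled-down example above can be replayed with a toy Python model: a 1024-byte file, 512 bytes of RAM, and 128-byte pages. This is only an illustration of page faults, not how an operating system or MongoDB manages memory. Evicting the least recently used page is an assumption; the text only says that some page is taken out. `MappedFile` is a hypothetical name.

```python
from collections import OrderedDict

FILE_SIZE = 1024  # 1 KB file from the example
RAM_SIZE = 512    # RAM that can hold only half of the file
PAGE_SIZE = 128   # bytes per page, so the file is 1024 / 128 = 8 pages


class MappedFile:
    """Toy memory-mapped file; evicts the least recently used page (assumed)."""

    def __init__(self, preload_pages):
        self.capacity = RAM_SIZE // PAGE_SIZE                # 4 pages fit in RAM
        self.resident = OrderedDict.fromkeys(preload_pages)  # pages in RAM, LRU first
        self.page_faults = 0

    def access(self, offset):
        """Return the 1-based page holding `offset`, loading it if needed."""
        if not 0 <= offset < FILE_SIZE:
            raise IndexError("offset outside the mapped file")
        page = offset // PAGE_SIZE + 1          # 1-based, as in the text
        if page in self.resident:
            self.resident.move_to_end(page)     # mark as most recently used
        else:
            self.page_faults += 1               # page has to come from disk
            if len(self.resident) >= self.capacity:
                self.resident.popitem(last=False)  # evict the LRU page
            self.resident[page] = None
        return page


mf = MappedFile(preload_pages=[1, 2, 3, 4])  # first 512 bytes already loaded
print(mf.access(200), mf.page_faults)        # byte 200 is on page 2: no fault
print(mf.access(800), mf.page_faults)        # byte 800 is on page 7: one fault
print(list(mf.resident))                     # page 1 was evicted to make room
```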
Getting back to the stats shown in the graph, the virtual memory includes all memory usage. That covers the mapped memory plus any additional memory used, such as the thread stack associated with each connection. If journaling is enabled, this size will definitely be more than twice the mapped memory, as journaling also keeps a separate memory mapping of the data. Thus, we have two addresses mapping the same memory location. This doesn’t mean that the page will be loaded twice; it just means that two different virtual addresses can be used to address the same physical memory. Very high virtual memory might need some investigation. There is no predetermined threshold for what counts as too high or too low. Generally, these values are monitored under normal circumstances, when you are happy with the performance of your system. These benchmark values should then be compared with the figures seen when system performance degrades, and appropriate action can then be taken.
As we saw earlier, a page fault occurs when an accessed memory location is not present in the resident memory, causing the OS to load the page from the disk. This I/O activity will definitely reduce performance, and too many page faults can bring down database performance dramatically. The following graph shows quite a few page faults occurring per minute. However, if the disks used are SSDs instead of spinning disks, the cost in terms of the drive’s seek time might not be significant.
A large number of page faults usually occur when not enough physical memory is available to accommodate the data set, and the operating system needs to bring the data from the disk into memory. Note that the stat shown earlier was taken on an MS Windows platform, and this graph might seem high for a very trivial operation. The value shown here is the sum of hard and soft page faults and doesn’t really give a true figure of how well (or badly) the system is doing. These figures would be different on a Unix-based operating system. At the time of writing this book, a JIRA issue reporting this problem is open (https://jira.mongodb.org/browse/SERVER-5799).
Remember that, in production systems, MongoDB doesn’t work well with the NUMA architecture, and you might see a lot of page faults even if the available memory seems to be high enough. Refer to http://docs.mongodb.org/manual/administration/production-notes/ for more details.
There is an additional graph, shown next, which gives some details about nonmapped memory. As we saw earlier in this section, there are three types of memory, namely, mapped, resident, and virtual. Mapped memory is always less than virtual memory. Virtual memory will be more than twice the mapped memory if journaling is enabled. In the graph given earlier in this section, the mapped memory is 192 MB, whereas the virtual memory is 532 MB. As journaling is enabled, the virtual memory is more than twice the mapped memory. When journaling is enabled, the same page of data is mapped twice in memory. Note that the page is physically loaded only once; the same location can simply be addressed using two different addresses.
Let us find the difference between the virtual memory, which is 532 MB, and twice the mapped memory, which is 2 * 192 = 384 MB. The difference between these figures is 148 MB (532 - 384).
What we see next is the portion of virtual memory that is not mapped. This value is the same as the one we just calculated.
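The calculation above can be written as a small Python sketch. As in the text, it assumes values in MB and that journaling doubles the mapped figure. The function names are hypothetical rather than MMS or MongoDB metric names. The 1 GB threshold in `worth_investigating` is also an assumption, reflecting the advice given below that values in the GB range deserve a closer look.

```python
def nonmapped_mb(virtual_mb, mapped_mb, journaling=True):
    """Virtual memory (MB) not explained by the data-file mappings."""
    # With journaling, the data files are mapped twice, so count them twice.
    mappings = 2 if journaling else 1
    return virtual_mb - mappings * mapped_mb


def worth_investigating(nonmapped, limit_mb=1024):
    """Flag nonmapped memory in the GB range (the 1 GB limit is an assumption)."""
    return nonmapped >= limit_mb


value = nonmapped_mb(532, 192)  # figures read from the graphs above
print(value, worth_investigating(value))
```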
As mentioned earlier, no high or low value for nonmapped memory is defined. However, when the value reaches GBs, we might have to investigate whether the number of open connections is high. We should also check for a leak, with client applications not closing connections after using them. The following graph shows the number of open connections:
Once we know the number of connections and find it too high compared to the normal expected count, we need to find the clients that have opened those connections to the instance. We can execute the following JavaScript code from the shell to get those details. Unfortunately, at the time of writing this book, MMS didn’t have a feature to list the client connection details.
testMon:PRIMARY> var currentOps = db.currentOp(true).inprog;
currentOps.forEach(function(c) {
  if (c.hasOwnProperty('client')) {
    print('Client: ' + c.client + ", connection id is: " + c.desc);
  }
  // Get other details as needed
});
The db.currentOp(true) call returns all operations, including idle connections and system operations. We then iterate through all the results and print the client host and the connection details. A typical document in the result of the currentOp method looks like the following snippet. You may tweak the preceding code to include more details according to your needs.
{
  "opid": 62052485,
  "active": false,
  "op": "query",
  "ns": "",
  "query": {
    "replSetGetStatus": 1,
    "forShell": 1
  },
  "client": "127.0.0.1:64460",
  "desc": "conn3651",
  "connectionId": 3651,
  "waitingForLock": false,
  "numYields": 0,
  "lockStats": {
    "timeLockedMicros": {
    },
    "timeAcquiringMicros": {
    }
  }
}
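The shell loop above prints one line per connection. To spot which client opened most of the connections, the same documents can be grouped by client host. The following Python sketch assumes the documents have already been obtained, for example by exporting db.currentOp(true).inprog as JSON. The sample list below is hand-written in the shape of the document shown above. It also assumes each connection shows up as one entry, as idle connections do. `connections_per_host` is a hypothetical name.

```python
from collections import Counter


def connections_per_host(inprog):
    """Count entries per client host in db.currentOp(true).inprog documents."""
    counts = Counter()
    for op in inprog:
        client = op.get("client")
        if client is None:
            continue  # system/internal operations carry no client field
        host = client.rsplit(":", 1)[0]  # drop the client-side port
        counts[host] += 1
    return counts


# Hand-written sample documents shaped like the one shown above.
sample = [
    {"client": "127.0.0.1:64460", "desc": "conn3651", "connectionId": 3651},
    {"client": "127.0.0.1:64461", "desc": "conn3652", "connectionId": 3652},
    {"client": "192.168.2.7:50012", "desc": "conn17", "connectionId": 17},
    {"desc": "rsSync", "active": True},
]
for host, count in connections_per_host(sample).most_common():
    print(host, count)
```

The host with the highest count is usually the first place to look for a connection leak.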
In the Understanding the mongostat and mongotop utilities recipe in Chapter 4, Administration, we used these utilities to get the percentage of time for which a database was locked. We also got the number of update, insert, delete, and getmore operations executed per second. You may refer to that recipe and try it out; we used the same JavaScript that we are currently using to keep the server busy.
In the MMS console, similar graphs give these details, as follows:
The first one, Opcounters, shows the number of operations executed as of a particular point in time. This should be similar to what we saw using the mongostat utility. Similarly, the one on the right shows us the percentage of time for which a DB was locked. The drop-down above it lists the database names; we can select the database for which we want to see the stats. Again, this statistic can be seen using the mongostat utility. The only difference is that the command-line utility shows the stats as of the current time, whereas MMS also shows the historical stats.
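mongostat reports operations per second. The opcounters section of db.serverStatus() holds cumulative totals since the server started, so a rate is the difference between two samples divided by the time between them. The following Python sketch shows that arithmetic. The sample numbers are made up, and `opcounter_rates` is a hypothetical helper.

```python
def opcounter_rates(earlier, later, seconds):
    """Per-second rates from two cumulative opcounters samples."""
    if seconds <= 0:
        raise ValueError("the sampling interval must be positive")
    return {op: (later[op] - earlier[op]) / seconds
            for op in later if op in earlier}


# Made-up samples taken 10 seconds apart.
t0 = {"insert": 1200, "query": 450, "update": 300, "delete": 20, "getmore": 5}
t1 = {"insert": 1500, "query": 470, "update": 360, "delete": 20, "getmore": 5}
print(opcounter_rates(t0, t1, 10))
```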
In MongoDB, indexes are stored in B-trees, and the following graph shows the number of times the B-tree index was accessed, hit, and missed. At a minimum, the RAM should be enough to accommodate the indexes for optimum performance, so in the metrics, the misses should be zero or very low. A high number of misses results in page faults for the index. If the query is not covered, that is, if not all of its data can be sourced from the index, there are possibly additional page faults for the corresponding data as well, which is a double blow for performance. One good practice, whenever querying, is to use projections and fetch only the necessary fields from the document. This helps whenever the selected fields are present in an index: the query is then covered, and all the necessary data is sourced only from the index.
To find out more about covered indexes, refer to the Creating an index and viewing plans of queries recipe in Chapter 2, Command-line Operations and Indexes.
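The covered-query conditions described above can be expressed as a rough Python check. The query may use only indexed fields, and it must return only indexed fields, which means excluding _id unless _id is part of the index. This is a simplified sketch under stated assumptions: it ignores multikey (array) indexes, embedded fields, and which index the planner actually picks, so confirm with explain() as shown in the Chapter 2 recipe. `is_covered` is a hypothetical helper.

```python
def is_covered(index_keys, query_fields, projection):
    """Rough check of whether a query could be answered from one index alone.

    index_keys: field names in the index, e.g. ["name", "age"]
    query_fields: fields used in the query filter
    projection: dict such as {"name": 1, "_id": 0}
    """
    indexed = set(index_keys)
    included = {f for f, v in projection.items() if v}
    if not included:
        return False  # no inclusion projection: (almost) the whole document is returned
    if any(not v for f, v in projection.items() if f != "_id"):
        return False  # excluding a non-_id field returns every other field
    returned = set(included)
    if "_id" not in projection:
        returned.add("_id")  # _id is returned unless explicitly excluded
    return set(query_fields) <= indexed and returned <= indexed


# Index on {name: 1, age: 1}; project only indexed fields and drop _id.
print(is_covered(["name", "age"], ["name"], {"name": 1, "age": 1, "_id": 0}))
# Same query without excluding _id: _id must be fetched from the document.
print(is_covered(["name", "age"], ["name"], {"name": 1}))
```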
For busy applications, when MongoDB acquires a lock on the database, other read and write operations get queued up. If the volumes are very high, with multiple write and read operations contending for the lock, the queue grows. In MongoDB versions up to and including 2.6, locks are held at the database level. Thus, even if the writes are happening on another collection, read operations on any collection in that database will block. This queuing affects the performance of the system and is a good indicator that the data might need to be sharded to scale the system.
Tip
Remember, no value is defined as high or low; what is acceptable is decided on an application-by-application basis.
MongoDB flushes the journal to disk frequently and the data files periodically. The following metric gives us the flush time per minute at a given point in time. If the flush takes up a significant percentage of each minute, we can safely say that write operations are becoming a performance bottleneck.
There’s more…
We have seen how to monitor MongoDB instances and clusters in this recipe. However, we still haven’t seen how to set up alerts that notify us when certain threshold values are crossed. In the next recipe, we will see how to do this with a sample alert, which is sent out over e-mail when the page faults cross a predetermined value.
See also
Monitoring hardware, such as CPU usage, is pretty useful, and the MMS console does support it. However, munin-node needs to be installed to enable CPU monitoring. Refer to http://mms.mongodb.com/help/monitoring/configuring/ to set up munin-node and hardware monitoring. To update the monitoring agent, refer to http://mms.mongodb.com/help/monitoring/tutorial/update-mms/.
Setting up monitoring alerts on MMS
In the previous recipe, we saw how to monitor various metrics from the MMS console. This is a great way to see all the stats in one place and get an overview of the health of the MongoDB instances and cluster. However, support personnel cannot watch the system continuously, so there has to be some mechanism to send out alerts automatically when a threshold is exceeded. In this recipe, we will set up an alert for whenever the page faults exceed 1000.
How to do it…
1. Click on the Activity option in the left-hand side menu and then click on Alert Settings. On the Alert Settings page, click on Add Alert.
2. Add a new alert for the host, which is a primary instance, for when the page faults exceed a given number, 1000 page faults per minute in our case. In this case, the notification method chosen is e-mail, and the interval after which the alert will be sent again is set to 10 minutes.
3. Click on Save to save the alert.
How it works…
The steps were pretty simple. We successfully set up an MMS alert for when the page faults exceed 1000 per minute. As we saw in the previous recipe, no fixed value is classified as high or low. What is acceptable for your system should come from benchmarking it during the testing phases in your environment. Similar to page faults, a vast array of alerts can be set up. Once an alert is raised, it will be sent every 10 minutes, as we have set. This continues until the condition for sending the alert no longer holds, which in this case means the number of page faults falls below 1000. It also stops if somebody manually acknowledges the alert, after which no further alerts are sent for that incident.
As we see in the following screenshot, the alert is open and we can acknowledge it:
On clicking Acknowledge, the following pop-up lets us choose the duration for which we will acknowledge the alert:
This means that for this particular incident, no more alerts will be sent out until the selected time period elapses.
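The alert behaviour described above can be modelled in a few lines of Python. A notification goes out when the page faults exceed the threshold and is repeated every 10 minutes while the condition holds. It stops when the value drops back, and an acknowledgement silences the current incident for a chosen window. This is only a model for reasoning about the settings, not how MMS implements alerts. The per-minute samples and the exact timing rules are assumptions, such as sending the first notification as soon as the alert opens and having an acknowledgement end with its incident. `notification_minutes` is a hypothetical name.

```python
def notification_minutes(samples, threshold=1000, interval=10, ack_window=None):
    """Return the minutes at which an alert e-mail would be sent.

    samples: page faults per minute, one value per minute (index = minute).
    ack_window: (start, end) minutes during which the open incident is
                acknowledged and therefore silent.
    """
    sent = []
    last_sent = None   # minute of the last notification for the open incident
    ack = ack_window
    for minute, value in enumerate(samples):
        if value <= threshold:
            last_sent = None  # condition no longer holds: the incident closes
            ack = None        # the acknowledgement covered only that incident
            continue
        if ack is not None and ack[0] <= minute < ack[1]:
            continue          # acknowledged: stay quiet for now
        if last_sent is None or minute - last_sent >= interval:
            sent.append(minute)
            last_sent = minute
    return sent


print(notification_minutes([1500] * 25))                       # repeats every 10 minutes
print(notification_minutes([1500] * 5 + [200] * 2 + [1500] * 3))  # a new incident re-alerts
print(notification_minutes([1500] * 25, ack_window=(1, 15)))   # silenced while acknowledged
```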
The open alerts can be viewed by clicking on the Activity menu option on the left-hand side.
See also
Visit http://www.mongodb.com/blog/post/five-mms-monitoring-alerts-keep-your-mongodb-deployment-track for some of the important alerts that you should set up for your deployment.
Backing up and restoring data in Mongo using out-of-the-box tools
In this recipe, we will look at some basic backup and restore operations, using the mongodump and mongorestore utilities to back up and restore files.
Getting ready
We will be starting a single instance of mongod. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to start a Mongo instance and connect to it from a Mongo shell. We will need some data to back up. If you already have some data in your test database, that is fine; otherwise, create some from the countries.geo.json file available in the code bundle, using the following command:
$ mongoimport -c countries -d test --drop countries.geo.json
How to do it…
1. With the data in the test database, execute the following command, assuming we want to export the data to a local directory called dump in the current directory:
$ mongodump -o dump --oplog -h localhost --port 27017
Verify that there is data in the dump directory. The data files should be .bson files, one per collection, in the folder created for each database.
2. Now let us import the data back into the MongoDB server using the following command. This again assumes that the dump directory is in the current directory, with the required .bson files present in it.
$ mongorestore --drop -h localhost --port 27017 dump --oplogReplay
How it works…
We executed just a couple of steps to export and restore the data. Let us now see exactly what these steps do and what the command-line options for these utilities are. The mongodump utility exports the database into .bson files, which can later be used to restore the data in the database. The export utility creates one folder per database, except the local database, and each folder has one .bson file per collection. In our case, we used the --oplog option to export a part of the oplog as well, and that data is exported to the oplog.bson file. Similarly, we import the data back into the database using the mongorestore utility. We explicitly ask for the existing data to be dropped before the import by providing the --drop option. We also ask for the exported oplog contents, if any, to be replayed by providing the --oplogReplay option.
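If the backup needs to be scheduled or wrapped in a script, the two commands above can be driven from Python. The following sketch builds the same argument lists shown in steps 1 and 2 and runs them with the standard subprocess module. It assumes mongodump and mongorestore are on the PATH. As with the commands above, --oplog needs a node that keeps an oplog, such as a replica set member. `dump_command`, `restore_command`, and `run` are hypothetical helper names.

```python
import shutil
import subprocess


def dump_command(out_dir="dump", host="localhost", port=27017, oplog=True):
    """Argument list equivalent to the mongodump call in step 1."""
    cmd = ["mongodump", "-o", out_dir, "-h", host, "--port", str(port)]
    if oplog:
        cmd.append("--oplog")  # needs a node with an oplog, e.g. a replica set member
    return cmd


def restore_command(dump_dir="dump", host="localhost", port=27017, replay=True):
    """Argument list equivalent to the mongorestore call in step 2."""
    cmd = ["mongorestore", "--drop", "-h", host, "--port", str(port)]
    if replay:
        cmd.append("--oplogReplay")
    cmd.append(dump_dir)
    return cmd


def run(cmd):
    """Run a tool from PATH; raises CalledProcessError if it exits non-zero."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    print(" ".join(dump_command()))
    print(" ".join(restore_command()))
    # Against a running server: run(dump_command()); run(restore_command())
```

Passing arguments as a list rather than a single shell string avoids quoting problems with directory names that contain spaces.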
The mongodump utility simply queries the collections and exports their contents to the files. The bigger the collection, the longer it takes to dump (and later restore) its contents. It is thus advisable to prevent write operations while the dump is being taken. In sharded environments, the balancer should be turned off. If the dump is taken while the system is running, use the --oplog option to export the contents of the oplog as well. This oplog can then be used to restore point-in-time data. The following are some of the important options available for the mongodump and mongorestore utilities, first for mongodump:
Option    Description
--help    This shows all the supported options and a brief description of each.
-h or --host    This is the host to connect to. By default, it is localhost on port 27017. To connect to a standalone instance, we can give the hostname as <hostname>:<port number>. For a replica set, the format is <replica set name>/<hostname>:<port>,…,<hostname>:<port>. The comma-separated list of hostnames and ports is called the seed list, and it can contain all or a subset of the hostnames in the replica set.
--port    This is the port number of the target MongoDB instance. It is not relevant if the port number is already provided in the -h or --host option.
-u or --username    This provides the username of the user with which the data will be exported. As the data is read from all databases, the user is expected to have at least read privileges on all databases.
-p or --password    This is the password used in conjunction with the username.
--authenticationDatabase    This is the database in which the user credentials are kept; if not specified, the database specified in the --db option is used.
-d or --db    This is the database to back up. If not specified, all the databases are exported.
-c or --collection    This is the collection in the database to be exported.
-o or --out    This is the directory to which the files will be exported. By default, the utility creates a dump folder in the current directory and exports the contents to it.
--dbpath    The value is the directory where the database files will be found. Use this option only when we intend to read the database files directly instead of connecting to a running MongoDB instance. The server should not be running while the database files are read directly, as the export locks the data files, which can’t happen while a server is running. A lock file will be created in the directory while the lock is held.
--oplog    With this option enabled, the data from the oplog from the time the export process started is also exported. Without it, the exported data will not represent a single point in time if writes happen in parallel, as the export can take a few hours and is simply a query operation on all the collections. Exporting the oplog gives the option to restore point-in-time data. There is no need to specify this option if you are preventing write operations while the export is in progress. This option works only against nodes that maintain an oplog, such as replica set members.
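The -h/--host formats from the preceding table can be produced with a small Python helper. This is a hypothetical sketch that only formats strings in the two shapes described above: a single host for a standalone instance, or a replica set name followed by a seed list. The replica set name and ports in the usage example are illustrative.

```python
def host_option(hosts, replica_set=None):
    """Build a value for -h/--host.

    hosts: list of (hostname, port) tuples.
    replica_set: replica set name, or None for a standalone instance.
    """
    seed_list = ",".join(f"{name}:{port}" for name, port in hosts)
    if replica_set is None:
        if len(hosts) != 1:
            raise ValueError("a standalone instance takes exactly one host")
        return seed_list
    # The seed list may contain all members or only a subset of them.
    return f"{replica_set}/{seed_list}"


print(host_option([("localhost", 27017)]))
print(host_option([("Amol-PC", 27000), ("Amol-PC", 27001)], "testMon"))
```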
Similarly, for the mongorestore utility, the options are as follows. The meaning of the options --help, -h or --host, --port, -u or --username, -p or --password, --authenticationDatabase, -d or --db, and -c or --collection is the same as in the case of mongodump:
Option Description
--dbpath: The value is the directory where the database files will be found. Use this option only when we intend not to connect to a running MongoDB instance but to write to the database files directly. The server should not be up and running while writing directly to the database files, as the restore operation locks the data files, which can’t happen if a server is up and running. A lock file will be created in the directory while the lock is acquired.
--drop: Drop the existing data in the collection before restoring the data from the exported dumps.
--oplogReplay: If the data was exported while writes to the database were allowed, and if the --oplog option was enabled during export, the exported oplog will be replayed on the data to bring all the data in the database to the same point in time.
--oplogLimit: The value of this parameter is a number representing a time in seconds (since the epoch). This option is used in conjunction with the --oplogReplay command-line option; it tells the restore utility to replay the oplog and stop just at the limit specified by this option.
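The point-in-time restore described by --oplogReplay and --oplogLimit can be sketched the same way. The following Python builds, but does not run, a mongorestore command line. The description above mentions only seconds; newer versions of the tool also accept an optional :<ordinal> suffix, which this sketch emits, so check mongorestore --help for your version. The function names and the target time are illustrative.

```python
from datetime import datetime, timezone
from typing import Optional


def oplog_limit(point_in_time: datetime, ordinal: int = 0) -> str:
    """Format a point in time as <seconds since epoch>:<ordinal> for --oplogLimit."""
    if point_in_time.tzinfo is None:
        # A naive datetime is ambiguous; require an explicit timezone.
        raise ValueError("point_in_time must be timezone-aware")
    return f"{int(point_in_time.timestamp())}:{ordinal}"


def mongorestore_args(dump_dir: str,
                      host: str = "localhost:27017",
                      drop: bool = False,
                      replay_until: Optional[datetime] = None) -> list:
    """Build (but do not run) a mongorestore command line."""
    args = ["mongorestore", "--host", host]
    if drop:
        args.append("--drop")
    if replay_until is not None:
        # Replay the exported oplog, stopping at the given point in time.
        args += ["--oplogReplay", "--oplogLimit", oplog_limit(replay_until)]
    args.append(dump_dir)
    return args


if __name__ == "__main__":
    limit = datetime(2015, 1, 1, 21, 30, tzinfo=timezone.utc)  # hypothetical target time
    print(" ".join(mongorestore_args("dump", drop=True, replay_until=limit)))
```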
One might even think, “Why not just copy the files to take a backup?” That works, but there are a few problems associated with it. First, you cannot get a point-in-time backup unless write operations are disabled, and second, the space used for backups is very high, as the copy would also include the zero-padded files of the database, as against the mongodump utility, which exports just the data.
Having said that, file system snapshotting is a commonly used practice for backups. One thing to remember is that, while taking the snapshot, the journal files and the data files need to be captured in the same snapshot for consistency.
Configuring the MMS backup service
MMS backup is a relatively new offering by MongoDB for real-time incremental backup of your MongoDB instances, replica sets, and shards, and it offers you point-in-time recovery for your instances. The service is available either on-premises (in your data center) or in the cloud. We will, however, be demonstrating the cloud service, which is the only option for the community and basic subscriptions. For more details on the available options, you can refer to the different product offerings by MongoDB at https://www.mongodb.com/products/subscriptions.
Getting ready
The Mongo MMS backup service works only with Mongo 2.0 and above. We will start a single server that we will back up. MMS backup relies on the oplog for continuous backup, and as the oplog is available only in replica sets, the server needs to be started as a replica set. Refer to the Installing PyMongo recipe in Chapter 3, Programming Language Drivers, to know more about how to install Python and the Python client of Mongo, PyMongo.
How to do it…
1. If you don’t have an MMS account already, go to https://mms.mongodb.com/ and sign up for an account. For screenshots, refer to the Signing up for MMS and setting up the MMS monitoring agent recipe.
2. Start a single instance of Mongo as follows, replacing the file system path with the appropriate one on your machine:
$ mongod --replSet testBackup --smallfiles --oplogSize 50 --dbpath /data/mongo/db
Note that smallfiles and oplogSize are options set only for the purpose of testing, and they are not to be used in production.
3. Start a shell, connect to this instance, and initiate the replica set as follows:
>rs.initiate()
The replica set will be up and running shortly.
4. Go back to the browser and point to mms.mongodb.com. Add a new host by clicking on the + Add Host button. Select the type as replica set, the hostname as your hostname, and the default port (27017 in our case). Refer to the Signing up for MMS and setting up the MMS monitoring agent recipe for screenshots of the add host process.
5. Once the host is successfully added, register for MMS backup by clicking on the Backup option on the left-hand side and then on Begin Setup.
6. Either SMS or Google Authenticator can be used for registration. If a smartphone with Android, iOS, or BlackBerry OS is available, Google Authenticator is a good option. For some countries, such as India, Google Authenticator is the only option available.
7. Assuming Google Authenticator is not configured already and we are planning to use it, the app needs to be installed on your smartphone. Go to the respective app store of your mobile OS platform and install the Google Authenticator software.
8. With the software installed on the phone, come back to the browser. We should see the following screen on selecting Google Authenticator:
9. Begin the setup for a new account by scanning the QR code from the Google Authenticator application. If barcode scanning is a problem, you may choose to manually enter the key given on the right-hand side of the screen.
10. Once the scanning is completed or the key is entered successfully, your smartphone should show a six-digit number that changes every 30 seconds. Enter that number in the Authentication Code box given on the screen.
Note
It is important not to delete this account in Google Authenticator on your phone, as it will be used in the future whenever we wish to change any settings related to backup, such as stopping backup, changing the exclusion list, and literally any operation in MMS backup. The QR code and key will not be visible again once the setup is done. You would have to contact MongoDB support to get the configuration reset.
11. Once the authentication is done, the next screen you should see is for the billing address and billing details, such as the card you register. All charges below USD 5 are waived, so you should be OK to try out a small test instance before being charged.
12. Once the credit card details are saved, we move ahead with the setup. We will have to install a backup agent; this is a separate agent from the monitoring agent. Choose the appropriate platform and follow the instructions for its installation. Take note of the location where the configuration files of the agent will be placed.
13. A new pop-up will contain the instructions/link to the archive/installer for the platform and the steps to install. It should also contain the apiKey. Take note of that API key, which we will need in the next step.
14. Once the installation is complete, open the local.config file placed in the config directory of the agent installation (the location that was shown/modified during the installation of the agent) and paste/type in the apiKey noted down in the previous step.
15. Once the agent is configured and started, click on the Verify Agent button.
16. Once the agent is successfully verified, we should start by adding a host to back up. The drop-down should show us all the replica sets and shards we have added. Select the appropriate one and set the sync source to the primary instance, as that is the only member of our single-member replica set. The sync source is used only for the initial sync process. Whenever we have a proper replica set with multiple instances, it is preferable to use a secondary as the sync source.
As the instance is not started with security, leave the DB Username and Password fields blank.
17. Click on the Manage excluded namespaces button if you wish to skip a particular database or collection from being backed up. If nothing is provided, by default, everything will be backed up. The format for the collection name is <database name>.<collection name>. Alternatively, it can be just the database name, in which case none of the collections in that database will be backed up.
18. Once the details are all OK, click on the Start button. This should complete the setup of the backup process for a replica set on MMS.
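Step 17 describes two forms of exclusion entry. The small Python function below is only our illustration of those matching rules, not MMS’s own code: an entry containing a dot excludes exactly one <database name>.<collection name> namespace, while a bare database name excludes every collection in that database. The function name is_excluded() is hypothetical.

```python
from typing import Iterable


def is_excluded(namespace: str, exclusions: Iterable[str]) -> bool:
    """Return True if a <db>.<collection> namespace matches an exclusion entry.

    An entry is either "<db>.<collection>" (one collection) or just "<db>"
    (every collection in that database).
    """
    db, _, _ = namespace.partition(".")
    for entry in exclusions:
        if "." in entry:
            if namespace == entry:  # exact collection match
                return True
        elif db == entry:  # whole-database exclusion
            return True
    return False
```

For example, is_excluded("test.applicationLogs", ["test.applicationLogs"]) is true, while a collection such as test.countries is still backed up.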
Tip
The installation steps I performed were on Windows, and the service needs to be started manually in that case. Press the Windows key + R and type services.msc. The name of the service is MMSBackupAgent.
How it works…
The steps are pretty simple, and this is all we need to do to set up a server for Mongo MMS backup. One important thing mentioned earlier is that MMS backup uses multifactor authentication for any operation once the backup is set up, so the account set up in Google Authenticator for MongoDB should not be deleted. There is no way to recover the original key used to set up the authenticator. You will have to clear the Google Authenticator settings and set up a new key. To do that, click on the Help & Support link at the bottom left of the screen and click on How do I reset my two-factor authentication?
On clicking the link, a new window will open up, as shown in the following screenshot, which will ask for the username. An e-mail will be sent to the registered e-mail ID, which allows you to reset the two-factor authentication.
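The six-digit code that changes every 30 seconds (step 10) comes from the time-based one-time password scheme (TOTP, RFC 6238) that Google Authenticator implements: the phone and the server share the secret key from the QR code, and each derives the code from the current 30-second time step. This is why deleting the account on the phone means losing the ability to generate valid codes. A minimal Python sketch, assuming the key is the usual base32 string and the default HMAC-SHA1, 30-second, six-digit parameters; the key in the __main__ block is a made-up example.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, for_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Return the RFC 6238 one-time password for the given (or current) time."""
    # Base32 keys are often shown without '=' padding; restore it before decoding.
    key = base64.b32decode(secret_b32.upper() + "=" * (-len(secret_b32) % 8))
    now = time.time() if for_time is None else for_time
    counter = int(now // step)  # number of 30-second steps since the Unix epoch
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Hypothetical key; a real one is shown next to the QR code during setup.
    print(totp("JBSWY3DPEHPK3PXP"))
```

With the RFC 6238 test key (the ASCII string 12345678901234567890), the code for time 59 is 287082.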
As mentioned, the oplog is used to synchronize the current MongoDB instance with the MMS service. However, for the initial sync, an instance’s data files are used. Which instance to use is specified by us when we set up the backup of the replica set. As this is a resource-heavy operation, we should preferably use a secondary instance for it on busy systems, so as not to add more querying load on the primary instance from the MMS backup agent. Once the instance is done with the initial synchronization, the oplog of the primary will be used to get data on a continuous basis. The agent also writes periodically to a collection called mms.backup in the admin database.
The backup agent for MMS backup is different from the MMS monitoring agent. Though there is no restriction on having them both run on the same machine, you might need to evaluate that before having such a setup in production. A safe bet would be to have them running on separate machines. Never run either of these agents on the same box as a mongod or mongos instance in production. There are a couple of important reasons why it is not recommended to run the agents on the same box as the mongod instances. They are as follows:
The resource utilization of the agent is dependent on the size of the cluster it monitors. We don’t want the agent to use a lot of resources and affect the performance of the production instance.
The agent could be monitoring a lot of server instances at one time. As there is only one instance of this agent, we do not want it to go down during database server maintenance and restarts.
Users of the community edition of MongoDB built with SSL, or of the Enterprise version, who use the SSL option for communication between the client and the MongoDB server must perform some additional steps. The first step is to check the My deployment supports SSL for MongoDB connections flag when we set up the replica set for backup (see step 16). Note the checkbox at the bottom of the screenshot; it should be checked. Secondly, open the local.config file for the MMS configuration and look out for the following two properties:
sslTrustedServerCertificates=
sslRequireValidServerCertificates=true
The first is a fully qualified path of the certifying authority’s certificate in the PEM format. This certificate will be used to verify the certificate presented by the mongod instance running over SSL. The second property can be set to false if certificate verification is to be disabled; this is, however, not a recommended option. As far as the traffic between the backup agent and MMS backup is concerned, data sent from the agent to the MMS service over SSL is secure, irrespective of whether SSL is enabled on your MongoDB instances or not. The data at rest in the data center for the backed-up data is not encrypted.
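Editing local.config by hand is all that is required. If you script the agent setup, a sketch like the following can set one of these properties. It assumes that local.config consists of simple key=value lines, as the two properties above suggest; the function name set_property() and the CA file path are hypothetical.

```python
def set_property(config_text: str, key: str, value: str) -> str:
    """Set key=value in properties-style text, replacing an existing line or appending."""
    lines = config_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # skip comments and lines that are not key=value pairs
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")  # the key was not present yet
    return "\n".join(lines) + "\n"
```

Reading the file, calling set_property(text, "sslTrustedServerCertificates", "/etc/ssl/mongodb-ca.pem"), and writing the result back leaves every other line of the file untouched.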
If security is enabled on the mongod instance, a username and password need to be provided, which will be used by the MMS backup agent. The username and password are provided while setting up backup for the replica set, as seen in step 16.
As the agent needs to read the oplog, possibly read all databases for the initial sync, and write data to the admin database, the roles expected from the user in version 2.4 and above are readAnyDatabase, clusterAdmin, readWrite on the admin and local databases, and userAdminAnyDatabase. In versions prior to 2.4, we would expect the user to have read access on all the databases and read/write access to the admin and local databases.
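For MongoDB 2.6 and later, where the createUser command is available, the roles listed above can be expressed as a single command document. The sketch below only builds that document in Python; the username and password are placeholders, and on 2.4 users are created with db.addUser() instead, which uses its own roles syntax.

```python
def backup_agent_user(username: str, password: str) -> dict:
    """createUser command document granting the roles the backup agent needs."""
    return {
        "createUser": username,  # the command name must be the first key
        "pwd": password,
        "roles": [
            {"role": "readAnyDatabase", "db": "admin"},
            {"role": "clusterAdmin", "db": "admin"},
            {"role": "readWrite", "db": "admin"},
            {"role": "readWrite", "db": "local"},
            {"role": "userAdminAnyDatabase", "db": "admin"},
        ],
    }
```

With PyMongo, such a document could be sent with client.admin.command(doc); Python dictionaries keep insertion order, so createUser stays the first key as the server requires.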
While setting up a replica set for backup, you may get an error such as Insufficient oplog size: The oplog window must be at least 1 hours over the last 24 hours for all active replica set members. Please increase the oplog. While you may think this always has to do with the oplog size, it is also seen when the replica set has an instance that is in a recovering state. This can be misleading, so do look out for recovering nodes, if any, in the replica set while setting up a backup for it. According to MMS support, it seems too restrictive not to allow a replica set with some recovering nodes to be set up for backup, and this might be fixed in the future.
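The oplog window is the time span between the oldest and newest entries in the oplog; in the shell, rs.printReplicationInfo() reports it. Given those two timestamps (for example, read from the local.oplog.rs collection), the one-hour check in the error message reduces to simple arithmetic. This is a simplified sketch with our own function names: MMS evaluates the window over the last 24 hours for every active member, which the helper below does not model.

```python
from datetime import datetime


def oplog_window_hours(oldest: datetime, newest: datetime) -> float:
    """Hours of history between the oldest and newest oplog entries."""
    if newest < oldest:
        raise ValueError("newest entry precedes oldest entry")
    return (newest - oldest).total_seconds() / 3600


def meets_minimum_window(oldest: datetime, newest: datetime,
                         minimum_hours: float = 1.0) -> bool:
    """True if the oplog covers at least minimum_hours (1 hour in the MMS message)."""
    return oplog_window_hours(oldest, newest) >= minimum_hours
```

An oplog that holds only 45 minutes of history gives a window of 0.75 hours and would trigger the error; the usual remedy is a larger oplogSize.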
Managing backups in the MMS backup service
In the previous recipe, we learned how to set up the MMS backup service, and a simple one-member replica set was set up for backup. Though a single-member replica set makes no sense in practice, it was needed, as a standalone instance cannot be set up for backup in MMS. In this recipe, we dive deeper and look at the operations we can perform on a server that is set up for backup, such as starting, stopping, or terminating a backup; managing exclusion lists; managing backup snapshots; and retaining and restoring point-in-time data.
Getting ready
The previous recipe is all that needs to be followed for this recipe. The necessary setup described in it is expected to be done, as we are going to use the same server we set up for backup in that recipe.
How to do it…
1. With the server up and running, let’s import some data into it. It can be anything, but we chose to use the countries.geo.json file that was used in the previous chapter. It should be available in the bundle downloaded from the Packt Publishing website.
Start by importing the data into a collection called countries in the test database, using the following command. It was executed with the current directory containing the countries.geo.json file:
$ mongoimport -c countries -d test --drop countries.geo.json
2. We have already seen how to exclude namespaces when the replica set backup was being set up. We will now see how to exclude namespaces once the backup for a replica set is done. Click on the Backup menu option on the left and then on Replica Set Status, which opens by default when Backup is clicked. Click on the gear button on the right-hand side of the row where the replica set is shown. It should look as follows:
3. As shown in the previous screenshot, click on the Edit Excluded Namespaces option and type in the name of the collection that we want to exclude. For example, to exclude the applicationLogs collection in the test database, type in test.applicationLogs.
4. On saving it, you will be asked to enter the token code that is currently displayed on your Google Authenticator.
5. On successful validation of the code, the namespace test.applicationLogs will be added to the list of namespaces excluded from being backed up.
6. We shall now see how to manage the snapshot scheduling. A snapshot is the state of the database as of a particular point in time. To manage the snapshot frequency and retention policy, click on the gear button shown in step 2 and click on Edit Snapshot Schedule.
7. As seen in the following screenshot, we can set the times when the snapshots are taken and their retention period. More on this will be covered in the next section. Saving any changes here requires multifactor authentication.
8. We will now look at how we go about restoring the data using MMS backup. Whenever we want to restore the data, click on Backup and then on Replica Set Status/Shard Cluster Status, and then click on the set/cluster name.
9. On clicking it, we will see the snapshots that are saved against this set. It should look something like what is seen in the following screenshot:
We have encircled some portions of the screen, which we will look at one by one.
10. To restore as of the time when a snapshot was taken, click on the Restore this snapshot link in the ACTIONS column of the grid.
11. The previous screenshot shows us how we can export the data, either over HTTPS or SCP. We select Pull via Secure HTTP (HTTPS) for now and click on Authenticate. We will look at SCP in the next section.
12. Enter the token that is received over SMS or shown on Google Authenticator, and click on Finalize Request.
13. On successful authentication, click on Restore Jobs, as shown in the following screenshot. This is a one-time download that will let you download the tar.gz archive. Click on the download link to download the tar.gz archive.
14. Once the archive is downloaded, extract it to get the database files within it.
15. Stop the mongod instance, replace the database files with the ones that were extracted, and restart the server to get the data as of the time when the snapshot was taken. Note that the database files will not contain data for any collection that was excluded from backup.
We will now see how to get point-in-time data using MMS backup:
1. Click on Replica Set Status or Shard Cluster Status and then on the cluster/set that is to be restored.
2. On the right-hand side of the screen, click on the Restore button.
3. This should give a list of available snapshots, or you may enter a custom time. Check the Use Custom Point In Time checkbox. Click on the Date field, select the date and the time (in hours and minutes) to which you want to restore the data, and click on Next. Note that the Point in Time feature only restores to a point in the last 24 hours.
Here you will be asked to choose between the HTTPS and SCP formats. The subsequent steps are similar to what we did earlier, from step 12 onwards.
How it works…
After the backup for a replica set was set up, we first imported some random data into the test database, so that it would be sent to the MMS backup service and we could restore it at a later point in time. We saw how to exclude namespaces from being backed up in steps 2, 3, 4, and 5.
Now, looking at the snapshot and retention policy settings (step 7), we can see that we have the choice of the time interval at which the snapshots are taken and the number of days for which they are retained. By default, snapshots are taken every 6 hours and they are saved for 2 days. The snapshot that is taken at the end of the day is saved for a week, which is 7 days. The snapshots taken at the end of the week and of the month are saved for 4 weeks and 13 months, respectively. A snapshot can be taken once every 6, 8, 12, or 24 hours.
However, one needs to understand the flip side of taking snapshots at long intervals. Suppose the last snapshot is taken at 18:00 hours; getting the data as of 18:00 hours for a restore is very easy, as it is stored on the MMS backup servers. However, suppose we need the data as of 21:30 hours. As MMS backup supports point-in-time backup, it would use the 18:00 hours snapshot as the base and then replay the changes made after the snapshot was taken, up to 21:30 hours. This replaying is similar to how an oplog would be replayed on the data. There is a cost for this replay, and thus getting a point-in-time backup is slightly more expensive than getting the data from a snapshot. Here we had to replay the data for 3.5 hours, from 18:00 hours to 21:30 hours. Now imagine that snapshots were set to be taken every 12 hours and our first snapshot was taken at 00:00 hours; we would have snapshots at 00:00 hours and 12:00 hours every day. To restore the data as of 21:30 hours, with 12:00 hours as the last snapshot, we would have to replay 9.5 hours of data, which is much more expensive.
More frequent snapshots mean more storage space usage but less time needed to restore a database to a given point in time. At the same time, less frequent snapshots require less storage, but at the cost of more time to restore the data to a point in time. You need to decide on a trade-off between these two: space and time for restoration. For the daily snapshot, we can choose the retention from 3 to 180 days. Similarly, for the weekly and monthly snapshots, the retention period can be chosen between 1 and 52 weeks and between 1 and 36 months, respectively.
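The replay cost in these examples can be worked out with a few lines of arithmetic. The sketch below assumes, as in the example, that snapshots start at 00:00 and repeat every interval_hours within a day; it returns how many hours of changes must be replayed on top of the most recent snapshot to reach the requested time. The function name is illustrative, not part of MMS.

```python
def replay_hours(target_hour: float, interval_hours: int,
                 first_snapshot_hour: float = 0.0) -> float:
    """Hours of changes to replay on top of the latest snapshot at or before target_hour.

    Times are hours of the day; snapshots are assumed to be taken every
    interval_hours, starting at first_snapshot_hour.
    """
    if interval_hours not in (6, 8, 12, 24):
        raise ValueError("MMS snapshots are taken every 6, 8, 12, or 24 hours")
    elapsed = target_hour - first_snapshot_hour
    if elapsed < 0:
        raise ValueError("target is before the first snapshot")
    last_snapshot = first_snapshot_hour + (elapsed // interval_hours) * interval_hours
    return target_hour - last_snapshot
```

Here replay_hours(21.5, 6) gives 3.5 and replay_hours(21.5, 12) gives 9.5, matching the two scenarios described above.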
The screenshot in step 9 has a column for the expiry of the snapshot. For the first snapshot taken, it is 1 year, whereas the others expire in 2 days. The expiration follows what we discussed earlier. On changing the expiration values, the old snapshots are not affected or adjusted to the changed times. The new snapshots taken will, however, follow the modified settings for retention and frequency.
We saw how to download the dump (step 10 onwards) and then use it to restore the data in the database. It was pretty straightforward and doesn’t need a lot of explanation, except for one thing: if the data is for a sharded cluster, there will be multiple folders, one for each shard, and each of them will have its own database files, as against what we saw here in the case of a replica set, where we have a single folder with the database files in it.
Finally, let us look at the screen when we choose SCP as the option:
SCP stands for secure copy. The files will be copied over a secure channel to a machine’s file system. The host that is given needs to have a public IP, which will be used to SCP the files. This makes a lot of sense when we want the data from MMS to be delivered to a machine running a Unix OS in the cloud, say one of the AWS virtual instances. Rather than getting the file using HTTPS on our local machine and then re-uploading it to the server in the cloud, you can specify the location where the data needs to be copied in the Target Directory block, along with the hostname and the credentials. There are a couple of ways to authenticate as well: a password is the easy way, and an SSH key pair is the additional option. If you have to configure the firewalls of your host in the cloud to allow incoming traffic over the SSH port, the public IP addresses are given at the bottom of the screen (64.70.114.115/32 and 4.71.186.0/24 in our screenshot), which you should whitelist to allow incoming secure copy requests over port 22.
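The whitelisting itself is done in your host’s firewall or cloud security group, not in code, but the CIDR notation on that screen is worth understanding: a /32 is a single address and a /24 covers 256 addresses. The Python sketch below uses the standard ipaddress module with the two ranges from our screenshot, which may differ for your account; the function name is hypothetical.

```python
import ipaddress

# Source ranges shown on the SCP screen in our screenshot; yours may differ.
MMS_SCP_SOURCES = [
    ipaddress.ip_network("64.70.114.115/32"),  # a single address
    ipaddress.ip_network("4.71.186.0/24"),     # 4.71.186.0 to 4.71.186.255
]


def allowed_scp_source(ip: str, networks=MMS_SCP_SOURCES) -> bool:
    """True if an incoming SSH (port 22) connection comes from a whitelisted range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```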
See also
We have seen running backups using MMS, which uses the oplog for this purpose. The Implementing triggers in Mongo using oplog recipe in Chapter 5, Advanced Operations, uses the oplog to implement trigger-like functionality. This concept is the backbone of the real-time backup used by the MMS backup service.
Chapter 7. Cloud Deployment on MongoDB
In this chapter, we will cover the following recipes:
Setting up and managing the MongoLab account
Setting up a sandbox MongoDB instance on MongoLab
Performing operations on MongoDB from the MongoLab GUI
Setting up MongoDB on Amazon EC2 using the MongoDB AMI
Setting up MongoDB on Amazon EC2 without using the MongoDB AMI
Introduction
Though explaining cloud computing is not in the scope of this book, I will explain it in just one paragraph. Any business, big or small, needs hardware infrastructure and different software installed on it. An operating system is the basic software needed, along with different servers (from a software perspective) for storage, mail, Web, database, DNS, and so on. The list of software frameworks/platforms needed might end up being large. The point of interest here is that the initial budget for this hardware and software platform is high, and we are not even considering the real estate needed to host it. This is where cloud computing providers such as Amazon, Rackspace, Google, and Microsoft come into play. They host high-end hardware and software in different data centers across the globe and let us choose from different configurations to start an instance, which is then accessed remotely over the public network for management purposes. Literally all our setup is done in the cloud provider’s data center, and we just pay as we use. Shut down the instance, and you stop paying for it. Not only small start-ups but also large enterprises often fall back on cloud servers to handle a temporary rise in computing resource demand. The prices offered by the providers are very competitive too; in my opinion, Amazon Web Services (AWS) stands out among them, and its popularity says it all.
The wiki page at http://en.wikipedia.org/wiki/Cloud_computing has a lot of detail, perhaps a bit too much for someone new to the concept, but it is a good read nevertheless. The article at http://computer.howstuffworks.com/cloud-computing/cloud-computing.htm is pretty good, and I recommend that you read it if you are not familiar with the concept of cloud computing.
In this chapter, we will set up MongoDB instances in the cloud using MongoDB service providers and then by ourselves on AWS.
Setting up and managing the MongoLab account
In this recipe, we will be evaluating one of the vendors, MongoLab, that provides MongoDB as a service. This introductory recipe will introduce you to what MongoDB as a service is, and then it will demonstrate how to set up and manage an account in MongoLab (https://mongolab.com/).
In all the recipes in this book so far, we have covered setting up, administering, monitoring, and developing MongoDB instances on organizational/personal premises. This not only needs manpower with the appropriate skill set to manage the deployments, but also appropriate hardware to install and run Mongo servers. This needs large upfront investments, which might not be viable for start-ups, or even for organizations that are not clear about adopting this technology or migrating to it. They might want to evaluate it and see how it goes before moving to this solution full-fledged. What would be ideal is a service provider that takes care of hosting, managing, and monitoring the MongoDB deployments, and of providing support. The organizations that opt for these services need not invest upfront in setting up the servers, nor recruit or outsource to consultants for the administration and monitoring of the instances. All that one needs to do is choose the hardware and software platform, configuration, and the appropriate MongoDB version, and set up an environment from a user-friendly GUI. Such a service even gives you the option to use your existing cloud provider’s servers.
Having explained in brief what these vendor-hosting services do and why they are needed, we will start this recipe by setting up an account with MongoLab and looking at some basic user and account management. MongoLab is by no means the only hosting provider for MongoDB. You might also want to take a look at http://www.mongohq.com/ and http://www.objectrocket.com/. At the time of writing this book, MongoDB itself had started providing MongoDB as a service on the Azure cloud, which is currently in the beta phase.
How to do it…
1. Visit https://mongolab.com/signup/ to sign up. If you don’t have an account created, just fill in the relevant details and create an account.
2. Once the account is created, click on the Account link in the top-right corner of the page, as shown in the following screenshot:
3. Click on the Account Users tab in the top-left corner; it should be selected by default:
4. To add a new account user, click on the + Add account user button. A pop-up window will ask for the username, e-mail ID, and password of the user. Enter the relevant details, and click on the Add button.
5. Click on the user, and you will be able to navigate to a page where you can change the username, e-mail ID, and password. You might transfer the admin rights to the user by clicking on the Change to Admin button on this page.
6. Similarly, by clicking on your own user details, you will have the options to change the username, e-mail ID, and password.
7. Click on the Setup two-factor authentication button to activate multifactor authentication using Google Authenticator. You need to have Google Authenticator installed on your Android, iOS, or BlackBerry phone to proceed with the setup of multifactor authentication.
8. On clicking the button, we should see the QR code that can be scanned using Google Authenticator, or, if scanning is not possible, click on the URL underneath the QR code, which will show the code. Manually set up a time-based account in Google Authenticator. There are two types of Google Authenticator accounts: time-based and counter-based. For more details, visit http://en.wikipedia.org/wiki/Google_Authenticator.
9. Similarly, you can delete users from the Account page by clicking on the cross next to the user’s row under Account Users.
How it works…
There is not much to explain in this section. The setup process and user administration are pretty simple. Note that the users we added here are not database users. These are the users who have access to the MongoLab account to which we added them. The account can be the name of the organization and can be seen at the top of the screen. The multifactor authentication account set up in the Google Authenticator software on the handheld device should not be deleted, as whenever the user logs in to the MongoLab account from the browser, he or she will be asked to enter the code from Google Authenticator to continue.
Setting up a sandbox MongoDB instance on MongoLab
In the previous recipe, we saw how to set up an account on MongoLab and add users to your account. We still haven’t seen how to fire up an instance in the cloud and use it to perform some simple operations. In this recipe, this is exactly what we will do.
Getting ready
Refer to the previous recipe to set up an account with MongoLab. We will set up a free sandbox instance. We will require some way to connect to this started Mongo instance, and thus we will need a Mongo shell, which comes only with the complete Mongo installation, or you might choose to use a programming language of your choice to connect to the started Mongo instance. Refer to Chapter 3, Programming Language Drivers, for recipes on connecting and performing operations using a Java or Python client.
How to do it…
1. Go to the home page at https://mongolab.com/home and click on the Create new button.
2. Select a cloud provider; for this example, we chose Amazon Web Services.
3. Click on Single-node (development) and then on the Sandbox option. Do not change the location of the cloud server, as the free sandbox instance is not available in all data centers. Since this is a sandbox, we are OK with any location.
4. Add any name for your database. The name I chose is mongolab-test. Click on Create new MongoDB deployment after entering the name.
5. This will take you to the home page, and the database will now be visible. Click on the instance name. The page here shows the details of the selected MongoDB instance. The instructions to connect from the shell or a programming language are given at the top of the page, along with the public hostname of the started instance.
6. Click on the Users tab and then on the Add database user button.
7. In the pop-up window, add the username and password as testUser and testUser, respectively (or any of your choice).
8. With the user added, start the Mongo shell as follows, assuming that the name of the database is mongolab-test and that the username and password are both testUser:
$ mongo <host-name>/mongolab-test -u testUser -p testUser
9. On connecting, execute the following command in the shell and check that the database name is mongolab-test:
>db
10. Insert one document in a collection as follows:
> db.messages.insert({_id: 1, message: 'Hello mongolab'})
11. Query the collection as follows:
>db.messages.findOne()
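Programs usually connect with a connection string rather than the shell’s -u and -p flags. The following minimal Python sketch builds one, escaping the credentials as the URI format requires. The host name and port are placeholders for the values shown on your MongoLab database page, and the function name is our own.

```python
from urllib.parse import quote_plus


def mongodb_uri(host: str, database: str, username: str, password: str,
                port: int = 27017) -> str:
    """Build a mongodb:// connection string, escaping the credentials."""
    return (f"mongodb://{quote_plus(username)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")
```

The resulting string can be passed to PyMongo’s MongoClient. Because the database name appears in the path, it is also used as the authentication database when none is given explicitly, which matches a user created under the Users tab of mongolab-test.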
How it works…
The steps executed are very simple. We created one shared sandbox instance in the cloud. MongoLab itself does not host the instances but uses one of the cloud providers to do the hosting. MongoLab does not support sandbox instances for all providers. The storage with the sandbox instance is 0.5 GB and is shared with other instances on the same machine. Shared instances are cheaper than running on a dedicated instance, but the price is paid in performance. The CPU and I/O are shared with other instances, and thus the performance of our shared instance is not necessarily in our control. For a production use case, a shared instance is not a recommended option. Similarly, we need to set up a replica set when running in production. If we look at the screenshot in step 2, we will see another tab next to the Single-node (development) option. This is where you might choose the configuration for the machine in terms of RAM and disk capacity (and the price too) and set up a replica set.
As you can see, you get to choose which version of MongoDB to use. Even if a new version of MongoDB gets released, MongoLab will not start supporting it immediately, as it usually waits for a few minor versions to be rolled out before supporting the new version for production users. Also, when we choose a configuration, the default available option is two data nodes and one arbiter, which is sufficient for the majority of use cases.
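Why are two data nodes plus one arbiter enough? An election needs a strict majority of the voting members. Assuming each of the three members has the default single vote, the small Python sketch below shows that such a set tolerates one member being down: the surviving data node and the arbiter still form a majority, although only one copy of the data remains until the failed node returns. The function names are our own.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def failures_tolerated(voting_members: int) -> int:
    """How many voting members can be down while a primary can still be elected."""
    return voting_members - majority(voting_members)
```

For three members, majority(3) is 2 and failures_tolerated(3) is 1, whereas two data nodes without the arbiter would tolerate no failures at all.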
The RAM and disk chosen depend completely on the nature of the data and how query/write intensive the workload is. This sizing is something we do irrespective of whether we are deploying on our own infrastructure or in the cloud. The working set is something that is important to know before we choose the RAM of the hardware. POCs and experiments are done with a subset of the data, and then the estimation can be done for the entire data set. Refer to the Estimating the working set recipe in Chapter 4, Administration, to estimate the working set on your sample data set. If the I/O activity is high and low I/O latency is desired, you might even opt for SSDs, as we saw in the preceding screenshot. Standalone instances are as good as replica sets in terms of scalability, but not in terms of availability. Thus, we might choose standalone instances for such estimation and development purposes. Shared instances, both free and paid, are good candidates for development purposes. Note that shared instances cannot be restarted on demand, as we can do for dedicated instances.
Which cloud provider do we choose? If you already have your application servers deployed in the cloud, it obviously has to be the same vendor as your existing one. It is recommended that you use the same cloud vendor for the application server and the database, and that both are deployed in the same location to minimize latency and improve performance. If you are starting afresh, then invest some time in choosing the cloud provider. Look at all the other services that the application will need, such as storage, compute, and other services, including e-mail, notification services, and so on. All this analysis is outside the scope of this book, but once you are done with it and have finalized a provider, you can choose that provider in MongoLab accordingly. As far as pricing goes, all the leading providers offer competitive pricing.
Performing operations on MongoDB from MongoLab GUI
In the previous recipe, we saw how to set up a simple sandbox instance for MongoDB in the cloud using MongoLab. In this recipe, we’ll build on it and see what services MongoLab provides from the perspectives of management, administration, monitoring, and backup.
How to do it…
1. Go to https://mongolab.com/home; you should see a list of databases, servers, and clusters. If you have followed the previous recipe, you would see one standalone database, mongolab-test (or whatever name you chose for the database). Click on the database name; this will take you to the database details page.
2. On clicking on the Collections tab, which should be selected by default, we will see a list of the collections present in the database. If the previous recipe was executed before this one, you would see one collection, messages, in the database.
3. Click on the name of the collection, and we will be navigated to the collection details page as follows:
4. Click on the Stats option to view the stats of the collection. Except for whether the collection is capped and, if so, the maximum number of documents it can hold, the contents come from the result of the following command:
db.<collectionName>.stats()
5. In the Documents tab, we can query the collection. By default, we will see all the documents, with 10 documents shown per page, which can be changed from the records/page drop-down list. A maximum value of 100 can be chosen.
6. There is another way to view the documents, which is as a table. Click on the table radio button in Display mode and click on the (edit table view) link to create/edit the table view. In the pop-up shown, enter the following document for the messages collection and click on Submit:
{
"id":"_id",
"MessageText":"message"
}
On doing this, the display will change as follows:
7. From the – Start new search – drop-down list, select the [new search] option, as shown in the following screenshot:
8. With the new query, we will see fields that let us enter the query string, sort order, and projections. Enter the query as {"_id": 1} and fields as {"message": 1, "_id": 0}.
9. You might choose to save the query by clicking on the Save this search button and giving a name to the query.
10. Individual documents can be deleted by clicking on the cross next to each record. Similarly, the Delete all button will delete all the contents of the collection.
11. Similarly, clicking on + Add document will display an editor to type in the document that will be inserted into the collection. As MongoDB is schemaless, the document need not have a fixed set of fields; the application should make sense of it.
12. Gotohttps://mongolab.com/databases/<yourdatabasename>(mongolab-testinthiscase),whichcanalsobereachedbyclickingonthedatabasenamefromthehomepage.
13. ClickontheStatstabnexttotheUserstab.Thecontentshowninthetableistheresultofthedb.stats()command.
14. Similarly, click on the Backups tab at the top, next to the Stats tab. Here, we can select options to take a recurring or one-time backup.
15. When you click on Schedule recurring backup, you will get a pop-up window that will let you enter the details of the scheduling, such as the frequency of the backup, the time of the day when the backup needs to be taken, and the number of backups to keep.
16. The backup location can be either MongoLab’s own Amazon Simple Storage Service (S3) bucket or Rackspace Cloud Files. You might choose to use your own account’s storage instead, in which case you will have to share the AWS access key/secret key or, in the case of Rackspace, the user ID/API key.
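The paging in step 5 (10 documents per page by default, at most 100) can be related to the skip/limit pagination used from the Mongo shell. The following is a minimal Java sketch, assuming 1-based page numbers and that paging maps onto skip and limit in this way; it illustrates the arithmetic only and is not a description of how MongoLab implements its UI. The class name PageWindow is hypothetical.

```java
// Hypothetical sketch relating "records/page" paging to skip/limit values.
class PageWindow {
    static final int MAX_PAGE_SIZE = 100; // largest records/page choice in step 5

    final int skip;   // documents skipped before the requested page
    final int limit;  // documents shown on the page

    PageWindow(int page, int requestedPageSize) {
        if (page < 1 || requestedPageSize < 1) {
            throw new IllegalArgumentException("page and page size must be >= 1");
        }
        // Cap the page size at the maximum the drop-down offers.
        this.limit = Math.min(requestedPageSize, MAX_PAGE_SIZE);
        // All documents on earlier pages are skipped.
        this.skip = (page - 1) * limit;
    }

    // Pages needed to show totalDocuments at the given page size (ceiling division).
    static long pageCount(long totalDocuments, int pageSize) {
        return (totalDocuments + pageSize - 1) / pageSize;
    }
}
```

For example, page 3 at 10 records per page corresponds to db.messages.find().skip(20).limit(10) in the shell.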
How it works…
Steps 1 to 5 are pretty straightforward. In step 6, we provided a JSON document to show the results in a tabular format. The format of the document is as follows:
{
  <display column 1>: <name of the field in the JSON document>,
  <display column 2>: <name of the field in the JSON document>,
  ...
  <display column n>: <name of the field in the JSON document>
}
The key is the name of the column to display, and the value is the name of the field in the actual document whose value will be shown in that column. To get a clear understanding, look at the document defined for the messages collection, look at a document in the messages collection, and then take a look at the displayed tabular data. The following is the JSON document we provided; each key is the name of a column, and each value is the field in the document whose value that column shows:
{
"id":"_id",
"MessageText":"message"
}
Also, note that the field names and values of the JSON documents here are enclosed in quotes. The Mongo shell is more lenient, in the sense that it allows us to give field names without quotes.
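To make the column-to-field mapping concrete, here is a minimal Java sketch that applies a table-view definition to a single document held in a Map. It only illustrates the key/value semantics described above and is not how MongoLab renders its tables; the class and method names (TableViewSketch, toRow) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the table-view definition:
// key = display column name, value = field name in the stored document.
class TableViewSketch {

    static Map<String, Object> toRow(Map<String, String> viewDefinition,
                                     Map<String, Object> document) {
        // LinkedHashMap keeps the columns in the order they were defined.
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, String> column : viewDefinition.entrySet()) {
            // Look up the document field named by the value; a missing field
            // (possible in a schemaless collection) yields an empty (null) cell.
            row.put(column.getKey(), document.get(column.getValue()));
        }
        return row;
    }
}
```

With the definition {"id": "_id", "MessageText": "message"}, a document {_id: 1, message: "Hello"} becomes a row with the columns id = 1 and MessageText = "Hello".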
If we look at step 16, we will see that the backups are stored either in MongoLab’s AWS S3/Rackspace Cloud Files or in your own AWS S3 bucket/Rackspace Cloud Files. In the latter case, you need to share your AWS/Rackspace credentials with MongoLab. If this is a concern and the credentials can potentially be used to access other resources, it is recommended that you create a separate account and use it for backup purposes from MongoLab. You might also use a backup to create a new MongoDB server instance from MongoLab. Needless to say, if you have used your own AWS S3 bucket/Rackspace Cloud Files, storage charges are additional; they are not a part of MongoLab’s charges.
There are some important points worth mentioning. MongoLab provides a REST API for various operations. The REST API too can be used in place of the standard drivers to perform CRUD operations. However, using the MongoDB client libraries is the recommended approach. One good reason to use the REST API right now over the language driver is if the client is connecting to the MongoDB server over a public network. The shell we started on our local machine, which connects to the MongoDB server on the cloud, sends unencrypted data to the server, which makes it vulnerable. On the other hand, if the REST API is used, the traffic is sent over a secure channel, as HTTPS is used. MongoLab plans to support a secure channel for communication between the client and the server in the future, but at the time of writing this book, it is not available. If the application and database are in the same data center of the cloud provider, you are safe and depend on the security the cloud provider gives its local network, which generally is not a concern. Otherwise, there is nothing you can do to secure the communication other than ensuring that your data doesn’t go over public networks.
One more scenario where MongoLab doesn’t work is when you want the instances to run on your own virtual machine instance rather than on the one chosen by MongoLab, or when you want the application to be in a virtual private cloud. Cloud providers offer services such as Amazon VPC, where a part of the AWS cloud can be treated as a part of your network. If you intend to deploy your MongoDB instance in such an environment, MongoLab cannot be used.
Setting up MongoDB on Amazon EC2 using the MongoDB AMI
In the earlier few recipes, we saw how to start MongoDB in the cloud using a hosted service provided by MongoLab, which gave us an alternative way to set up MongoDB on all the leading cloud vendors. However, if we plan to host and monitor the instance ourselves for greater control, or to set it up within our own virtual private cloud, we can do it ourselves. Though the procedure varies from cloud provider to cloud provider, we will demonstrate it using AWS. There are a couple of ways to do this, but in this recipe, we will do it using the Amazon Machine Image (AMI). The AMI is a template that contains details such as the operating system and the software that will be available on the started virtual machine. All this information will be used while booting up a new virtual machine instance on the cloud. To know more about the AMI, visit http://en.wikipedia.org/wiki/Amazon_Machine_Image.
Talking about AWS, Elastic Compute Cloud (EC2) is a service that lets you create, start, and stop servers of different configurations in the cloud that run the operating systems of your choice (the prices differ accordingly). Similarly, Amazon Elastic Block Store (EBS) is a service that provides persistent block storage with high availability and low latency. Initially, each instance has a store known as the ephemeral store attached to it. This is a temporary store, and the data on it might be lost when the instance restarts. EBS block storage is thus attached to the EC2 instance to maintain persistence even when the instance is stopped and then restarted. Standard EBS doesn’t guarantee a minimum number of I/O operations per second (IOPS). For moderate workloads, the default of about 100 IOPS is OK. However, for high-performance I/O, EBS blocks with guaranteed (provisioned) IOPS are also available. They are priced higher than standard EBS blocks, but they are a good option if a low I/O rate can be a bottleneck in the performance of the system.
Getting ready
The first thing you need to do is sign up for an AWS account. Visit http://aws.amazon.com/ and click on Sign Up. Log in if you have an Amazon account; otherwise, create a new one. You will have to give your credit card details, although the recipes we have here will use the free micro instance unless we explicitly mention otherwise. We will connect to the instance on the cloud using PuTTY. You can download and install PuTTY on your machine if you have not already done so. It can be downloaded from http://www.putty.org/.
For the installation using the AMI, we cannot use the micro instance and will have to use at least a standard large instance. Get more details on the pricing of EC2 instances in different regions at https://aws.amazon.com/ec2/pricing/. Choose the appropriate region based on geographical and financial factors.
1. The first thing you need to do is create a key pair if you have not already created one. Steps 1 to 5 are only for the creation of the key pair. This key pair will be used to log in to the Unix instance started in the cloud from the PuTTY client. Skip to step 6 if the key pair is already created and the .pem file is available with you.
2. Go to https://console.aws.amazon.com/ec2/ and make sure the region you have in the top-right corner (as shown in the following screenshot) is the same as the one in which you are planning to set up the instance:
3. Once the region is selected, the page with the Resources heading will show all the instances, key pairs, IP addresses, and so on for this region. Click on the Key Pairs link; this will navigate you to the page where all the existing key pairs are shown, and you can create new ones.
4. Click on the Create Key Pair button, and in the pop-up window, type in any name of your choice. Let’s say we call it EC2TestKeyPair; then click on Create.
5. Once the key pair is created, a .pem file will be generated. Ensure that the file is saved, as it will be needed for subsequent access to the machine.
6. Next, we will convert this .pem file to a .ppk file to be used with PuTTY.
7. Start PuTTYgen. If it is not already available, it can be downloaded from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html.
8. We will see the following screenshot:
9. Select the SSH-2 RSA option and click on the Load button. In the file dialog, select All files and select the .pem file that was downloaded when the key pair was generated in the EC2 console.
10. Once the .pem file is imported, click on the Save private key option and save the file with any name. This time, the file is a .ppk file. Keep this file to log in to the EC2 instance from PuTTY in the future.
How to do it…
1. Visit the Amazon Marketplace at https://aws.amazon.com/marketplace/ and search for MongoDB, as shown in the following screenshot:
2. Look out for AMIs by MongoDB, as these are the official ones sold by MongoDB. There are different AMIs available with different provisioned I/O rates. For this example, we will choose the one with 1000 IOPS for data.
3. Click on the name of the image, which is a link, and we will be navigated to the details page. The following portion of the details page is of particular interest to us. Notice the Highlights section. There are three additional EBS volumes that are reserved and will be attached to this instance: one will be used for data (with the highest IOPS), one for the journal, and one for logs (with the lowest IOPS).
4. The page will also provide information on the AMI and the pricing. Click on the Continue button.
5. On this page, we will review all the configurations for the EC2 instance that will be started. The first option will be the MongoDB version, which will be the latest one, and the second option will be the AWS region, which is US East by default. Change the version and region if required.
6. The next option is to select the instance type. We will choose the Standard Large (m1.large) option for our test. Leave the VPC settings set to default.
7. The next setting is the security settings, which allow SSH connections over port 22 from anywhere in the world to the started EC2 instance. We will choose to use the settings recommended by the vendor of the AMI. You are free to use any other security policy if you have defined one earlier in EC2.
8. Finally, select the key pair, the one we created in the Getting ready section. Once done, click on Accept Terms & Launch with 1-Click.
9. Visit the EC2 console from the browser and click on Instances in the left-hand side menu.
10. The instance will take some time to start. Once it has started, click on the instance name in the list of instances to see the public DNS and IP address at the bottom of the page. Copy this public DNS.
11. Start PuTTY, and click on the Auth option under Connection/SSH, as shown in the following screenshot:
12. Click on Browse and load the .ppk file, which was generated earlier in the Getting ready section.
13. Now, click on Session under Category and enter the hostname that was copied in step 10. The port will remain 22, as this is the only open port from the public network to this instance.
14. When prompted for the user, enter ec2-user in PuTTY. The private key loaded in step 12 will be used for authentication, and you do not need to enter a password.
15. We will use /data for the data and /log to save the logs. These two are configurable parameters. The journal is always created in /data/journal and is not configurable. Refer to step 3, where we mentioned that there are three EBS volumes associated with this EC2 instance.
16. Execute the following command to start a mongod instance with its data in /data, logs written to the mongo.log file in the /log directory, and the process running in the background:
$ sudo mongod --dbpath /data --logpath /log/mongo.log --smallfiles --oplogSize 50 --fork
17. Now, start a Mongo client from the shell as follows:
$ mongo
18. Execute the following commands from the Mongo shell after connecting to the Mongo instance:
> db.ec2Test.insert({_id: 1, msg: 'Hello, My first Mongo instance on cloud'})
> db.ec2Test.find()
{ "_id" : 1, "msg" : "Hello, My first Mongo instance on cloud" }
>
Congratulations! We have now successfully started a standalone MongoDB instance on an EC2 instance.
How it works…
In step 7, we saw how to set up the security for the started instance. We configured it to allow incoming traffic only for SSH over port 22, from all hosts on the public network. For tighter security, rather than allowing traffic from all hosts (0.0.0.0/0), we can allow traffic from a limited set of IP addresses. Let’s see whether we can connect to the MongoDB instance started on the cloud from the Mongo shell on the local machine. For this activity, we will need MongoDB set up on the local machine; if it isn’t, you might just read through the content and understand the concept.
1. Note the public IP address/hostname of the instance started in the cloud and enter the following command on your local machine’s command line:
$ mongo --host <Public hostname of the cloud instance>
2. We will see that this operation fails with the following exception on the console:
MongoDB shell version: 2.4.6
connecting to: ec2-54-87-4-215.compute-1.amazonaws.com:27017/test
Sat May 03 14:30:23.376 Error: couldn't connect to server ec2-54-87-4-215.compute-1.amazonaws.com:27017 at src/mongo/shell/mongo.js:147
exception: connect failed
This is simply because the incoming traffic from a public network to this server over port 27017 is blocked. In fact, all traffic, except that on port 22, is blocked.
3. We will now open port 27017 for our current IP address. Note that this is not a recommended approach in a production environment; we are just doing this to test connecting to the instance on the cloud. Instead, the correct way is to open an SSH connection to the cloud instance and then connect to the server from a client running on this instance, as we did in the previous section.
4. Go to the EC2 console, choose the correct region at the top, and then click on the Security Groups menu option on the left-hand side. We will see the security groups defined as follows:
5. As we can see, there is a group that was created when we started the instance. Click on this group to see the details at the bottom of the screen and a way to edit the rules. Select the type as Custom TCP Rule, the port as 27017, and the source as My IP/Custom IP, where you can enter any IP address or all IP addresses. We will choose My IP in this case for testing purposes and click on Save, as shown in the following screenshot:
6. This time, we will be able to connect to this instance. Now, connect to this MongoDB instance started in the cloud by typing in the following command from your local machine’s operating system shell:
$ mongo --host <Public hostname of the cloud instance>
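The failed attempt in step 2 and the successful one in step 6 differ only in whether the security group lets TCP traffic reach port 27017. The following minimal Java sketch performs that reachability check with a plain java.net.Socket and a timeout; it only tests whether a TCP connection can be opened and does not speak the MongoDB wire protocol. The class name PortCheck is hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether a TCP connection to host:port
// can be established within the given timeout.
class PortCheck {

    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // A port filtered by a security group typically times out, while a
            // host with nothing listening typically refuses the connection;
            // both end up in the catch block below.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Calling PortCheck.isReachable("<public hostname>", 27017, 3000) before and after the rule from step 5 is added should mirror the two shell outcomes.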
What we saw was a very simple demo of what a security group of this cloud instance does. For more details on EC2 instance security, visit http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_Network_and_Security.html.
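The source of a security group rule is written as a CIDR block: 0.0.0.0/0 matches every IPv4 address, while the My IP choice typically produces a /32 block that matches exactly one address. The following hypothetical Java sketch shows how an IPv4 address is tested against such a block; it illustrates the matching rule only and is not AWS code.

```java
// Hypothetical helper illustrating IPv4 CIDR matching, the notation used
// for the source of a security group rule (for example, 0.0.0.0/0).
class CidrMatch {

    // Converts a dotted-quad IPv4 address into a 32-bit int.
    static int toInt(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        int value = 0;
        for (String octet : octets) {
            int n = Integer.parseInt(octet);
            if (n < 0 || n > 255) {
                throw new IllegalArgumentException("Octet out of range: " + ipv4);
            }
            value = (value << 8) | n;
        }
        return value;
    }

    // True if the address falls inside the block, e.g. "203.0.113.0/24".
    static boolean matches(String cidr, String address) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected <address>/<prefix>: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("Bad prefix length: " + cidr);
        }
        // Java masks int shift distances to 5 bits, so /0 needs an explicit all-zero mask.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(parts[0]) & mask) == (toInt(address) & mask);
    }
}
```

A /0 block keeps no prefix bits, so every address matches; a /32 block compares all 32 bits, so only one address does.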
What we saw so far was how to start the mongod instance on the cloud using the MongoDB AMI. If this is your only objective, the rest of the contents in this section can be skipped. What we will see now is how the file system is set up on this instance.
Moving on, let’s look at the file system setup. From the shell of the started MongoDB instance, execute the following command:
[ec2-user@ip-10-236-144-125 ~]$ mount
/dev/xvda1 on / type ext4 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/xvdf on /data type ext4 (rw,noexec,noatime)
/dev/xvdg on /journal type ext4 (rw,noexec,noatime)
/dev/xvdh on /log type ext4 (rw,noexec,noatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
We can see that there are three different mount points for /data, /journal, and /log. These are the three provisioned EBS storage blocks with 1000, 250, and 100 IOPS, respectively. The journal is, however, created in the journal directory in the data directory. Let’s list the files in the /data directory as follows:
[ec2-user@ip-10-236-144-125 ~]$ ls -al /data
total 98344
drwxr-xr-x  4 mongod mongod 4096 May  3 08:35 .
dr-xr-xr-x 26 root   root   4096 May  3 08:23 ..
lrwxrwxrwx  1 root   root      8 Apr 28  2013 journal -> /journal
As we can see in the partially captured output of the ls command, the journal directory in the /data directory is a link to the /journal mount.
Finally, let’s execute the following command:
[ec2-user@ip-10-236-144-125 ~]$ cat /etc/fstab
#
LABEL=/     /           ext4    defaults,noatime              1 1
tmpfs       /dev/shm    tmpfs   defaults                      0 0
devpts      /dev/pts    devpts  gid=5,mode=620                0 0
sysfs       /sys        sysfs   defaults                      0 0
proc        /proc       proc    defaults                      0 0
/dev/sdf    /data       ext4    defaults,auto,noatime,noexec  0 0
/dev/sdg    /journal    ext4    defaults,auto,noatime,noexec  0 0
/dev/sdh    /log        ext4    defaults,auto,noatime,noexec  0 0
We will see that the three mount points are already defined in this file. As we used the AMI to create the EC2 instance, we get all these things configured.
Let’s look at one entry added in the file, /dev/sdf /data ext4 defaults,auto,noatime,noexec 0 0, and analyze its fields one by one. These values are separated by whitespace (tabs or spaces).
The first value, /dev/sdf, is the device that we are looking to mount.
The second value, /data, is the directory to which the device will be mounted.
The third value, ext4, is the type of the file system.
Next, we have comma-separated values of options:
The value defaults is used to load the default options for the ext4 partition. The value auto indicates that the device will be mounted automatically on startup; auto is the default and need not be explicitly mentioned. Whenever a file is accessed, even for a read, Unix updates the last-accessed time of the file on the file system, so every read also causes a write. This has a heavy, negative performance impact on both read and write operations. Setting noatime instructs the OS not to update this last-accessed time. The noexec value instructs that executables on these file systems cannot be run.
The final two values are 0 and 0, for the dump frequency and the pass number. By setting the pass number to 0, we disable the file system checks (fsck) at boot for these partitions.
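The field-by-field reading above can be mirrored in a few lines of Java. This is a minimal sketch that assumes an entry has all six fields present (fstab allows the last two to be omitted, which this sketch does not handle); the class FstabEntry is hypothetical and only illustrates the layout.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: splits one /etc/fstab entry into its six fields.
class FstabEntry {
    final String device;
    final String mountPoint;
    final String fsType;
    final List<String> options;
    final int dumpFrequency;
    final int passNumber;

    FstabEntry(String line) {
        // Fields are separated by any run of spaces or tabs.
        String[] f = line.trim().split("\\s+");
        if (f.length != 6) {
            throw new IllegalArgumentException("Expected 6 fields: " + line);
        }
        device = f[0];
        mountPoint = f[1];
        fsType = f[2];
        options = Arrays.asList(f[3].split(","));
        dumpFrequency = Integer.parseInt(f[4]);
        passNumber = Integer.parseInt(f[5]);
    }

    boolean hasOption(String name) {
        return options.contains(name);
    }

    // A pass number of 0 means fsck skips this file system at boot.
    boolean checkedAtBoot() {
        return passNumber != 0;
    }
}
```

For the /data entry, hasOption("noatime") is true and checkedAtBoot() is false, whereas the root entry (LABEL=/ with pass number 1) is checked at boot.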
That is pretty much it; as we saw, the AMI has made life easy for us and given us a machine image with all the recommended settings to help us get up to speed in spinning off a server in the cloud and starting the MongoDB server. All other steps to start the servers, form replica sets and shards, and monitor them are the same as the steps used to start a server on your local machine or in your own data centers. Refer to Chapter 4, Administration, and Chapter 6, Monitoring and Backups, for more recipes on administering and monitoring Mongo instances.
If this is a test, make sure you stop the EC2 instance from the EC2 console as soon as possible to avoid being charged unnecessarily. A stopped instance does not incur charges for instance hours; however, the attached EBS volumes are still charged for the data on them. If you plan not to use this instance anymore, terminate it and release the attached EBS volumes.
Setting up MongoDB on Amazon EC2 without using the MongoDB AMI
In the previous recipe, we saw how to start a standalone database in the cloud using the MongoDB AMI; this is perhaps the simplest way to start the server on an EC2 instance. However, when using an AMI, you are tied to the configurations the AMI supports. For instance, for a noncritical instance, you might not want a large instance or an EBS volume with guaranteed IOPS. You might even be OK with the same EBS volume for data, journal, and logs to cut costs, as the instance you are setting up is a development or test instance. Also, the operating system for the AMIs is Amazon Linux; if you wish to use a different OS and install MongoDB on it, the AMI isn’t of any help. In all such scenarios, setting up the instance manually is the only option left. This isn’t a simple job, and a careful setup of various factors, including the standard operating system parameters, is needed. In this recipe, we will not get into these complicated tasks but will rather set up a small micro instance, which is as good as a sandbox instance, with one EBS block volume attached to it.
Getting ready
Refer to the Getting ready section of the previous recipe, which is a prerequisite for this recipe as well.
How to do it…
1. Go to https://console.aws.amazon.com/ec2/, click on the Instances option in the left-hand menu, and then click on the Launch Instance button.
2. As we will start a free micro instance, check the Free tier only checkbox on the left-hand side. On the right-hand side, select the image we want to set up. We will choose Ubuntu Server. Click on Select to navigate to the next window.
3. Choose Micro Instance and click on Review and Launch. Ignore the security warning; the default security group that you have is the one that will accept connections over port 22 from all the hosts on the public network.
4. Without editing any default settings, click on Launch. On clicking Launch, a pop-up will appear, letting you choose an existing key pair. If you proceed without a key pair, you would need the password; otherwise, you will have to create a new key pair. In the previous recipe, we already created a key pair; we will use it here.
5. Click on Launch Instance to start the new micro instance.
6. Refer to steps 9 to 14 in the previous recipe to learn how to connect to the started instance using PuTTY. Note that we will use the ubuntu user instead of ec2-user, which we used in the last recipe, as this time we are using Ubuntu instead of Amazon Linux.
7. Before we add the MongoDB repository, we need to import the MongoDB public key as follows:
$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
8. Execute the following command on the operating system shell:
$ echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list
9. Reload the local package database and install the MongoDB packages by executing the following commands:
$ sudo apt-get update
$ sudo apt-get install mongodb-org
10. Execute the following command to create the required directories:
$ sudo mkdir /data /log
11. Start the mongod process as follows:
$ sudo mongod --dbpath /data --logpath /log/mongodb.log --smallfiles --oplogSize 50 --fork
12. To ensure that the server process is up and running, execute the following command from the shell; we will see the following output in the log:
$ tail /log/mongodb.log
2014-05-04T13:41:16.533+0000 [initandlisten] journal dir=/data/journal
2014-05-04T13:41:16.534+0000 [initandlisten] recover : no journal files present, no recovery needed
2014-05-04T13:41:16.628+0000 [initandlisten] waiting for connections on port 27017
13. Start the Mongo shell as follows and execute the following commands:
$ mongo
> db.ec2Test.insert({_id: 1, message: 'Hello World!'})
> db.ec2Test.findOne()
How it works…
A lot of steps are self-explanatory. It is recommended that you at least go through the previous recipe, as a lot of the concepts explained there apply to this recipe. A few things that are different are explained in this section. For installation, we chose Ubuntu rather than Amazon Linux, which is the standard when you set up the server using the AMI. Different operating systems have different steps for installation. Visit http://docs.mongodb.org/manual/installation/ for the steps to install MongoDB on different platforms. Steps 7 to 9 in this recipe are specific to the installation of MongoDB on Ubuntu. Refer to https://help.ubuntu.com/12.04/serverguide/apt-get.html for more details on the apt-get command that we executed here to install MongoDB.
In our case, we chose to have the data, journal, and log folders on the same EBS volume. This is because what we set up is a dev instance. In the case of a prod instance, there would be different EBS volumes with provisioned IOPS for optimum performance. This setup allows us to take advantage of the fact that these different volumes have different controllers, and thus, concurrent write operations are possible. EBS volumes with provisioned IOPS are backed by SSD drives. The production deployment notes at http://docs.mongodb.org/manual/administration/production-notes/ state that a MongoDB deployment should be backed by RAID-10 disks. When deploying on AWS, prefer PIOPS over RAID-10. For instance, if 4000 IOPS is desired, choose an EBS volume with 4000 IOPS rather than a RAID-10 setup with 2 x 2000 IOPS or 4 x 1000 IOPS. This not only eliminates unnecessary complexity but also makes it possible to snapshot a single disk, as opposed to dealing with multiple disks in the RAID-10 setup. Speaking of snapshotting, the journal and data are written to separate volumes in the majority of production deployments. In this scenario, a snapshot of the volumes alone doesn’t give a consistent backup. We need to flush the DB writes, lock the data against further writes until the backup completes, and then release the lock. Visit http://docs.mongodb.org/manual/tutorial/backup-with-filesystem-snapshots/ for more details on snapshotting and backups.
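The flush, lock, snapshot, and unlock sequence has one property worth getting right: the lock must be released even if the snapshot fails, or the server stays blocked for writes. The following Java sketch captures only that ordering with try/finally. The BackupSteps interface and its methods are hypothetical placeholders, not a driver API; in the mongo shell, locking and unlocking correspond to db.fsyncLock() and db.fsyncUnlock(), and the snapshot itself would be taken with your EBS tooling.

```java
// Hypothetical abstraction over the three actions; not a real driver API.
interface BackupSteps {
    void flushAndLock();     // e.g. db.fsyncLock() in the mongo shell
    void snapshotVolumes();  // e.g. snapshot the data and journal EBS volumes
    void unlock();           // e.g. db.fsyncUnlock() in the mongo shell
}

class ConsistentSnapshot {

    static void run(BackupSteps steps) {
        steps.flushAndLock();
        try {
            // Writes are blocked here, so data and journal volumes stay in step.
            steps.snapshotVolumes();
        } finally {
            // Always release the write lock, even if the snapshot throws.
            steps.unlock();
        }
    }
}
```

Keeping the locked window as short as possible matters, since writes are blocked for as long as the lock is held.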
Visit http://docs.mongodb.org/ecosystem/platforms/ for more details on deployment on different cloud providers. There is a section specifically for backups on Amazon EC2 instances. For production deployments, prefer using AMIs to set up MongoDB instances, as demonstrated in the previous recipe, over setting up the instances manually. A manual setup is OK for small dev purposes, where a large instance with provisioned-IOPS EBS volumes is overkill.
See also
CloudFormation is a way in which you can define templates and automate the creation of EC2 instances. Learn what CloudFormation is at https://aws.amazon.com/cloudformation/ and https://mongodb-documentation.readthedocs.org/en/latest/ecosystem/tutorial/automate-deployment-with-cloudformation.html. Visit http://en.wikipedia.org/wiki/Standard_RAID_levels and http://en.wikipedia.org/wiki/Nested_RAID_levels to know more about RAID.
Chapter 8. Integration with Hadoop
In this chapter, we will cover the following recipes:
Executing our first sample MapReduce job using the mongo-hadoop connector
Writing our first Hadoop MapReduce job
Running MapReduce jobs on Hadoop using streaming
Running a MapReduce job on Amazon EMR
Introduction
Hadoop is well-known open source software for processing large data sets. It also has an API for the MapReduce programming model, which is widely used. Nearly all Big Data solutions have some sort of support to integrate with Hadoop and use its MapReduce framework. MongoDB too has a connector that integrates with Hadoop; it lets us write MapReduce jobs using the Hadoop MapReduce API, process data that resides in MongoDB/MongoDB dumps, and write the result back to MongoDB/MongoDB dump files. In this chapter, we will look at some recipes that deal with basic MongoDB and Hadoop integration.
Executing our first sample MapReduce job using the mongo-hadoop connector
In this recipe, we will see how to build the mongo-hadoop connector from source and set up Hadoop just for the purpose of running examples in the standalone mode. The connector is the backbone that runs MapReduce jobs on Hadoop using the data in Mongo.
Getting ready
There are various distributions of Hadoop; however, we will use Apache Hadoop (http://hadoop.apache.org/). The installation will be done on a Linux-flavored OS, and I am using Ubuntu Linux. For production, Apache Hadoop always runs on a Linux environment; Windows is not tested for production systems. For development purposes, however, Windows can be used. If you are a Windows user, I would recommend that you install a virtualization environment such as VirtualBox (https://www.virtualbox.org/), set up a Linux environment, and then install Hadoop on it. Setting up VirtualBox and then setting up Linux on it is not shown in this recipe, but this is not a tedious task. The prerequisite for this recipe is a machine with the Linux operating system on it and an Internet connection. The version we set up here is 2.4.0 of Apache Hadoop. At the time of writing this book, the latest version of Apache Hadoop, and the one supported by the mongo-hadoop connector, is 2.4.0.
A Git client is needed to clone the repository of the mongo-hadoop connector to a local file system. Refer to http://git-scm.com/book/en/Getting-Started-Installing-Git to install Git.
You will also need MongoDB to be installed on your operating system. Refer to http://docs.mongodb.org/manual/installation/ and install it accordingly. Start the mongod instance listening on port 27017. You are not expected to be an expert in Hadoop, but some familiarity with it will be helpful. Knowing the concept of MapReduce is important, and knowing the Hadoop MapReduce API will be an advantage. In this recipe, we will explain what is needed to get the work done. You might prefer to get more details on Hadoop and its MapReduce API from other sources. The wiki page at http://en.wikipedia.org/wiki/MapReduce gives enough information on MapReduce programming.
How to do it…
1. First, install Java, Hadoop, and the required packages.
2. Start with the JDK. Check whether it is already installed by typing the following command at the command prompt of the operating system:
$ javac -version
3. If the program doesn’t execute and, instead, you are told about various packages that contain the javac program, we need to install it as follows:
$ sudo apt-get install default-jdk
This is all we need to do to install Java.
4. Now, download Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/common/; choose version 2.4 (or the latest version that the mongo-hadoop connector supports).
5. After the .tar.gz file is downloaded, execute the following commands in the command prompt:
$ tar -xvzf <name of the downloaded .tar.gz file>
$ cd <extracted directory>
6. Open the etc/hadoop/hadoop-env.sh file and replace export JAVA_HOME=${JAVA_HOME} with export JAVA_HOME=/usr/lib/jvm/default-java.
7. We will now get the mongo-hadoop connector code from GitHub onto our local file system. Note that you don’t need a GitHub account to clone a repository. Clone the Git project from the operating system’s command prompt as follows:
$ git clone https://github.com/mongodb/mongo-hadoop.git
$ cd mongo-hadoop
8. Create a soft link as follows; the Hadoop installation directory is the same as the one we extracted in step 5:
$ ln -s <hadoop installation directory> ~/hadoop-binaries
For example, if Hadoop is extracted/installed in the home directory, the following command would need to be executed:
$ ln -s ~/hadoop-2.4.0 ~/hadoop-binaries
By default, the mongo-hadoop connector will look for a Hadoop distribution under the ~/hadoop-binaries folder. So, even if the Hadoop archive is extracted elsewhere, we can create a soft link to it. Once the preceding link is created, we will have the Hadoop binaries in the ~/hadoop-binaries/hadoop-2.4.0/bin path.
9. We will now build the mongo-hadoop connector from source for Apache Hadoop version 2.4.0 as follows. By default, the build targets the latest Hadoop version; so, as of now, the -Phadoop_version parameter can be left out, as 2.4 is the latest anyway:
$ ./gradlew jar -Phadoop_version='2.4'
This build process will take some time to complete.
10. Once the build completes successfully, we are ready to execute our first MapReduce job. We will use the treasury yield sample provided with the mongo-hadoop connector project. The first activity is to import the data into a collection in Mongo.
11. Assuming that the mongod instance is up and running and listening on port 27017 for connections, and that the current directory is the root of the mongo-hadoop connector code base, execute the following command:
$ mongoimport -c yield_historical.in -d mongo_hadoop --drop examples/treasury_yield/src/main/resources/yield_historical_in.json
12. Once the import action is successful, all that remains is to copy two JAR files to the lib directory. Execute the following commands from the operating system shell:
$ wget http://repo1.maven.org/maven2/org/mongodb/mongo-java-driver/2.12.0/mongo-java-driver-2.12.0.jar
$ cp core/build/libs/mongo-hadoop-core-1.2.1-SNAPSHOT-hadoop_2.4.jar ~/hadoop-binaries/hadoop-2.4.0/lib/
$ mv mongo-java-driver-2.12.0.jar ~/hadoop-binaries/hadoop-2.4.0/lib
13. The mongo-hadoop core JAR file that we copied was named mongo-hadoop-core-1.2.1-SNAPSHOT-hadoop_2.4.jar, as it was built from the trunk version of the code for Hadoop 2.4.0. Change the name of the JAR accordingly when you build it yourself for a different version of the connector and Hadoop. The Mongo driver can be the latest version. Version 2.12.0 is the latest one at the time of writing this book.
14. Now, execute the following command in the command prompt of the operating system shell:
~/hadoop-binaries/hadoop-2.4.0/bin/hadoop jar \
    examples/treasury_yield/build/libs/treasury_yield-1.2.1-SNAPSHOT-hadoop_2.4.jar \
    com.mongodb.hadoop.examples.treasury.TreasuryYieldXMLConfig \
    -Dmongo.input.split_size=8 -Dmongo.job.verbose=true \
    -Dmongo.input.uri=mongodb://localhost:27017/mongo_hadoop.yield_historical.in \
    -Dmongo.output.uri=mongodb://localhost:27017/mongo_hadoop.yield_historical.out
15. The output should print out a lot of things. However, the following line in the output will tell us that the MapReduce job is successful:
14/05/11 21:38:54 INFO mapreduce.Job: Job job_local1226390512_0001 completed successfully
16. Connect to the mongod instance that runs on localhost from the Mongo client and execute a find query on the following collection:
How it works…
Installing Hadoop is not a trivial task, and we don’t need to get into this to try our samples for the mongo-hadoop connector. To learn about Hadoop, there are dedicated books and articles available. For the purpose of this chapter, we will simply download the archive, extract it, and run the MapReduce jobs in the standalone mode. This is the quickest way to get going with Hadoop. All the steps up to step 6 are needed to install Hadoop. In the next couple of steps, we simply cloned the mongo-hadoop connector repository. If you prefer not to build from source, you might instead download a stable build for your version of Hadoop from https://github.com/mongodb/mongo-hadoop/releases. We then built the connector for our version of Hadoop (2.4.0) and prepared its dependencies up to step 13; along the way, in step 11, we imported the data into the yield_historical.in collection, which is used as the input to the MapReduce job. From step 14 onwards, we ran the actual MapReduce job on the data in MongoDB. Go ahead and query the input collection from the Mongo shell using the mongo_hadoop database to see a document. Don’t worry if you don’t understand the contents; what matters in this example is what we intend to do with this data.
The next step was to invoke the MapReduce operation on the data. The hadoop command was executed with the path of a JAR file (examples/treasury_yield/build/libs/treasury_yield-1.2.1-SNAPSHOT-hadoop_2.4.jar). This is the JAR file that contains the classes that implement a sample MapReduce operation for the treasury yield. The com.mongodb.hadoop.examples.treasury.TreasuryYieldXMLConfig class in this JAR file is the bootstrap class that contains the main method. We will see this class soon. There are lots of configurations supported by the connector. A complete list of configurations can be found at https://github.com/mongodb/mongo-hadoop/blob/master/CONFIG.md. For now, we will just remember that mongo.input.uri and mongo.output.uri identify the input and output collections, respectively, of the MapReduce operation.
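Both URIs name a collection as <database>.<collection> after the host and port. Because MongoDB database names cannot contain a dot, the first dot separates the database from the collection, while the collection name itself may contain dots (yield_historical.in). The following minimal Java sketch, assuming a single-host URI like the ones used in step 14, shows that split; the class MongoNamespace is hypothetical and is not part of the connector.

```java
import java.net.URI;

// Hypothetical helper: splits the path of a URI such as
// mongodb://localhost:27017/mongo_hadoop.yield_historical.in
// into its database and collection names.
class MongoNamespace {
    final String database;
    final String collection;

    MongoNamespace(String uri) {
        String path = URI.create(uri).getPath();  // e.g. "/mongo_hadoop.yield_historical.in"
        String namespace = path.startsWith("/") ? path.substring(1) : path;
        // Database names cannot contain '.', so the first dot is the separator;
        // everything after it, dots included, is the collection name.
        int dot = namespace.indexOf('.');
        if (dot <= 0 || dot == namespace.length() - 1) {
            throw new IllegalArgumentException("Expected <database>.<collection>: " + uri);
        }
        database = namespace.substring(0, dot);
        collection = namespace.substring(dot + 1);
    }
}
```

For the output URI of step 14, this yields the database mongo_hadoop and the collection yield_historical.out.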
With the project cloned, you might import it into any Java IDE of your choice. We are particularly interested in the project at /examples/treasury_yield and the core project present in the root of the cloned repository.
Let’s look at the com.mongodb.hadoop.examples.treasury.TreasuryYieldXMLConfig class. This is the entry point into the MapReduce job and has a main method in it. To write MapReduce jobs for Mongo using the mongo-hadoop connector, the main class always has to extend com.mongodb.hadoop.util.MongoTool. This class implements the org.apache.hadoop.util.Tool interface, whose run method is implemented for us by the MongoTool class. All that the main method needs to do is execute this class using the org.apache.hadoop.util.ToolRunner class by invoking its static run method, passing the instance of our main class (an instance of Tool).
There is a static block that loads some configurations from two XML files: hadoop-local.xml and mongo-defaults.xml. The format of these files (or of any Hadoop configuration XML file) is as follows. The root node of the file is the configuration node, with multiple property nodes under it:
<configuration>
  <property>
    <name>{property name}</name>
    <value>{property value}</value>
  </property>
  ...
</configuration>
The properties that make sense in this context are all those listed at the URL provided earlier. We instantiate com.mongodb.hadoop.MongoConfig, wrapping an instance of org.apache.hadoop.conf.Configuration, in the constructor of the bootstrap class TreasuryYieldXMLConfig. The MongoConfig class provides sensible defaults, which are enough to satisfy the majority of use cases. Some of the most important things we need to set in the MongoConfig instance are the input and output formats, the mapper and reducer classes, the output key and value types of the mapper, and the output key and value types of the reducer. The input and output formats will always be the com.mongodb.hadoop.MongoInputFormat and com.mongodb.hadoop.MongoOutputFormat classes, respectively; they are provided by the mongo-hadoop connector library. For the mapper and reducer output keys and values, we can use any org.apache.hadoop.io.Writable implementation. Refer to the Hadoop documentation for the different Writable implementations in the org.apache.hadoop.io package. Apart from these, the mongo-hadoop connector also provides some implementations in the com.mongodb.hadoop.io package. For the treasury yield example, we used BSONWritable. These configurable values can either be provided in the XML file we saw earlier or be set programmatically. Finally, we have the option to provide them as -D arguments on the command line, as we did for mongo.input.uri and mongo.output.uri. These two parameters can also be provided in XML or set directly from the code on the MongoConfig instance, using the setInputURI and setOutputURI methods.
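The <configuration>/<property> format is simple enough to read with the JDK’s own DOM parser. The following sketch loads the name/value pairs into a Map so that you can see what a file such as mongo-defaults.xml contributes; it is a hypothetical illustration (the class ConfigXmlSketch is not part of Hadoop or the connector) and is not how Hadoop’s Configuration class is implemented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: reads <property><name/><value/></property> entries
// from a Hadoop-style <configuration> document into a Map.
class ConfigXmlSketch {

    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> properties = new LinkedHashMap<>();
            NodeList nodes = doc.getDocumentElement().getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                // In this sketch, a repeated name keeps the last value seen.
                properties.put(name, value);
            }
            return properties;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }
}
```

Feeding it a file that defines mongo.input.uri and mongo.job.verbose returns a two-entry map with those names as keys.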
We will now look at the mapper and reducer class implementations. Here, we will copy the important portion of each class to analyze it. Refer to the cloned project for the entire implementation:
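import java.io.IOException;
import java.util.Date;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.bson.BSONObject;

public class TreasuryYieldMapper
        extends Mapper<Object, BSONObject, IntWritable, DoubleWritable> {
    @Override
    public void map(final Object pKey,
                    final BSONObject pValue,
                    final Context pContext)
            throws IOException, InterruptedException {
        // _id holds the record's date; Date.getYear() returns years since 1900 (deprecated API)
        final int year = ((Date) pValue.get("_id")).getYear() + 1900;
        double bid10Year = ((Number) pValue.get("bc10Year")).doubleValue();
        pContext.write(new IntWritable(year), new DoubleWritable(bid10Year));
    }
}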
Our mapper extends the org.apache.hadoop.mapreduce.Mapper class. The four generic parameters are the types of the input key, the input value, the output key, and the output value, respectively. The body of the map method reads the _id value from the input document, which is a date, and extracts the year out of it. Then, it gets the double value of the bc10Year field from the document and simply writes a key-value pair to the context, where the key is the year and the value is the double. The implementation here doesn’t rely on the value of the pKey parameter passed; this could be used as the key instead of hardcoding the _id value in the implementation. This value is basically the same field that will be set using the mongo.input.key property in the XML or using the MongoConfig.setInputKey method. If none is set, _id is the default value anyway.
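The mapper’s logic can be exercised without Hadoop. In this sketch, a plain Map stands in for the BSONObject, and the year is computed with java.time in UTC rather than with the deprecated Date.getYear() + 1900, which uses the JVM’s default time zone (so the two can differ for timestamps close to midnight on New Year’s Eve). The class name TreasuryMapSketch and the sample values are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.AbstractMap.SimpleEntry;
import java.util.Date;
import java.util.Map;

class TreasuryMapSketch {
    // Mirrors TreasuryYieldMapper.map: document -> (year of _id, bc10Year as a double).
    static Map.Entry<Integer, Double> map(Map<String, Object> doc) {
        Date id = (Date) doc.get("_id");
        int year = id.toInstant().atZone(ZoneOffset.UTC).getYear();
        // bc10Year may be stored as any numeric BSON type, hence the Number cast
        double bid10Year = ((Number) doc.get("bc10Year")).doubleValue();
        return new SimpleEntry<>(year, bid10Year);
    }

    // Helper to build a Date at UTC midnight for the given calendar day.
    static Date utcDate(int year, int month, int day) {
        return Date.from(LocalDate.of(year, month, day).atStartOfDay(ZoneOffset.UTC).toInstant());
    }
}
```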
Let’s look at the reducer implementation (with the logging statements removed):
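import java.io.IOException;
import com.mongodb.hadoop.io.BSONWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BasicBSONObject;

public class TreasuryYieldReducer
        extends Reducer<IntWritable, DoubleWritable, IntWritable, BSONWritable> {
    @Override
    public void reduce(final IntWritable pKey, final Iterable<DoubleWritable> pValues,
                       final Context pContext)
            throws IOException, InterruptedException {
        int count = 0;
        double sum = 0;
        // Accumulate the sum and count of all yields emitted for this year
        for (final DoubleWritable value : pValues) {
            sum += value.get();
            count++;
        }
        final double avg = sum / count;
        BasicBSONObject output = new BasicBSONObject();
        output.put("count", count);
        output.put("avg", avg);
        output.put("sum", sum);
        pContext.write(pKey, new BSONWritable(output));
    }
}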
This class extends org.apache.hadoop.mapreduce.Reducer and again has four generic parameters: the types of the input key, input value, output key, and output value, respectively. The input to the reducer is the output from the mapper. Thus, if you look carefully, the types of the first two generic parameters are the same as those of the last two generic parameters of the mapper we saw earlier. The third and fourth parameters in this case are the types of the key and the value emitted from reduce, respectively. The value is a BSON document (a BasicBSONObject), and thus, we use BSONWritable as its type.
The reduce method receives the key, the values, and the context. The first parameter is the key, which is the same as the key emitted from the map function, and the second parameter is a java.lang.Iterable of the values emitted for that key. This is how standard MapReduce functions work. For instance, if the map function gave the key-value pairs (1950, 10), (1960, 20), (1950, 20), (1950, 30), then reduce would be invoked with two unique keys, 1950 and 1960. The value for the key 1950 will be an Iterable with (10, 20, 30), whereas that for 1960 will be an Iterable of a single element (20). The reduce function of the reducer class simply iterates through this Iterable of doubles, finds the sum and count of these numbers, and writes one key-value pair, where the key is the same as the incoming key and the output value is a BasicBSONObject containing the computed sum, count, and average.
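The grouping described above happens inside the framework, between the map and reduce phases. This JDK-only sketch imitates it with a TreeMap and then applies the same count, sum, and average arithmetic as TreasuryYieldReducer to the (1950, 10), (1960, 20), (1950, 20), (1950, 30) example. ShuffleReduceSketch is a hypothetical name, and the real framework sorts and partitions the pairs across machines and disk rather than in one in-memory map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class ShuffleReduceSketch {
    // Groups (key, value) pairs by key, as the framework does between map and reduce.
    static TreeMap<Integer, List<Double>> shuffle(int[] keys, double[] values) {
        TreeMap<Integer, List<Double>> grouped = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            grouped.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        return grouped;
    }

    // Same arithmetic as TreasuryYieldReducer.reduce: returns {count, sum, average}.
    static double[] reduce(Iterable<Double> values) {
        int count = 0;
        double sum = 0;
        for (double v : values) {
            sum += v;
            count++;
        }
        return new double[] {count, sum, sum / count};
    }

    public static void main(String[] args) {
        int[] keys = {1950, 1960, 1950, 1950};
        double[] values = {10, 20, 20, 30};
        for (Map.Entry<Integer, List<Double>> e : shuffle(keys, values).entrySet()) {
            double[] r = reduce(e.getValue());
            System.out.printf(Locale.ROOT, "%d -> values=%s count=%.0f sum=%.1f avg=%.1f%n",
                    e.getKey(), e.getValue(), r[0], r[1], r[2]);
        }
    }
}
```

Running main prints `1950 -> values=[10.0, 20.0, 30.0] count=3 sum=60.0 avg=20.0` followed by `1960 -> values=[20.0] count=1 sum=20.0 avg=20.0`.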
There are some good samples, including the Enron dataset, in the examples of the cloned mongo-hadoop connector. If you would like to play around a bit, I would recommend that you take a look at these example projects too and run them.
There’s more…
What we saw here was a ready-made sample that we executed. There is nothing like writing a MapReduce job ourselves to clarify our understanding. In the next recipe, we will write a sample MapReduce job using the Hadoop API in Java and see it in action.
See also
Refer to http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg00378.html to learn what the Writable interface is all about and why you should not use plain old serialization.
Writing our first Hadoop MapReduce job
In this recipe, we will write our first MapReduce job using the Hadoop MapReduce API and run it using the mongo-hadoop connector, which gets the data from MongoDB. Refer to the MapReduce in Mongo using a Java client recipe in Chapter 3, Programming Language Drivers, to see how MapReduce is implemented using a Java client, how to create the test data, and the problem statement.
Getting ready
Refer to the previous recipe to set up the mongo-hadoop connector. The prerequisites of the Executing our first sample MapReduce job using the mongo-hadoop connector recipe (which is present in this chapter) and of the MapReduce in Mongo using a Java client recipe in Chapter 3, Programming Language Drivers, are all we need for this recipe. This is a Maven project; thus, Maven needs to be set up and installed. Refer to the Connecting to a single node from a Java client recipe in Chapter 1, Installing and Starting the MongoDB Server, where we gave the steps to set up Maven on Windows. However, this project is built on Ubuntu Linux, and the following is the command you need to execute from the operating system shell to get Maven:
$ sudo apt-get install maven
How to do it…
1. We have a Java project, mongo-hadoop-mapreduce-test, which can be downloaded from the book’s website. The project is targeted at achieving the same use case that we achieved in the recipes in Chapter 3, Programming Language Drivers, where we used MongoDB’s MapReduce framework. We had invoked that MapReduce job using the Python and Java clients on earlier occasions.
2. In the command prompt, with the current directory in the root of the project where the pom.xml file is present, execute the following command:
$ mvn clean package
3. The JAR file mongo-hadoop-mapreduce-test-1.0.jar will be built and kept in the target directory.
4. With the assumption that the CSV file is already imported into the postalCodes collection, execute the following command with the current directory still in the root of the mongo-hadoop-mapreduce-test project we just built:
~/hadoop-binaries/hadoop-2.4.0/bin/hadoop \
  jar target/mongo-hadoop-mapreduce-test-1.0.jar \
  com.packtpub.mongo.cookbook.TopStateMapReduceEntrypoint \
  -Dmongo.input.split_size=8 \
  -Dmongo.job.verbose=true \
  -Dmongo.input.uri=mongodb://localhost:27017/test.postalCodes \
  -Dmongo.output.uri=mongodb://localhost:27017/test.postalCodesHadoopmrOut
5. Once the MapReduce job completes, open the Mongo shell by typing the following command in the operating system command prompt, and execute the following query from the shell:
$ mongo
> db.postalCodesHadoopmrOut.find().sort({count: -1}).limit(5)
6. Compare the output with the one we got earlier when we executed the MapReduce jobs using Mongo’s MapReduce framework (Chapter 3, Programming Language Drivers).
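The -D arguments in step 4 end up as properties of the job’s Configuration: ToolRunner hands the command-line arguments to Hadoop’s GenericOptionsParser, which applies each property=value pair before the remaining arguments reach the Tool’s run method. The following is a much simplified, hypothetical stand-in for that parsing; it only understands the attached -Dkey=value form used in this recipe and uses java.util.Properties instead of a Configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class GenericOptionsSketch {
    // Applies every "-Dkey=value" argument to conf and returns the other arguments.
    static List<String> parse(String[] args, Properties conf) {
        List<String> remaining = new ArrayList<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                // Split on the first '=' only, so values may themselves contain '='
                conf.setProperty(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                remaining.add(arg);
            }
        }
        return remaining;
    }
}
```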
How it works…
We have kept the classes very simple, with the fewest possible requirements. We just have three classes in our project: TopStateMapReduceEntrypoint, TopStateReducer, and TopStatesMapper. All these classes are in the same package, com.packtpub.mongo.cookbook. The map function of the mapper class just writes a key-value pair to the context; here, the key is the name of the state, and the value is the integer value 1. The following line of code is from the mapper function:
context.write(new Text((String) value.get("state")), new IntWritable(1));
The reducer gets the same key, that is, the name of a state, and an Iterable of integer values, each equal to 1. All that we do is write to the context the same state name and the sum of the values in the Iterable. Since Iterable has no size method that could give us the count in constant time, we are left with adding up the ones we get, in linear time. The following is the code snippet in the reducer method:
int sum = 0;
for (IntWritable value : values) {
    sum += value.get();
}
BSONObject object = new BasicBSONObject();
object.put("count", sum);
context.write(text, new BSONWritable(object));
We write to the context the text string, which is the key, and a value that is a BSON document containing the count. The mongo-hadoop connector is then responsible for writing to our output collection, that is, postalCodesHadoopmrOut. The document has the _id field, whose value is the same as the key emitted from the mapper. Thus, when we execute the following query, we will get the top five states with the greatest number of cities in our database:
> db.postalCodesHadoopmrOut.find().sort({count: -1}).limit(5)
{ "_id" : "Maharashtra", "count" : 6446 }
{ "_id" : "Kerala", "count" : 4684 }
{ "_id" : "Tamil Nadu", "count" : 3784 }
{ "_id" : "Andhra Pradesh", "count" : 3550 }
{ "_id" : "Karnataka", "count" : 3204 }
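The whole TopStates job, including the final sort on count, can be mimicked in memory to check the logic on a handful of documents. This sketch is hypothetical (TopStatesSketch, with plain Map<String, String> documents instead of BSONObject): Map.merge plays the role of the mapper’s (state, 1) pairs plus the reducer’s summation, and top reproduces find().sort({count: -1}).limit(n).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TopStatesSketch {
    // Map phase emits (state, 1) per document; reduce phase sums the ones per state.
    static Map<String, Integer> countByState(List<Map<String, String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            counts.merge(doc.get("state"), 1, Integer::sum);
        }
        return counts;
    }

    // Equivalent of find().sort({count: -1}).limit(n) over the reduced output.
    static List<String> top(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```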
Finally, the main method of the main entry point class is as follows:
Configuration conf = new Configuration();
MongoConfig config = new MongoConfig(conf);
config.setInputFormat(MongoInputFormat.class);
config.setMapperOutputKey(Text.class);
config.setMapperOutputValue(IntWritable.class);
config.setMapper(TopStatesMapper.class);
config.setOutputFormat(MongoOutputFormat.class);
config.setOutputKey(Text.class);
config.setOutputValue(BSONWritable.class);
config.setReducer(TopStateReducer.class);
ToolRunner.run(conf, new TopStateMapReduceEntrypoint(), args);
All that we do is wrap the org.apache.hadoop.conf.Configuration object with the com.mongodb.hadoop.MongoConfig instance to set various properties and then submit the MapReduce job for execution using ToolRunner.
There’s more…
In this recipe, we executed a simple MapReduce job on Hadoop using the Hadoop API, sourcing the data from MongoDB and writing the results to a MongoDB collection. What if we want to write the map and reduce functions in a different language? Fortunately, this is possible using a concept called Hadoop streaming, where stdin and stdout are used as the means of communication between the program and the Hadoop MapReduce framework. In the next recipe, we will demonstrate how to use Python to implement the same use case as the one in this recipe using Hadoop streaming.
Running MapReduce jobs on Hadoop using streaming
In the previous recipe, we implemented a simple MapReduce job using the Java API of Hadoop. The use case was the same as the one in the recipes of Chapter 3, Programming Language Drivers, where we saw MapReduce implemented using Mongo client APIs in Python and Java. In this recipe, we will use Hadoop streaming to implement MapReduce jobs.
The concept of streaming is based on communication using stdin and stdout. Get more information on what Hadoop streaming is and how it works at http://hadoop.apache.org/docs/r1.2.1/streaming.html.
Getting ready
Refer to the Executing our first sample MapReduce job using the mongo-hadoop connector recipe to see how to set up Hadoop for development purposes and build the mongo-hadoop project using Gradle. As far as Python libraries are concerned, we will install the required library from source. However, you can use pip to carry out the setup if you do not wish to build from source. We will also see how to set up pymongo-hadoop using pip.
Refer to the Installing PyMongo recipe in Chapter 3, Programming Language Drivers, to see how to install PyMongo and pip.
How to do it…
1. We will first build pymongo-hadoop from source. With the project cloned to the local filesystem, execute the following commands from the root of the cloned project:
$ cd streaming/language_support/python
$ sudo python setup.py install
2. After you enter the password, setup will continue to install pymongo-hadoop on your machine.
3. That is all we need to do to build pymongo-hadoop from source. However, if you choose not to build from source, you can execute the following command from the operating system shell:
$ sudo pip install pymongo_hadoop
4. After installing pymongo-hadoop either way, we will now implement our mapper and reducer functions in Python.
5. The mapper function is as follows:
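#!/usr/bin/env python
import sys
from pymongo_hadoop import BSONMapper

def mapper(documents):
    # Log to stderr only; stdout carries the data exchanged with Hadoop
    sys.stderr.write('Starting mapper\n')
    for doc in documents:
        # One output document per input record: the state is the key, count is 1
        yield {'_id': doc['state'], 'count': 1}
    sys.stderr.write('Mapper completed\n')

BSONMapper(mapper)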
6. The reducer function is as follows:
#!/usr/bin/env python
import sys
from pymongo_hadoop import BSONReducer

def reducer(key, documents):
    sys.stderr.write('Invoked reducer for key "%s"\n' % key)
    count = 0
    # Each mapped document represents one occurrence of the state
    for doc in documents:
        count += 1
    return {'_id': key, 'count': count}

BSONReducer(reducer)
7. The $HADOOP_HOME and $HADOOP_CONNECTOR_HOME environment variables should point to the base directory of Hadoop and the base directory of the mongo-hadoop connector project, respectively. Now, we will invoke the MapReduce function using the following command from the operating system shell. The code available on the book’s website has the mapper and reducer Python scripts and a shell script that will be used to invoke the mapper and reducer:
$HADOOP_HOME/bin/hadoop jar \
  $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming* \
  -libjars $HADOOP_CONNECTOR_HOME/streaming/build/libs/mongo-hadoop-streaming-1.2.1-SNAPSHOT-hadoop_2.4.jar \
  -input /tmp/in \
  -output /tmp/out \
  -inputformat com.mongodb.hadoop.mapred.MongoInputFormat \
  -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat \
  -io mongodb \
  -jobconf mongo.input.uri=mongodb://127.0.0.1:27017/test.postalCodes \
  -jobconf mongo.output.uri=mongodb://127.0.0.1:27017/test.pyMRStreamTest \
  -jobconf stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver \
  -mapper mapper.py \
  -reducer reducer.py
8. The mapper.py and reducer.py files must be present in the current directory when executing this command.
9. The MapReduce job takes some time to run after you execute the command. Once it completes successfully, open the Mongo shell by typing the following command in the operating system command prompt, and execute the following query from the shell:
$ mongo
> db.pyMRStreamTest.find().sort({count: -1}).limit(5)
10. Compare the output with the ones we got earlier when we executed the MapReduce jobs using Mongo’s MapReduce framework in Chapter 3, Programming Language Drivers.
How it works…
Let’s look at steps 5 and 6, where we wrote the mapper and reducer functions. We defined a map function that accepts all the input documents. We iterate through them and yield documents where the _id field is the key (the name of the state) and the count field has the value 1. The number of documents yielded will be the same as the total number of input documents.
Finally, we instantiated BSONMapper, which accepts the mapper function as the parameter. The function returns a generator object, which is then used by the BSONMapper class to feed the values to the MapReduce framework. All we need to remember is that the mapper function needs to return a generator (which it does because we call yield in the loop) and that we then instantiate the BSONMapper class, which is provided to us by the pymongo_hadoop module. Those intrigued enough might choose to look at the source code in the streaming/language_support/python/pymongo_hadoop/mapper.py file, under the project cloned on the local filesystem, and see what it does. It is a small piece of code that is simple to understand.
For the reducer function, we get the key and, as the value, the documents emitted for this key. The key is the same as the value of the _id field of the documents emitted from the map function. We simply return a new document here, with _id as the name of the state and count as the number of documents for this state. Remember that, here, we return a document; we do not yield one as we did in the map function. Again, finally, we instantiated BSONReducer and passed the reducer function to it. The streaming/language_support/python/pymongo_hadoop/reducer.py file, under the project cloned on our local filesystem, has the implementation of the BSONReducer class.
We finally invoked the command from the shell to initiate the MapReduce job that uses streaming. A few things to note here: we need two JAR files, one in the share/hadoop/tools/lib directory of the Hadoop distribution and one from the mongo-hadoop connector, which is present in the streaming/build/libs/ directory. The input and output formats are com.mongodb.hadoop.mapred.MongoInputFormat and com.mongodb.hadoop.mapred.MongoOutputFormat, respectively.
As we saw earlier, stdout and stdin form the backbone of streaming. So, basically, we need to encode our BSON objects to write them to stdout, and then we should be able to read stdin to convert the content back to BSON objects. For this purpose, the mongo-hadoop connector provides two framework classes, com.mongodb.hadoop.streaming.io.MongoInputWriter and com.mongodb.hadoop.streaming.io.MongoOutputReader, to encode BSON objects and decode them, respectively. These classes extend org.apache.hadoop.streaming.io.InputWriter and org.apache.hadoop.streaming.io.OutputReader.
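The stdin/stdout contract can be illustrated with Hadoop streaming’s default text protocol, in which each record is one line with a tab between key and value. The sketch below is only an analogy: the mongo-hadoop connector replaces these text lines with BSON through MongoInputWriter and MongoOutputReader, sorting whole lines stands in for the framework’s sort by key, and all names are hypothetical. It shows why a streaming reducer sees all the values for a key as one consecutive run that it must detect itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TextStreamingSketch {
    // Mapper side: one "key<TAB>value" line per input record, as it would go to stdout.
    static List<String> map(List<String> states) {
        List<String> out = new ArrayList<>();
        for (String state : states) {
            out.add(state + "\t1");
        }
        return out;
    }

    // The sort brings equal keys together; the reducer then reads the lines as if
    // from stdin and emits one sum each time the key changes.
    static List<String> reduce(List<String> mapped) {
        List<String> sorted = new ArrayList<>(mapped);
        Collections.sort(sorted);
        List<String> out = new ArrayList<>();
        String currentKey = null;
        int sum = 0;
        try (BufferedReader stdin = new BufferedReader(new StringReader(String.join("\n", sorted)))) {
            for (String line; (line = stdin.readLine()) != null; ) {
                String[] kv = line.split("\t", 2);
                if (currentKey != null && !currentKey.equals(kv[0])) {
                    out.add(currentKey + "\t" + sum); // key changed: flush previous run
                    sum = 0;
                }
                currentKey = kv[0];
                sum += Integer.parseInt(kv[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (currentKey != null) {
            out.add(currentKey + "\t" + sum); // flush the last run
        }
        return out;
    }
}
```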
The value of the stream.io.identifier.resolver.class property is given as com.mongodb.hadoop.streaming.io.MongoIdentifierResolver. This class extends org.apache.hadoop.streaming.io.IdentifierResolver and gives us a chance to register our implementations of org.apache.hadoop.streaming.io.InputWriter and org.apache.hadoop.streaming.io.OutputReader with the framework. We also register the output key and output value classes using our custom IdentifierResolver. Just remember to always use this resolver if you are using streaming with the mongo-hadoop connector.
Finally, we gave the mapper and reducer Python scripts, which we discussed earlier. An important thing to remember is not to print logs to stdout from the mapper and reducer functions. stdout and stdin are the means of communication, and writing logs to them can yield undesirable behavior. As we saw in the example, write to standard error (stderr) or, alternatively, to a log file.
When using a multiline command in Unix, you can continue the command on the next line using \. However, remember that there should be no spaces after the \.
Running a MapReduce job on Amazon EMR
This recipe involves running the MapReduce job on the cloud using AWS. You will need an AWS account in order to proceed. Register with AWS at http://aws.amazon.com/. We will see how to run a MapReduce job on the cloud using Amazon Elastic MapReduce (EMR). Amazon EMR is a managed MapReduce service provided by Amazon on the cloud. For more details, refer to https://aws.amazon.com/elasticmapreduce/. Amazon EMR requires the data, binaries/JARs, and so on that it processes to be present in an S3 bucket. It then writes the results back to an S3 bucket. Amazon Simple Storage Service (S3) is another AWS service, for data storage on the cloud. For more details on Amazon S3, refer to http://aws.amazon.com/s3/. Though we will use the mongo-hadoop connector, an interesting fact is that we won’t require a MongoDB instance to be up and running. We will use a MongoDB data dump stored in an S3 bucket for our data analysis. The MapReduce program will run on the input BSON dump and generate the resulting BSON dump in the output bucket. The logs of the MapReduce program will be written to another bucket dedicated to logs. The following diagram gives us an idea of how our setup will look at a high level:
Getting ready
We will use the same Java sample for this recipe as the one we used in the Writing our first Hadoop MapReduce job recipe. To know more about the mapper and reducer class implementations, refer to the How it works… section of the Writing our first Hadoop MapReduce job recipe. We have a mongo-hadoop-emr-test project available with the code that can be downloaded from the book’s website; this code is used to create a MapReduce job on the cloud using the AWS EMR APIs.
To simplify things, we will upload just one JAR file to the S3 bucket to execute the MapReduce job. This JAR file will be assembled using a BAT file on Windows and a shell script on Unix-based operating systems. The mongo-hadoop-emr-test Java project has a mongo-hadoop-emr-binaries subdirectory that contains the necessary binaries along with the scripts to assemble them into one JAR file. The assembled JAR file, named mongo-hadoop-emr-assembly.jar, is also provided in the subdirectory. Running the .bat or .sh file will delete this JAR file and regenerate the assembled JAR file; it is not mandatory to do this. The assembled JAR file that is already provided is good enough and will work just fine. The Java project contains a data subdirectory with a postalCodes.bson file in it. This is the BSON dump generated out of the database that contains the postalCodes collection. The mongodump utility provided with the Mongo distribution is used to extract this dump.
How to do it…
1. The first step of this exercise is to create a bucket on S3. You might choose to use an existing bucket. However, for this recipe, I am creating a bucket named com.packtpub.mongo.cookbook.emr-in. Remember that the name of the bucket has to be unique across all S3 buckets; otherwise, you will not be able to create a bucket with this very name. You will have to create one with a different name and use it in place of com.packtpub.mongo.cookbook.emr-in, used in this recipe.
Tip
Do not create bucket names with an underscore (_); use a hyphen (-) instead. Bucket creation with an underscore will not fail, but the MapReduce job will fail later, as it doesn’t accept underscores in bucket names.
2. We will upload the assembled JAR file and the .bson data file to the newly created (or existing) S3 bucket. To upload the files, we will use the AWS web console. Click on the Upload button and select the assembled JAR file and the postalCodes.bson file to be uploaded to the S3 bucket. After the upload, the contents of the bucket will look like the following screenshot:
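The following steps initiate the EMR job from the AWS console without writing a single line of code. We will also see how to initiate the same job using the AWS Java SDK. To initiate the EMR job from the AWS console, follow the console steps up to and including clicking on the Create Cluster button. To start the EMR job using the Java SDK instead, follow the two steps that use the EMRTest project and its AWSElasticMapReduceEntrypoint class. The remaining steps apply to both approaches.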
3. We will first initiate a MapReduce job from the AWS console. Visit https://console.aws.amazon.com/elasticmapreduce/ and click on the Create Cluster button. In the Cluster Configuration screen, enter the details shown in the following screenshot, except for the logging bucket. You will need to select the bucket to which the logs need to be written. You can also click on the folder icon next to the text box for the bucket name and select a bucket present in your account to be used as the logging bucket, as shown in the following screenshot:
4. The Termination protection option is set to No, as this is a test instance. In case of any error, we would prefer the instances to terminate, to avoid keeping them running and incurring charges.
5. In the Software Configuration section, select the Hadoop version as 2.4.0 and the AMI version as 3.1.0. Remove the additional applications by clicking on the cross next to their names, as shown in the following screenshot:
6. In the Hardware Configuration section, select the EC2 instance type as m1.medium. This is the minimum we need to select for Hadoop version 2.4.0. The number of instances for the slave and task instances is zero. The following screenshot shows the configuration selected:
7. In the Security and Access section, leave all the default values. We also have no need for a Bootstrap Action, so leave this as is too.
8. The next step is to set up steps for the MapReduce job. In the Add step drop-down menu, select the Custom JAR option and set the Auto-terminate option to Yes, as shown in the following screenshot:
9. Now, click on the Configure and add button and enter the details.
10. The value of the JAR S3 location field is given as s3://com.packtpub.mongo.cookbook.emr-in/mongo-hadoop-emr-assembly.jar. This is the location in my input bucket; you need to change the input bucket to your own input bucket. The name of the JAR file will be the same.
11. Enter the following arguments in the Arguments text area. The name of the main class is first in the list:
com.packtpub.mongo.cookbook.TopStateMapReduceEntrypoint
-Dmongo.job.input.format=com.mongodb.hadoop.BSONFileInputFormat
-Dmongo.job.mapper=com.packtpub.mongo.cookbook.TopStatesMapper
-Dmongo.job.reducer=com.packtpub.mongo.cookbook.TopStateReducer
-Dmongo.job.mapper.output.key=org.apache.hadoop.io.Text
-Dmongo.job.mapper.output.value=org.apache.hadoop.io.IntWritable
-Dmongo.job.output.key=org.apache.hadoop.io.Text
-Dmongo.job.output.value=com.mongodb.hadoop.io.BSONWritable
-Dmongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
-Dmapred.input.dir=s3://com.packtpub.mongo.cookbook.emr-in/postalCodes.bson
-Dmapred.output.dir=s3://com.packtpub.mongo.cookbook.emr-out/
12. Again, the values of the final two arguments contain the input and output buckets used for my MapReduce sample. These values will change according to your own input and output buckets. The value of Action on failure will be Terminate cluster. The following screenshot shows the values filled in. Click on Save after all the preceding details are entered:
13. Now, click on the Create Cluster button. This will take some time to provision and start the cluster.
14. In the next two steps, we will create a MapReduce job on EMR using the AWS Java API. Import the EMRTest project provided with the code samples into your favorite IDE. Once imported, open the com.packtpub.mongo.cookbook.AWSElasticMapReduceEntrypoint class.
15. A few constants need to be changed in the class: the input, output, and log buckets that you will use for your example; the EC2 key name; and the AWS access key and secret key. The access key and secret key act as the username and password, respectively, when you use the AWS SDK. Change these values accordingly and run the program; on successful execution, it should give you a job ID for the newly initiated job.
16. Irrespective of how you initiated the EMR job, visit the EMR console at https://console.aws.amazon.com/elasticmapreduce/ to see the status of your submitted job. The job ID you see in the second column of your initiated jobs will be the same as the job ID printed to the console when you executed the Java program (if you initiated the job using the Java program). Click on the name of the initiated job; this should navigate you to the job details page. The hardware provisioning will take some time, and then, finally, your MapReduce step will run. Once the job is complete, the status of the job will look like the following screenshot:
17. When the Steps section is expanded, it will look like the following screenshot:
18. Click on the stderr link below the Log files section to view all the log output for the MapReduce job.
19. Now that the MapReduce job is complete, our next step is to see its results. Visit the S3 console at https://console.aws.amazon.com/s3 and open the output bucket you set. In my case, the following is the content of the output bucket:
The part-r-00000.bson file interests us. This file contains the results of our MapReduce job.
20. Download the file to your local filesystem and import it into a running Mongo instance locally using the mongorestore utility, as follows. Note that the restore utility for the following command expects a mongod instance to be up and running and listening on port 27017, and the part-r-00000.bson file to be in the current directory:
$ mongorestore part-r-00000.bson -d test -c mongoEMRResults
21. Now, connect to the mongod instance using the Mongo shell and execute the following query:
> db.mongoEMRResults.find().sort({count: -1}).limit(5)
{ "_id" : "Maharashtra", "count" : 6446 }
{ "_id" : "Kerala", "count" : 4684 }
{ "_id" : "Tamil Nadu", "count" : 3784 }
{ "_id" : "Andhra Pradesh", "count" : 3550 }
{ "_id" : "Karnataka", "count" : 3204 }
22. The preceding command shows the top five results. If we compare these results with those we got in Chapter 3, Programming Language Drivers, using Mongo’s MapReduce framework, and in the Writing our first Hadoop MapReduce job recipe in this chapter, we will see that the results are identical.
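The bucket-naming tip from step 1 can be checked before anything is uploaded. This hypothetical helper encodes a simplified subset of the S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit); further rules, such as not looking like an IP address, are not checked. It also shows, purely as an illustration, that java.net.URI reports no host for an s3:// URI whose bucket contains an underscore; we are not claiming this is the exact cause of the job failure the tip mentions.

```java
import java.net.URI;
import java.util.regex.Pattern;

class BucketNames {
    // First char, 1 to 61 middle chars, last char: 3 to 63 characters in total.
    private static final Pattern VALID = Pattern.compile("[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]");

    static boolean isValid(String bucket) {
        return VALID.matcher(bucket).matches();
    }

    // For "s3://bucket/key", the bucket is the URI's host; java.net.URI returns
    // null here when the authority is not a valid hostname (e.g. has an underscore).
    static String hostOf(String s3Uri) {
        return URI.create(s3Uri).getHost();
    }
}
```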
How it works…
Amazon EMR is a managed Hadoop service that takes care of hardware provisioning and keeps you away from the hassle of setting up your own cluster. The concepts related to our MapReduce program are already covered in the Writing our first Hadoop MapReduce job recipe, and there is nothing additional to mention. One thing we did was to assemble the JARs that we need into one big fat JAR to execute our MapReduce job. This approach is OK for our small MapReduce job. In the case of larger jobs, where a lot of third-party JARs are needed, we will have to go for an approach where we add the JARs to the lib directory of the Hadoop installation and execute the job in the same way as the MapReduce job that we executed locally. Another thing that we did differently from our local setup was to not use a mongod instance to source and write the data; instead, we used BSON dump files from the Mongo database as the input and wrote the output to BSON files. The output dump was then imported into a Mongo database locally, and the results were analyzed. It is pretty common to have data dumps uploaded to S3 buckets; thus, running analytics jobs on this data on the cloud, using cloud infrastructure, is a good option. The data accessed from the buckets by the EMR cluster need not have public access, as the EMR job runs using our account’s credentials; we are able to access our own buckets to read and write data and logs.
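The output file in the S3 bucket follows Hadoop’s part-file naming convention: part-r- for reducer output (part-m- for map-only jobs), followed by the partition number zero-padded to five digits, here with the .bson extension seen in the output bucket. A tiny hypothetical helper makes the pattern explicit; a job whose output holds a single partition, as here, produces only part-r-00000.bson.

```java
import java.util.Locale;

class PartFileNames {
    // Builds names such as part-r-00000.bson; Locale.ROOT keeps the digits ASCII.
    static String partFile(boolean reducer, int partition, String extension) {
        return String.format(Locale.ROOT, "part-%s-%05d%s", reducer ? "r" : "m", partition, extension);
    }
}
```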
See also
The developer’s guide for EMR at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/.
https://github.com/mongodb/mongo-hadoop/tree/master/examples/elastic-mapreduce, to see the sample MapReduce job on the Enron dataset given as part of the mongo-hadoop connector’s examples. You might choose to implement this example on Amazon EMR as per the given instructions.
Chapter 9. Open Source and Proprietary Tools
In this chapter, we will cover some open source and proprietary tools. We will cover the following recipes in this chapter:
Developing using spring-data-mongodb
Accessing MongoDB using Java Persistence API
Accessing MongoDB over REST
Installing the GUI-based client, MongoVUE, for MongoDB
Introduction
There is a vast array of tools and frameworks available to ease the development and administration of software that uses MongoDB. We will look at some of these frameworks and tools. For developer productivity (Java developers, in this case), we will look at spring-data-mongodb, which is part of the popular Spring Data suite.
Java Persistence API (JPA) is an object relational mapping (ORM) specification that is widely used, particularly with relational databases (which was the objective of ORM frameworks). However, there are a few implementations that let us use it with NoSQL stores, MongoDB in this case. We will look at one provider of such an implementation and put it to the test with a simple use case.
We will use spring-data-rest to expose CRUD repositories for MongoDB over a REST interface, for clients to invoke the various operations supported by the underlying spring-data-mongodb repository.
Querying the database from the shell is OK, but it would be nice to have a good GUI that lets us do all administration and development tasks from the GUI, rather than executing commands from the shell to perform these activities. We will look at one such tool, MongoVUE, in this chapter.
Developing using spring-data-mongodb
From the perspective of developers, when a program needs to interact with a MongoDB instance, they need to use the respective client APIs for their specific platforms. The trouble with doing this is that we need to write a lot of boilerplate code, and it is not necessarily object-oriented. For instance, say we have a class called Person with various attributes such as name, age, and address. The corresponding JSON document shares a similar structure with this Person class, as follows:
{
  name: "…",
  age: ..,
  address: {lineOne: "…", …}
}
However, to store this document, we need to convert the Person object to a DBObject, which is a map of key-value pairs. What is really desired is to persist the Person object itself in the database, without having to convert it to a DBObject.
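For comparison, this is roughly the hand-written conversion that spring-data-mongodb saves us from. In this sketch, a LinkedHashMap stands in for DBObject (so no driver is needed), and Person and Address are hypothetical classes shaped like the JSON document above. Every new field means touching both directions of the mapping by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical domain classes matching the JSON structure above.
class Address {
    String lineOne;
    Address(String lineOne) { this.lineOne = lineOne; }
}

class Person {
    String name;
    int age;
    Address address;
    Person(String name, int age, Address address) {
        this.name = name;
        this.age = age;
        this.address = address;
    }
}

class ManualMapping {
    // Object -> document: copy every field, including the nested address, by hand.
    static Map<String, Object> toDocument(Person p) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("lineOne", p.address.lineOne);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", p.name);
        doc.put("age", p.age);
        doc.put("address", address);
        return doc;
    }

    // Document -> object: the reverse conversion, needed when reading back.
    static Person fromDocument(Map<String, Object> doc) {
        Map<?, ?> address = (Map<?, ?>) doc.get("address");
        return new Person((String) doc.get("name"), (Integer) doc.get("age"),
                new Address((String) address.get("lineOne")));
    }
}
```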
Also, some of the operations, such as searching by a particular field of a document, saving an entity, deleting an entity, and searching by ID, are pretty common, and we tend to repeatedly write similar boilerplate code. In this recipe, we will see how spring-data-mongodb relieves us of these laborious and cumbersome tasks, not only to reduce the development effort but also to dramatically reduce the possibility of introducing bugs in these commonly written functions.
Getting ready
The SpringDataMongoTest project present in the bundle for this chapter is a Maven project and is to be imported into any IDE of your choice. The required Maven artifacts will automatically be downloaded. A single MongoDB instance that listens on port 27017 is required to be up and running. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to know how to start a standalone instance.
For the aggregation example, we will use the postal code data. Refer to the Creating test data recipe in Chapter 2, Command-line Operations and Indexes, for the creation of test data.
How to do it…
1. We will explore the repository feature of spring-data-mongodb first. Open the test case class named com.packtpub.mongo.cookbook.MongoCrudRepositoryTest from your IDE and execute it. If all goes well and the MongoDB server instance is reachable, the test case will execute successfully.
2. Another test case, named com.packtpub.mongo.cookbook.MongoCrudRepositoryTest2, is used to explore more features of the repository support provided by spring-data-mongodb. This test case too should execute successfully.
3. We will see how MongoTemplate of spring-data-mongodb can be used to perform CRUD operations and other common operations on MongoDB. Open the com.packtpub.mongo.cookbook.MongoTemplateTest class and execute it.
4. Alternatively, if an IDE is not used, all the tests can be executed using Maven from the command prompt as follows, with the current directory being the root of the SpringDataMongoTest project:
$ mvn clean test
How it works…
We will first look at what we did in com.packtpub.mongo.cookbook.MongoCrudRepositoryTest, where we saw the repository support provided by spring-data-mongodb. Just in case you didn’t notice, we haven’t written a single line of code to implement the repository. The magic of implementing the required code for us is done by the Spring Data project. Let’s start by looking at the relevant portions of the XML config file:
<mongo:repositories base-package="com.packtpub.mongo.cookbook"/>
<mongo:mongo id="mongo" host="localhost" port="27017"/>
<mongo:db-factory id="factory" dbname="test" mongo-ref="mongo"/>
<mongo:template id="mongoTemplate" db-factory-ref="factory"/>
We will first look at the last three lines. These are spring-data-mongodb namespace declarations that, respectively, instantiate com.mongodb.Mongo, instantiate a factory for com.mongodb.DB instances from the client, and instantiate a template used to perform various operations on MongoDB. We will see org.springframework.data.mongodb.core.MongoTemplate in more detail later.
The first line declares the base package of all the CRUD repositories we have. In this package, we have an interface with the following body:
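public interface PersonRepository extends
        PagingAndSortingRepository<Person, Integer> {
    /**
     * Finds the person with the given last name.
     *
     * @param lastName the last name to search for
     * @return the matching person
     */
    Person findByLastName(String lastName);
}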
The PagingAndSortingRepository interface is from the org.springframework.data.repository package of the Spring Data core project and extends CrudRepository of the same project. These interfaces give us some of the most common methods, such as searching by the ID/primary key, deleting an entity, inserting an entity, and updating an entity. The repository needs an object that it maps to the underlying data store. The Spring Data project supports a large number of data stores, not just SQL databases (using Java Database Connectivity (JDBC) and JPA) or MongoDB, but also other NoSQL stores such as Redis and Hadoop, and search engines such as Solr and Elasticsearch. In the case of spring-data-mongodb, the object is mapped to a document in the collection.
The PagingAndSortingRepository<Person, Integer> signature indicates that the first type parameter is the entity that the CRUD repository is built for, and the second one is the type of the primary key/ID field.
We have added just one method, named findByLastName, which accepts one string value, the last name, as a parameter. This is an interesting operation: it is specific to our repository and not even implemented by us, but it will still work just as expected. The Person class is a POJO where we have annotated the ID field with the org.springframework.data.annotation.Id annotation. Nothing else is really special about this class; it just has some plain getters and setters.
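How can a method that we never implemented work? Spring Data derives the query from the method name. The following greatly simplified, hypothetical sketch handles only the findBy<Property> form with a single equality condition, turning findByLastName("Johnson") into the filter {lastName: "Johnson"}; Spring Data’s own parser (PartTree) supports much more, such as And/Or, comparison keywords, and ordering.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DerivedQuerySketch {
    // "findByLastName" + "Johnson" -> {lastName=Johnson}
    static Map<String, Object> filterFor(String methodName, Object argument) {
        String prefix = "findBy";
        if (!methodName.startsWith(prefix) || methodName.length() == prefix.length()) {
            throw new IllegalArgumentException("Unsupported method: " + methodName);
        }
        String property = methodName.substring(prefix.length());
        // The property name in the method starts uppercase; the field starts lowercase
        String field = Character.toLowerCase(property.charAt(0)) + property.substring(1);
        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put(field, argument);
        return filter;
    }
}
```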
With all these small details in place, let's join the dots by answering some questions you might have in mind. First, which server, database, and collection does our data go to? If we look at the mongo:mongo XML definition in the config file, we will see that we instantiated the com.mongodb.Mongo class by connecting to localhost on port 27017. The mongo:db-factory declaration denotes that the database to be used is test. One final question: which collection? The simple name of our class is Person. The name of the collection is the simple name with its first character lowercased; thus, Person becomes person, and something like BillingAddress becomes the billingAddress collection. These are the default values. However, if you need to override this value, you can annotate your class with org.springframework.data.mongodb.core.mapping.Document and use its collection attribute to give any name of your choice, as we will see in an example later.
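The default naming rule fits in a few lines of plain Java. The following is a toy sketch of the rule as described above: it takes the simple class name and lowercases its first character. It is not spring-data-mongodb's own implementation. The class CollectionNames and its methods are hypothetical, and the @Document(collection = ...) override is not modeled.

```java
// Hypothetical helper illustrating the default collection-naming rule:
// the simple class name with its first character lowercased.
// This is not spring-data-mongodb code and ignores @Document(collection = ...).
public class CollectionNames {

    /** Lowercases the first character of a simple class name. */
    public static String fromSimpleName(String simpleName) {
        if (simpleName == null || simpleName.isEmpty()) {
            return simpleName;
        }
        return Character.toLowerCase(simpleName.charAt(0)) + simpleName.substring(1);
    }

    /** Applies the rule to an entity class via Class.getSimpleName(). */
    public static String forClass(Class<?> entityClass) {
        return fromSimpleName(entityClass.getSimpleName());
    }

    public static void main(String[] args) {
        System.out.println(fromSimpleName("Person"));         // person
        System.out.println(fromSimpleName("BillingAddress")); // billingAddress
    }
}
```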
To view the document in the collection, execute just one test case method, saveAndQueryPerson, from the com.packtpub.mongo.cookbook.MongoCrudRepositoryTest class. Now, connect to the MongoDB instance from the Mongo shell and execute the following query:
> use test
>db.person.findOne({_id:1})
{
"_id":1,
"_class":"com.packtpub.mongo.cookbook.domain.Person",
"firstName":"Steve",
"lastName":"Johnson",
"age":20,
"gender":"Male"
…
}
As we can see in the preceding result, the contents of the document are similar to the object we persisted using the CRUD repository. The names of the fields in the document are the same as the names of the respective attributes in the Java object, with two exceptions. First, the field annotated with @Id is now _id, irrespective of the name of the field in the Java class. Second, an additional _class attribute is added to the document, whose value is the fully qualified name of the Java class itself. This attribute is of no use to the application but is used by spring-data-mongodb as metadata.
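To make the two exceptions concrete, here is a toy reflection-based mapper in plain Java. It renames the identifier field to _id, keeps all other field names, and appends _class. This is an illustration only, not spring-data-mongodb's converter. The annotation ToyId stands in for org.springframework.data.annotation.Id, and the Person class here is a hypothetical, trimmed-down version of the book's entity.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of the mapping seen in the shell output: the @Id field
// becomes _id, other fields keep their names, and _class records the fully
// qualified class name. This is NOT spring-data-mongodb's mapping code.
public class ToyDocumentMapper {

    /** Hypothetical stand-in for org.springframework.data.annotation.Id. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface ToyId {}

    /** Hypothetical entity resembling the book's Person class. */
    public static class Person {
        @ToyId
        private int personId;
        private String firstName;
        private String lastName;

        public Person(int personId, String firstName, String lastName) {
            this.personId = personId;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /** Converts an entity's instance fields into a document-like map. */
    public static Map<String, Object> toDocument(Object entity) {
        Map<String, Object> doc = new LinkedHashMap<>();
        try {
            for (Field field : entity.getClass().getDeclaredFields()) {
                // Static and compiler-generated fields are not part of the document.
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue;
                }
                field.setAccessible(true);
                String key = field.isAnnotationPresent(ToyId.class) ? "_id" : field.getName();
                doc.put(key, field.get(entity));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        // Type metadata, mirroring the _class field seen in the shell.
        doc.put("_class", entity.getClass().getName());
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toDocument(new Person(1, "Steve", "Johnson")));
    }
}
```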
Now things make more sense, and we have an idea of what spring-data-mongodb must be doing for all the basic CRUD methods. All the operations we perform use the MongoTemplate class (MongoOperations, to be precise, which is an interface that MongoTemplate implements) from the spring-data-mongodb project. To find an entity by its primary key, it invokes a find query on the _id field of the collection derived from the Person entity class. The save method simply calls the save method on MongoOperations, which in turn calls the save method on the com.mongodb.DBCollection class.
We still haven't answered how the findByLastName method works. How does Spring know what query to invoke to return the data? These are special methods whose names begin with find, findBy, get, or getBy. There are some rules to follow while naming such a method, and the proxy object for the repository interface converts the method into an appropriate query on the collection. For instance, the findByLastName method in the repository of the Person class executes a query on the lastName field of the Person documents. Hence, the findByLastName(String lastName) method fires the following query in the database:
db.person.find({'lastName':lastName})
Depending on the return type of the method, it returns either a list or the first of the results returned from the database. We have used findBy in our queries. However, any name that begins with find, has any text in between, and ends with By works. For instance, findPersonBy is the same as findBy.
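The naming convention can be illustrated with a toy parser in plain Java. It strips a find…By or get…By prefix, recognizes a few comparison keywords, and falls back to equality. This is a sketch under stated limits. It does not resolve nested property paths and does not handle Between, Like, or And. It is not Spring Data's real parser (the PartTree class), which covers far more. The class ToyQueryDerivation is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy sketch of deriving a query from a method name. Covers only the
// find...By / get...By prefix, a few comparison keywords, and plain equality.
// Hypothetical code; not Spring Data's actual query derivation.
public class ToyQueryDerivation {

    // "find" or "get", optional subject text (e.g. "People"), "By", then the property expression.
    private static final Pattern METHOD =
        Pattern.compile("^(?:find|get)(?:\\p{Upper}\\p{Alnum}*?)?By(\\p{Upper}\\p{Alnum}*)$");

    // Longer keywords first so "GreaterThanEqual" is not mistaken for "GreaterThan".
    private static final String[][] KEYWORDS = {
        {"GreaterThanEqual", "$gte"},
        {"GreaterThan", "$gt"},
        {"LessThanEqual", "$lte"},
        {"LessThan", "$lt"},
    };

    public static String deriveQuery(String methodName, Object argument) {
        Matcher m = METHOD.matcher(methodName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a derived query method: " + methodName);
        }
        String expression = m.group(1);
        for (String[] keyword : KEYWORDS) {
            if (expression.endsWith(keyword[0]) && expression.length() > keyword[0].length()) {
                String property = uncapitalize(
                    expression.substring(0, expression.length() - keyword[0].length()));
                return "{'" + property + "': {'" + keyword[1] + "': " + render(argument) + "}}";
            }
        }
        // No keyword: plain equality match on the property.
        return "{'" + uncapitalize(expression) + "': " + render(argument) + "}";
    }

    private static String uncapitalize(String s) {
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    private static String render(Object value) {
        return value instanceof String ? "'" + value + "'" : String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(deriveQuery("findByLastName", "Johnson"));       // {'lastName': 'Johnson'}
        System.out.println(deriveQuery("findPersonByLastName", "Johnson")); // {'lastName': 'Johnson'}
        System.out.println(deriveQuery("findByAgeGreaterThanEqual", 30));   // {'age': {'$gte': 30}}
    }
}
```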
For more information on these findBy methods, we have another test class, MongoCrudRepositoryTest2. Open this class in your IDE so that you can read it alongside this text. We have already executed this test case; now, let's look at the findBy methods it uses and their behavior. The class exercises seven findBy methods, one of which is a variant of another method in the same interface. To get a clear idea of the queries, we will first look at one of the documents in the personTwo collection in the test database. Execute the following commands on the Mongo shell connected to the MongoDB server running on localhost:
> use test
>db.personTwo.findOne({firstName:'Amit'})
{
"_id":2,
"_class":"com.packtpub.mongo.cookbook.domain.Person2",
"firstName":"Amit",
"lastName":"Sharma",
"age":25,
"gender":"Male",
"residentialAddress":{
"addressLineOne":"20, Central street",
"city":"Mumbai",
"state":"Maharashtra",
"country":"India",
"zip":"400101"
}
}
Also, note that the repository uses the Person2 class. However, the name of the collection used is personTwo. This was possible because we used the @Document(collection="personTwo") annotation on top of the Person2 class.
Getting back to the seven methods in the com.packtpub.mongo.cookbook.PersonRepositoryTwo repository interface, let's look at them one by one:
Method Description
findByAgeGreaterThanEqual
This method will fire the {'age':{'$gte':<age>}} query on the personTwo collection.
The secret lies in the name of the method. If we break it up, what comes after findBy tells us what we want. The age property (with its first character lowercased) is the field queried on the document, and the $gte operator is used because we have GreaterThanEqual in the name of the method. The value used for the comparison is the value of the parameter passed. The result is a collection of Person2 entities, as there can be multiple matches.
findByAgeBetween
This method queries on age but uses a combination of $gt and $lt to find the matching results. The query in this case will be {'age':{'$gt':from,'$lt':to}}. It is important to note that both the from and to values are exclusive in the range. There are two methods in the test case, findByAgeBetween and findByAgeBetween2. These methods demonstrate the behavior of the between query for different input values.
findByAgeGreaterThan
This method is special because it also sorts the result. It takes two parameters: the first is the value against which the age field is compared, and the second is of type org.springframework.data.domain.Sort. For more details, refer to the Javadoc for spring-data-mongodb.
findPeopleByLastNameLike
This method is used to find results whose last name matches a pattern. Regular expressions are used for the matching. For instance, in this case, the query fired will be {'lastName':<lastName as regex>}. This method's name begins with findPeopleBy instead of findBy, and it works in the same way as findBy. Thus, when we say findBy in all the descriptions, we actually mean find…By.
The value provided as the parameter will be used to match the last name.
findByResidentialAddressCountry
This is an interesting method to look at. Here, we want to search by the country of the residential address. This is, in fact, a field of the Address class, held in the residentialAddress field of the person. Look at the document from the personTwo collection shown earlier to see how the query will be used.
When Spring Data finds the name ResidentialAddressCountry, it tries various combinations using this string. For instance, it can look for residentialAddressCountry in Person, or for residential.addressCountry, residentialAddress.country, or residential.address.country. If there are no conflicting values, as in our case, residentialAddress.country is a valid path in the Person2 object tree, and thus, this path will be used in the query.
However, if there are conflicts, underscores can be used to specify clearly what we are looking for. In this case, the method can be renamed findByResidentialAddress_country to state clearly what we expect as the result. The findByCountry2 test case method demonstrates this.
findByFirstNameAndCountry
This is an interesting method as well. We are not always able to use method names to express what we actually want, as the method name Spring requires to implement the query automatically might be a bit awkward to use as is. For instance, findByCountryOfResidence sounds better than findByResidentialAddressCountry. However, we are stuck with the latter, as this is how spring-data-mongodb will construct the query. The name findByCountryOfResidence gives Spring Data no details on how to construct the query.
However, there is a solution to this. You might choose to use the @Query annotation and specify the query to be executed when the method is invoked. The following is the annotation we used in our case:
@Query("{'firstName':?0,'residentialAddress.country':?1}")
We write the value as the query that will be executed, and we bind the parameters of the method to the query as numbered parameters starting from 0. Thus, the first parameter of the method will be bound to ?0, the second to ?1, and so on.
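The positional binding can be pictured as a simple textual substitution. The sketch below is a toy version written in plain Java. The class ToyQueryBinding is hypothetical. Unlike Spring Data, it wraps strings in single quotes without any escaping, so treat it purely as an illustration of how ?0 and ?1 map to the method arguments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration of positional binding in an @Query string:
// ?0 is replaced by the first method argument, ?1 by the second, and so on.
// Strings are quoted without escaping; not suitable for real queries.
public class ToyQueryBinding {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\?(\\d+)");

    public static String bind(String template, Object... args) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            if (index >= args.length) {
                throw new IllegalArgumentException("No argument for ?" + index);
            }
            Object value = args[index];
            String rendered = value instanceof String ? "'" + value + "'" : String.valueOf(value);
            // quoteReplacement stops '$' or '\' in values being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(rendered));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String query = "{'firstName':?0,'residentialAddress.country':?1}";
        System.out.println(bind(query, "Amit", "India"));
        // {'firstName':'Amit','residentialAddress.country':'India'}
    }
}
```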
We saw how findBy and getBy methods are automatically translated into queries for MongoDB. Similarly, there are other well-known method prefixes. The countBy prefix returns, as a long, the number of documents matching a condition derived from the rest of the method name, just as with findBy. We can use deleteBy or removeBy to delete the documents matching the derived condition. Also, note that the com.packtpub.mongo.cookbook.domain.Person2 class does not have a no-argument constructor or setters to set the values. Spring instead uses reflection to instantiate this object.
Many findBy methods are supported by spring-data-mongodb, and not all of them are covered here. For more details, refer to the spring-data-mongodb reference manual. Many XML-based and Java-based configuration options are available too, and these are also described in the reference manual. The links are given in the See also section later in this recipe.
We are not done yet, though; we have another test case, com.packtpub.mongo.cookbook.MongoTemplateTest. This uses org.springframework.data.mongodb.core.MongoTemplate to perform various operations. We can open the test case class to see which operations are performed and which methods of the MongoTemplate class are invoked.
Let's look at some of the important and frequently used methods of the MongoTemplate class:
Method Description
save
This method is used to save (insert if new; otherwise, update) an entity in MongoDB. The method takes one parameter, the entity, and finds the target collection based on the entity's class name or the @Document annotation present on it.
There is an overloaded version of the save method that also accepts a second parameter: the name of the collection in which the entity passed is to be persisted.
remove
This method is used to remove documents from a collection. It has several overloaded versions in this class. All of them accept either an entity to be deleted or an org.springframework.data.mongodb.core.query.Query instance, which is used to determine the document(s) to be deleted. The second parameter is then the name of the collection from which the document is to be deleted. When an entity is provided, the name of the collection can be derived. With a Query instance, we have to give either the name of the collection or the entity class, which in turn is used to derive the name of the collection.
updateMulti
This is the method invoked to update multiple documents with one update call. The first parameter is the query that will be used to match the documents. The second parameter is an instance of org.springframework.data.mongodb.core.query.Update. This is the update that will be executed on the documents selected using the first query object. The next parameter is the entity class or the collection name to execute the update on. For more details on the method and its various overloaded versions, refer to the Javadoc.
updateFirst
This is the counterpart of the updateMulti method: it updates just the first matching document. We have not covered this method in our unit test case.
insert
We mentioned that the save method can perform both insertions and updates. The insert method in the template calls the insert method of the underlying Mongo client. If one entity document is to be inserted, there is no difference between calling the insert and the save method.
However, as we saw in the insertMultiple method of the test case, we created a list of three Person instances and passed it to the insert method. All three documents for the three Person instances go to the server as part of one call. The behavior of an insert when it fails is determined by the continue-on-error parameter of the write concern. This parameter determines whether the bulk insert stops at the first failure or continues past errors, reporting only the last error. The page at http://docs.mongodb.org/manual/core/bulk-inserts/ gives more details on bulk inserts and the various write concern parameters that can alter this behavior.
findAndRemove/findAllAndRemove
Both these operations are used to find and then remove document(s). The findAndRemove method finds one document, removes it, and returns the deleted document. This operation is atomic. The findAllAndRemove method, however, finds all the matching documents and removes them before returning a list of the entities for all the documents deleted.
findAndModify
This method is functionally similar to the findAndModify method of the Mongo client library. It atomically finds and modifies a document. If the query matches more than one document, only the first match is updated. The first two parameters of this method are the query and the update to execute. The next parameter is either the entity class or the collection name to execute the operation on. There is also a special class, org.springframework.data.mongodb.core.FindAndModifyOptions, that makes sense only for the findAndModify operation. An instance of this class specifies whether we want the new or the old version of the document returned after the operation is performed. It also specifies whether an upsert is to be performed, which is relevant only if no document matches the query. There is an additional Boolean flag that tells the client whether this is a find-and-remove operation. In fact, the findAndRemove operation we saw earlier is just a convenience function that delegates to findAndModify with this remove flag set.
In the preceding table, we mentioned the Query and Update classes when talking about updates. These are special convenience classes in spring-data-mongodb that let us build MongoDB queries using a syntax that is easy to understand and improves readability. For instance, the query to check whether lastName is Johnson in the Mongo query language is {'lastName':'Johnson'}. The same query can be constructed in spring-data-mongodb as follows:
new Query(Criteria.where("lastName").is("Johnson"))
This syntax looks neat compared to giving the query in JSON. Let's take another example, where we want to find all females under 30 years of age in our database. The query will now be built as follows:
new Query(Criteria.where("age").lt(30).and("gender").is("Female"))
Similarly, for updates, suppose we want to set a youngCustomer Boolean flag to true for some of the customers, based on some conditions. To set this flag in the document, the MongoDB format is as follows:
{'$set':{'youngCustomer':true}}
In spring-data-mongodb, the same is achieved as follows:
new Update().set("youngCustomer", true)
Refer to the Javadoc for all the methods available in spring-data-mongodb to build the queries and updates to be used with MongoTemplate.
The methods mentioned earlier are by no means the only ones available in the MongoTemplate class. There are many other methods: for geospatial indexes, for getting the count of documents in a collection, for aggregation and MapReduce support, and so on. Refer to the Javadoc of MongoTemplate for more details and methods.
Speaking of aggregation, we also have a test case method called aggregationTest that performs an aggregation operation on a collection. We have a postalCodes collection in MongoDB; this collection contains the postal code details of various cities. An example document in the collection is as follows:
{
"_id":ObjectId("539743b26412fd18f3510f1b"),
"postOfficeName":"ASDMelloRoadFullerMarg",
"pincode":400001,
"districtsName":"Mumbai",
"city":"Mumbai",
"state":"Maharashtra"
}
Our aggregation operation aims to find the top five states by the number of documents in the collection. In Mongo, the aggregation pipeline looks as follows:
[
{'$project':{'state':1,'_id':0}},
{'$group':{'_id':'$state','count':{'$sum':1}}},
{'$sort':{'count':-1}},
{'$limit':5}
]
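To see what each stage contributes, the same computation can be mimicked in memory with Java streams. This is only an analogue of the pipeline, not a MongoDB call. The class TopStatesSketch is hypothetical, documents are modeled as plain maps, and each state ends up as an entry key rather than in an _id field. Also unlike MongoDB, which would group documents with no state under null, this sketch skips them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// In-memory analogue of the aggregation pipeline above, using Java streams.
// Each document is a Map; only the "state" field is used.
public class TopStatesSketch {

    public static List<Map.Entry<String, Long>> topStates(List<Map<String, Object>> docs, int n) {
        Map<String, Long> counts = docs.stream()
            .map(doc -> (String) doc.get("state"))   // $project: keep only state
            .filter(Objects::nonNull)                 // unlike MongoDB, skip documents without a state
            .collect(Collectors.groupingBy(state -> state, Collectors.counting())); // $group with $sum: 1
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()) // $sort: {count: -1}
            .limit(n)                                                       // $limit: n
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        String[] states = {"Maharashtra", "Gujarat", "Maharashtra", "Kerala", "Gujarat", "Maharashtra"};
        for (String state : states) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("state", state);
            docs.add(doc);
        }
        System.out.println(topStates(docs, 2)); // [Maharashtra=3, Gujarat=2]
    }
}
```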
In spring-data-mongodb, we invoked the aggregation operation using the MongoTemplate class as follows:
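import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;

import org.springframework.data.domain.Sort.Direction;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;
import com.mongodb.DBObject;

// Stages: project, group by state counting documents, sort by count descending, keep the top 5
Aggregation aggregation = newAggregation(
    project("state", "_id"),
    group("state").count().as("count"),
    sort(Direction.DESC, "count"),
    limit(5)
);
AggregationResults<DBObject> results = mongoTemplate.aggregate(
    aggregation,
    "postalCodes",
    DBObject.class);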
The key is creating an instance of the org.springframework.data.mongodb.core.aggregation.Aggregation class. The newAggregation method is statically imported from the same class and accepts varargs of org.springframework.data.mongodb.core.aggregation.AggregationOperation instances, each of which corresponds to one operation in the pipeline. The Aggregation class has various static methods to create AggregationOperation instances; we have used a few of them, such as project, group, sort, and limit. For more details and the available methods, refer to the Javadoc. The aggregate method in MongoTemplate takes three arguments: the first is the instance of the Aggregation class, the second is the name of the collection, and the third is the return type of the aggregation result. Refer to the aggregation operation test case for more details.
See also
The Javadoc at http://docs.spring.io/spring-data/mongodb/docs/current/api/ for more details and API documentation
The reference manual for the spring-data-mongodb project at http://docs.spring.io/spring-data/data-mongodb/docs/current/reference/
Accessing MongoDB using Java Persistence API
In this recipe, we will use a JPA provider that allows us to use JPA entities to achieve object-to-document mapping with MongoDB.
Getting ready
Start the standalone server instance that listens to port 27017. This is a Java project using JPA. Familiarity with JPA and its annotations is expected, though what we will be looking at is fairly basic. If you are not familiar with Maven, refer to the Connecting to a single node from a Java client recipe in Chapter 1, Installing and Starting the MongoDB Server, to learn how to set it up. Download the DataNucleusMongoJPA project from the code bundle provided with the book. Though we will execute the test cases from the command prompt, you may import the project into your favorite IDE to view the source code.
How to do it…
1. Go to the root directory of the DataNucleusMongoJPA project and execute the following command in the shell:
$ mvn clean test
2. This will download the artifacts needed to build and run the project and then execute the test cases.
3. Once the test cases have executed, open a Mongo shell and connect to the local instance.
4. Execute the following query in the shell:
> use test
>db.personJPA.find().pretty()
How it works…
First, let's look at a sample document that was created in the personJPA collection:
{
"_id":NumberLong(2),
"residentialAddress":{
"residentialAddress_zipCode":"400101",
"residentialAddress_state":"Maharashtra",
"residentialAddress_country":"India",
"residentialAddress_city":"Mumbai",
"residentialAddress_addressLineOne":"20, Central street"
},
"lastName":"Sharma",
"gender":"Male",
"firstName":"Amit",
"age":25
}
The steps we executed are pretty simple; let's look at the classes used, one by one. We will start with the com.packtpub.mongo.cookbook.domain.Person class. At the top of the class (after the package and imports), we have the following:
@Entity
@Table(name="personJPA")
public class Person {
This denotes that the Person class is an entity and that the collection it will persist to is personJPA. Note that JPA was designed primarily as an ORM tool; thus, its terminology is geared toward relational databases. A table in an RDBMS corresponds to a collection in MongoDB. The rest of the class contains the attributes of the person, with the columns annotated with @Column and the primary key with @Id. These are simple JPA annotations. What is interesting to look at is the com.packtpub.mongo.cookbook.domain.ResidentialAddress class, which is stored in the residentialAddress variable of the Person class. If we look at the person document we gave earlier, all the values given in the @Column annotations are the names of the keys for the person. Also, notice how the enum gets converted to its string value. However, the residentialAddress key is the name of the variable in the Person class in which the address instance is stored. If we look at the ResidentialAddress class, we will see the @Embeddable annotation on top, above the class name. This is again a JPA annotation, denoting that this instance is not an entity in its own right but is embedded in another entity or another embeddable class. Note the names of the fields in the embedded document. They have the following format:
<name of the variable in the Person class>_<name of the variable in the ResidentialAddress class>
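The key format, and why it costs space, can be illustrated with a few lines of plain Java. This sketch only mimics the naming pattern observed in the document above. It is not DataNucleus code, and the class EmbeddedKeyNaming is hypothetical. The byte count is a rough measure: it counts only the UTF-8 bytes of the key names, which are repeated in every stored document.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of the <owner field>_<embedded field> key format observed
// for the embedded address, plus a rough count of bytes spent on key names.
public class EmbeddedKeyNaming {

    public static Map<String, Object> prefixKeys(String ownerField, Map<String, Object> embedded) {
        Map<String, Object> out = new LinkedHashMap<>();
        embedded.forEach((key, value) -> out.put(ownerField + "_" + key, value));
        return out;
    }

    public static int keyBytes(Map<String, Object> document) {
        int total = 0;
        for (String key : document.keySet()) {
            total += key.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Mumbai");
        address.put("country", "India");
        Map<String, Object> stored = prefixKeys("residentialAddress", address);
        System.out.println(stored.keySet()); // [residentialAddress_city, residentialAddress_country]
        System.out.println(keyBytes(address) + " vs " + keyBytes(stored)); // 11 vs 49
    }
}
```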
There is one problem we notice here: the names of the fields are too long, so they consume unnecessary space. The solution is to use a shorter value in the @Column annotation. For instance, use the following annotation:
@Column(name="ln") instead of @Column(name="lastName")
This will create a key named ln in the document. Unfortunately, this doesn't work with the embedded ResidentialAddress class; in this case, you will have to make do with shorter variable names. Now that we have seen the entity classes, let's look at the persistence.xml file:
<persistence-unit name="DataNucleusMongo">
<class>com.packtpub.mongo.cookbook.domain.Person</class>
<properties>
<property name="javax.persistence.jdbc.url"
value="mongodb:localhost:27017/test"/>
</properties>
</persistence-unit>
We have just the persistence-unit definition here, with the name DataNucleusMongo. There is one class node, which is the entity that we will use. Note that the embedded address class is not mentioned here, as it is not an independent entity. In the properties, we specified the URL of the data store to connect to. In this case, we connect to the instance on localhost, port 27017, and use the test database.
Now, let's look at the class that queries and inserts the data. This is our test class, com.packtpub.mongo.cookbook.DataNucleusJPATest. We create javax.persistence.EntityManagerFactory as Persistence.createEntityManagerFactory("DataNucleusMongo"). This is a thread-safe class, and its instance is shared across threads. Also, the string argument is the same as the name of the persistence unit we used in persistence.xml. All other invocations on javax.persistence.EntityManager to persist or query the collection require us to create an instance using EntityManagerFactory, use it, and then close it once the operation is complete. All the operations performed follow the JPA specification. The test case class both persists entities and queries them.
Finally, we will look at pom.xml, particularly the enhancer plugin we used; it is as follows:
<plugin>
<groupId>org.datanucleus</groupId>
<artifactId>datanucleus-maven-plugin</artifactId>
<version>4.0.0-release</version>
<configuration>
<log4jConfiguration>${basedir}/src/main/resources/log4j.properties</log4jConfiguration>
<verbose>true</verbose>
</configuration>
<executions>
<execution>
<phase>process-classes</phase>
<goals>
<goal>enhance</goal>
</goals>
</execution>
</executions>
</plugin>
The entities we wrote need to be enhanced by DataNucleus before they can be used as JPA entities. The preceding plugin is attached to the process-classes phase and invokes the plugin's enhance goal.
See also
http://www.datanucleus.org/products/datanucleus/jdo/enhancer.html for possible options. There is also a plugin for Eclipse that allows entity classes to be enhanced/instrumented for DataNucleus.
The JPA 2.1 specification at https://www.jcp.org/aboutJava/communityprocess/final/jsr338/index.html.
Accessing MongoDB over REST
In this recipe, we will see how to access MongoDB and perform CRUD operations using REST APIs. We will use spring-data-rest for REST access and spring-data-mongodb to perform the CRUD operations. Before you continue with this recipe, it is important to know how to implement CRUD repositories using spring-data-mongodb. Refer to the Developing using spring-data-mongodb recipe to learn how to use this framework.
The question one might ask is: why is a REST API needed? There are scenarios where a database is shared by many applications, possibly written in different languages. Writing a JPA DAO or using spring-data-mongodb is good enough for Java clients but not for clients in other languages. Having APIs local to each application doesn't give us a centralized way to access the database either. This is where REST APIs come into play. We can develop the server-side data access layer, which is the CRUD repository in Java (spring-data-mongodb, to be precise), and then expose it over a REST interface for a client written in any language to invoke. Clients can then invoke our API in a platform-independent way, and this also gives us a single point of entry into our database.
Getting ready
Apart from the prerequisites of the Developing using spring-data-mongodb recipe, we have a few more for this recipe. The first thing is to download the SpringDataRestTest project from the book's website and import it into your IDE as a Maven project. Alternatively, if you do not wish to import it into the IDE, you can run the server that services the requests from the command prompt, as we will see in the next section. No specific client application is used to perform the CRUD operations over REST. I will demonstrate the concepts using the Chrome browser and a special browser plugin called Advanced REST Client to send HTTP POST requests to the server. The tool can be found under the Developer Tools section of the Chrome Web Store.
How to do it…
1. If you have imported the project into your IDE as a Maven project, execute the com.packtpub.mongo.cookbook.rest.RestServer class, which is the bootstrap class, to start the server locally so that it accepts client connections.
2. If the project is to be executed from the command prompt as a Maven project, go to the root directory of the project and run the following command:
mvn spring-boot:run
3. The following output will be seen in the command prompt if all goes well and the server has started:
[INFO] Attaching agents: []
4. After starting the server either way, enter http://localhost:8080/people in the browser's address bar, and we will see the following JSON response. We see this response because the underlying collection, person, is empty:
{
"_links":{
"self":{
"href":"http://localhost:8080/people{?page,size,sort}",
"templated":true
},
"search":{
"href":"http://localhost:8080/people/search"
}
},
"page":{
"size":20,
"totalElements":0,
"totalPages":0,
"number":0
}
}
5. We will now insert a new document into the person collection using an HTTP POST request to http://localhost:8080/people. We will send the POST request to the server using the Advanced REST Client Chrome extension. The document posted is as follows:
{"lastName":"Cruise","firstName":"Tom","age":52,"id":1}
The request's content type is application/json.
6. The following screenshot shows the POST request sent to the server and the response from the server:
7. We will now query this document from the browser using the _id field, which is 1 in this case. Enter http://localhost:8080/people/1 in the browser's address bar. You will see the document we inserted in step 5.
8. Now that we have one document in the collection (you might try to insert more documents for people with different names and, more importantly, unique IDs), we will query the document using the last name. However, first type http://localhost:8080/people/search in the browser's address bar to view all the available search options. We will see one search method, findByLastName, that accepts a request parameter, lastName.
9. To search by the last name, Cruise in our case, enter http://localhost:8080/people/search/findByLastName?lastName=Cruise in the browser's address bar.
10. We will now update the last name and age of the person with ID 1, who is Tom Cruise for now. Let's update the last name to Hanks and the age to 58. To do this, we will use an HTTP PATCH request sent to http://localhost:8080/people/1, which uniquely identifies the document to update. The body of the HTTP PATCH request is {"lastName":"Hanks","age":58}. Refer to the following screenshot for the request we sent out for the update:
11. To validate whether our update went through successfully (we know it did, as we got response status 204 after the PATCH request), enter http://localhost:8080/people/1 again in the browser's address bar.
12. Finally, we will delete the document. This is straightforward: we simply send a DELETE request to http://localhost:8080/people/1. Once the DELETE request succeeds, send an HTTP GET request from the browser to http://localhost:8080/people/1, and no document will be returned.
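If you prefer code to a browser plugin, the same sequence of requests can be built with the HTTP client that ships with Java 11 and later (java.net.http). This is a sketch, not part of the book's project. It assumes the SpringDataRestTest server from this recipe is running on localhost:8080, and the class PeopleRestClient is hypothetical. The main method simply prints each status code rather than asserting particular values.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch of steps 5 to 12 using java.net.http (Java 11+) instead of a browser plugin.
// Assumes the SpringDataRestTest server from this recipe is running on localhost:8080.
public class PeopleRestClient {

    static final String BASE = "http://localhost:8080/people";

    public static HttpRequest create(String json) {          // step 5: POST to the collection resource
        return HttpRequest.newBuilder(URI.create(BASE))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
    }

    public static HttpRequest get(int id) {                  // steps 7 and 11: GET one person
        return HttpRequest.newBuilder(URI.create(BASE + "/" + id)).GET().build();
    }

    public static HttpRequest findByLastName(String lastName) { // step 9: exported search method
        String query = "lastName=" + URLEncoder.encode(lastName, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(BASE + "/search/findByLastName?" + query)).GET().build();
    }

    public static HttpRequest patch(int id, String json) {    // step 10: partial update
        return HttpRequest.newBuilder(URI.create(BASE + "/" + id))
            .header("Content-Type", "application/json")
            .method("PATCH", HttpRequest.BodyPublishers.ofString(json))
            .build();
    }

    public static HttpRequest delete(int id) {               // step 12: delete the person
        return HttpRequest.newBuilder(URI.create(BASE + "/" + id)).DELETE().build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest[] steps = {
            create("{\"lastName\":\"Cruise\",\"firstName\":\"Tom\",\"age\":52,\"id\":1}"),
            get(1),
            findByLastName("Cruise"),
            patch(1, "{\"lastName\":\"Hanks\",\"age\":58}"),
            get(1),
            delete(1),
            get(1),
        };
        for (HttpRequest request : steps) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(request.method() + " " + request.uri() + " -> " + response.statusCode());
        }
    }
}
```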
How it works…
We will not reiterate the spring-data-mongodb concepts in this recipe, but we will look at some of the annotations we added to the repository class specifically for the REST interface. The first one is on top of the class name, as follows:
@RepositoryRestResource(path = "people")
public interface PersonRepository extends
    PagingAndSortingRepository<Person, Integer> {
This instructs the server that this CRUD repository can be accessed using the people resource. This is why we always make HTTP GET and POST requests on http://localhost:8080/people/.
The second annotation is on the findByLastName method. We have the following method signature:
Person findByLastName(@Param("lastName") String lastName);
Here, the lastName method parameter is annotated with @Param, which names the request parameter whose value is passed as lastName when this method is invoked on the repository. If we look at step 9 in the previous section, we will see that findByLastName is invoked using an HTTP GET request, and the value of the URL parameter, lastName, is used as the string value passed while invoking the repository method.
Our example here is pretty simple, with just one parameter used for the search operation. We can have multiple parameters for the repository method and, accordingly, an equal number of parameters in the HTTP request, which will be mapped to the parameters of the method invoked on the CRUD repository. For some data types, such as dates, use the @DateTimeFormat annotation to specify the date and time format. For more information on this annotation and its usage, refer to the Spring Javadoc at http://docs.spring.io/spring/docs/current/javadoc-api/.
That was all about the GET requests we make to the REST interface to query and search data. Initially, we created document data by sending an HTTP POST request to the server. To create new documents, we always send a POST request, with the document to be created as the body of the request, to the URL that identifies the REST endpoint, in our case, http://localhost:8080/people/. All documents posted to this endpoint use PersonRepository to persist a person in the corresponding collection.
Our final three steps were to update and delete the person. The HTTP request types to perform these operations are PATCH and DELETE, respectively. In step 10, we updated the document for the person Tom Cruise, changing his last name and age. To achieve this, our PATCH request was sent to the URL http://localhost:8080/people/1, which identifies a specific person instance. Note that, when we wanted to create a new person, our POST request was always sent to http://localhost:8080/people. In contrast, we sent the PATCH and DELETE requests to a URL that represents the specific person we want to update or delete. In the case of an update, the body of the PATCH request is a JSON document whose fields replace the corresponding fields in the target document. All the other fields are left as they are. In our case, lastName and age of the target document were updated, and firstName was left untouched. In the case of a delete, the message body was empty, and the DELETE request itself indicates that the target to which the request was sent should be deleted.
You might also send a PUT request instead of PATCH to a URL that identifies a specific person; in this case, the entire document in the collection is replaced with the document provided as part of the PUT request.
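The difference between the two update styles can be modeled on flat documents represented as maps. This is a simplified sketch of the semantics described above, not Spring Data REST's implementation. The class UpdateSemantics is hypothetical, and it deliberately ignores nested documents, explicit nulls, and validation. For PUT, it keeps only the id, taken from the URL, plus the fields in the request body.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the two update styles on flat documents.
// PATCH: fields in the request body overwrite matching fields; others are kept.
// PUT: the request body replaces the whole document; the id comes from the URL.
public class UpdateSemantics {

    public static Map<String, Object> applyPatch(Map<String, Object> current, Map<String, Object> body) {
        Map<String, Object> result = new LinkedHashMap<>(current);
        result.putAll(body);
        return result;
    }

    public static Map<String, Object> applyPut(Object idFromUrl, Map<String, Object> body) {
        Map<String, Object> result = new LinkedHashMap<>(body);
        result.put("id", idFromUrl);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> tom = new LinkedHashMap<>();
        tom.put("id", 1);
        tom.put("firstName", "Tom");
        tom.put("lastName", "Cruise");
        tom.put("age", 52);

        Map<String, Object> body = new LinkedHashMap<>();
        body.put("lastName", "Hanks");
        body.put("age", 58);

        System.out.println(applyPatch(tom, body)); // {id=1, firstName=Tom, lastName=Hanks, age=58}
        System.out.println(applyPut(1, body));     // {lastName=Hanks, age=58, id=1}
    }
}
```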
See also
The spring-data-rest home page at http://projects.spring.io/spring-data-rest/, where you can find links to its Git repository, reference manual, and Javadoc
Installing the GUI-based client, MongoVUE, for MongoDB
In this recipe, we will look at a GUI-based client for MongoDB. Throughout the book, we have used the Mongo shell to perform the various operations we need. Its advantages are as follows:
It comes packaged with the MongoDB installation
It is lightweight, so you need not worry about it taking up your system's resources
On servers where GUI-based interfaces are not present, the shell is the only option to connect, query, and administer the server instance
Having said this, if you are not on a server and want to connect to a database instance to query it, view the plan of a query, administer it, and so on, it is nice to have a GUI that lets you do these things at the click of a button. As developers, we often query our relational databases with a GUI-based thick client, so why not do the same for MongoDB?
In this recipe, we will see how to install a MongoDB client, MongoVUE, and look at some of its features. This client is only available on Windows machines. The product has both a paid version (with various levels of licensing based on the number of users) and a free version that has some limitations. For this recipe, we'll look at the free version.
Getting ready
For this recipe, the following steps are necessary:
1. Start a single instance of the MongoDB server. Connections will be accepted on the default port, 27017.
2. Import the following two collections from the command prompt after the MongoDB server is started:
$ mongoimport --type json personTwo.json -c personTwo -d test --drop
$ mongoimport --type csv -c postalCodes -d test pincodes.csv --headerline --drop
How to do it…
1. Download the installer ZIP for MongoVUE from http://www.mongovue.com/downloads/. Once it is downloaded, installing the software is a matter of a few clicks.
2. Open the installed application. As this is the free version, all the features will be available for the first 14 days, after which some of them will no longer be available. The details can be seen at http://www.mongovue.com/purchase/.
3. The first thing we will do is add a database connection as follows:
1. Once the following window opens, click on the + button to add a new connection.
2. Another window then opens, in which we fill in the server connection details. Fill in the following details in the new window and click on Test. This should succeed if the connection works. Finally, click on Save, as shown in the following screenshot:
3. Once the connection is added, connect to the instance.
4. In the left-hand navigation panel, we will see the instances added and the databases in them, as shown in the following screenshot:
As we see in the preceding screenshot, hovering the mouse over the name of a collection shows us the size and the count of the documents in the collection.
5. Let's see how to query a collection and get all the documents. We will use the postalCodes collection for our test. Right-click on the collection name and then click on View. We will see the contents of the collection shown in one of three views: Tree View, where we can expand and see the contents; Table View, which shows the contents in a tabular grid; or Text View, which shows the contents as normal JSON text.
6. Let's see what happens when we query a collection with nested documents; personTwo is a collection with the following sample document in it:
{
"_id":1,
"_class":"com.packtpub.mongo.cookbook.domain.Person2",
"firstName":"Steve",
"lastName":"Johnson",
"age":30,
"gender":"Male",
"residentialAddress":{
"addressLineOne":"20, Central street",
"city":"Sydney",
"state":"NSW",
"country":"Australia"
}
}
7. When we query to see all the documents in the collection, we will see the following screenshot:
8. TheresidentialAddresscolumnshowsthatthevalueisanesteddocumentwiththegivennumberoffieldspresentinit.Hoveringyourmouseoveritshowsthenesteddocument.Alternatively,youcanclickonthecolumntoshowthecontentsinthisdocumentagainasagrid.Oncethenesteddocument(s)areshown,youcanclickonthetopofthegridtocomebackonelevel.
Let’s see how to write queries to retrieve selected documents:
1. Right-click on the postalCodes collection and click on Find. We will type the following query in the {Find} textbox and in the {Sort} field, and click on the Find button:
2. We can choose the type of view we want from the tabs, such as Tree View, Table View, or Text View. The plan of the query is also shown. Whenever any operation is run, the Learn Shell at the bottom shows the actual Mongo query executed. In this case, we will see the following query:
[11:17:07 PM]
db.postalCodes.find({ "city" : /Mumbai/i }).limit(50);
db.postalCodes.find({ "city" : /Mumbai/i }).limit(50).explain();
3. The plan of a query is shown every time and, as of the current version, 1.6.9.0, there is no way to disable the setting that shows the query plan along with the query.
4. In the Tree View tab, right-clicking on a document gives you more options, such as expanding it, copying the JSON contents, adding keys to the document, and removing the document. Try to remove a document from this collection with a right-click, and also try adding some additional keys to a document.
5. You might choose to restore the documents afterwards by reimporting the data into the postalCodes collection.
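To make the semantics of the query shown in the Learn Shell concrete, here is a minimal in-memory sketch in plain JavaScript (runnable with Node.js, no server needed) of what find({"city": /Mumbai/i}).limit(50) selects. The sample documents are hypothetical stand-ins for the postalCodes data. Because the regular expression is unanchored and case-insensitive, any city containing "mumbai" in any letter case matches; such a regex also cannot use an index efficiently.

```javascript
// Hypothetical documents standing in for the postalCodes collection.
const postalCodes = [
  { city: "Mumbai", state: "Maharashtra" },
  { city: "Navi Mumbai", state: "Maharashtra" },
  { city: "MUMBAI", state: "Maharashtra" },
  { city: "Pune", state: "Maharashtra" },
];

// Mimics find({"city": pattern}).limit(n): keep documents whose city matches
// the regular expression, then return at most n of them.
// The pattern must not use the g flag, which would make test() stateful.
function findByCity(docs, pattern, n) {
  return docs
    .filter((doc) => typeof doc.city === "string" && pattern.test(doc.city))
    .slice(0, n);
}

const result = findByCity(postalCodes, /Mumbai/i, 50);
console.log(result.map((doc) => doc.city)); // Mumbai, Navi Mumbai and MUMBAI match; Pune does not
```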
To insert a document in the collection, perform the following steps. We will insert a document in the personTwo collection.
1. Right-click on the personTwo collection name and click on Insert/Import Documents…, as shown in the following screenshot:
2. Another pop-up window will come up, where you can choose to enter a single JSON document or a valid text file with JSON documents to be imported. We will import the following single document:
{
  "_id" : 4,
  "firstName" : "Jack",
  "lastName" : "Jones",
  "age" : 35,
  "gender" : "Male"
}
3. Query the collection once the document is imported successfully; we will see the newly imported document along with the old ones.
Let’s see how to update the document:
1. You can either right-click on the collection name on the left-hand side and click on Update, or select the Update option at the top. In either case, we will get the following window. Here, we will update the age of the person we inserted in the previous step as follows:
Some things to note in this GUI are the query textbox on the left-hand side, which finds the document to be updated, and the update JSON on the right-hand side, which will be applied to the selected document(s).
2. Before you update, you might choose to hit the Count button to see the number of documents that can be updated (in this case, one). On clicking on Find, we can see the documents in tree form. On the right-hand side, below the update JSON text, we have the option to update one document or multiple documents by clicking on Update 1 or Update All, respectively.
You can also choose whether an Upsert should be performed if no document matches the given find condition. The radio buttons in the bottom-right corner of the screen, as shown in the preceding screenshot, show either the output of the getLastError operation or the result after the update. In the latter case, the updated document(s) are fetched with a separate query, the same as the one in the {Find} textbox. However, this find query is not foolproof and might return results different from the documents actually updated, as the update and find operations are not atomic.
We have queried small collections so far. As the size of a collection increases, queries that perform full collection scans become unacceptable, and we need to create indexes as follows:
1. To create an index on lastName in ascending order and age in descending order, for instance, we would invoke db.personTwo.ensureIndex({'lastName' : 1, 'age' : -1}).
2. Using MongoVUE, we can create an index visually by right-clicking on the collection name on the left-hand side of the screen and selecting Add Index….
3. In the new pop-up window, enter the name of the index and select the Visual tab, as shown in the following screenshot. Select the lastName and age fields with the Ascending (1) and Descending (-1) values, respectively.
4. Once the preceding details are filled in, click on Create. This creates the index for us by firing the ensureIndex command, as we can see in the Learn Shell below.
5. You can opt for the index to be unique and to drop duplicates (enabled when Unique is selected), or run big, long-running index builds in the background. Note the Json tab next to the Visual tab. That is where you can type in the ensureIndex command, just as you would from the shell, to create the index.
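The key pattern {'lastName' : 1, 'age' : -1} determines the order in which the index keeps its entries: by lastName ascending and, among equal last names, by age descending. The following JavaScript sketch builds a comparator from such a key pattern to show that ordering. It assumes string and number values only, whereas MongoDB's real comparison order covers all BSON types, and the third person is a hypothetical document added for illustration.

```javascript
// Builds a comparator from a key pattern such as {lastName: 1, age: -1}.
// Simplifying assumption: every indexed value is a string or a number.
function comparatorFor(keyPattern) {
  const fields = Object.entries(keyPattern);
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction; // 1: a first, -1: a last
      if (a[field] > b[field]) return direction;
    }
    return 0; // all indexed fields are equal
  };
}

const people = [
  { firstName: "Steve", lastName: "Johnson", age: 30 },
  { firstName: "Jack", lastName: "Jones", age: 35 },
  { firstName: "Amy", lastName: "Johnson", age: 42 }, // hypothetical
];

const ordered = [...people].sort(comparatorFor({ lastName: 1, age: -1 }));
console.log(ordered.map((p) => `${p.lastName} ${p.age}`)); // Johnson 42, Johnson 30, Jones 35
```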
Now, we will see how to drop an index:
1. Simply expand the tree on the left-hand side.
2. On expanding the collection, we will see all the indexes created on it.
3. Except for the default index on the _id field, all other indexes can be dropped.
4. Simply right-click on the name and select Drop index to drop it, or click on Properties to view its properties.
After seeing how to do the basic CRUD operations and create an index, let’s look at how to execute aggregation operations.
1. Unlike index creation, there is no visual tool for aggregation, just a text area where we can enter our aggregation pipeline. In the following sample, we will perform aggregation on the postalCodes collection to find the top five states by the number of times they appear in the collection:
{'$project' : {'state' : 1, '_id' : 0}},
{'$group' : {'_id' : '$state', 'count' : {'$sum' : 1}}},
{'$sort' : {'count' : -1}},
{'$limit' : 5}
2. We will have the following aggregation pipeline entered:
3. Once the pipeline is entered, hit the Aggregate button to get the aggregation results.
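To see what each stage of this pipeline does, here is a minimal in-memory emulation in plain JavaScript (Node.js, no server required). The documents are hypothetical; only the state field matters, and the order of the steps mirrors the four pipeline stages above.

```javascript
// Hypothetical postal-code documents; only the state field is used.
const postalCodes = [
  { city: "Mumbai", state: "Maharashtra" },
  { city: "Pune", state: "Maharashtra" },
  { city: "Chennai", state: "Tamil Nadu" },
  { city: "Madurai", state: "Tamil Nadu" },
  { city: "Nagpur", state: "Maharashtra" },
  { city: "Kochi", state: "Kerala" },
];

function topStates(docs, limit) {
  // $project: {state: 1, _id: 0} keeps only the state field.
  const projected = docs.map((doc) => ({ state: doc.state }));
  // $group: {_id: '$state', count: {$sum: 1}} counts documents per state.
  const counts = new Map();
  for (const { state } of projected) {
    counts.set(state, (counts.get(state) || 0) + 1);
  }
  const grouped = [...counts].map(([id, count]) => ({ _id: id, count }));
  // $sort: {count: -1} orders the groups by count, highest first.
  grouped.sort((a, b) => b.count - a.count);
  // $limit: 5 keeps the first five groups.
  return grouped.slice(0, limit);
}

console.log(topStates(postalCodes, 5));
```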
Executing MapReduce is even cooler. The use case that we will execute is similar to the one we used earlier, but we will see how to implement it as a MapReduce operation using MongoVUE:
1. To execute a MapReduce job, right-click on the collection name in the left-hand-side menu and click on MapReduce.
2. This option is right above the Aggregate option, as seen in the previous screenshot. It gives us a pretty neat GUI to enter the Map, Reduce, Finalize, and In & Out options, as shown in the following screenshot:
3. The Map function is simply as follows:
function Map() {
  emit(this.state, 1);
}
4. The Reduce function is as follows:
function Reduce(key, values) {
  return Array.sum(values);
}
5. Leave the Finalize method unimplemented and, in the In & Out section, fill in the following details:
6. Click on Go to start executing the MapReduce job.
7. The output will be written to the mongoVue_mr collection. Query the mongoVue_mr collection using the following query:
> db.mongoVue_mr.find().sort({value : -1}).limit(5)
8. Verify the results against those we got using aggregation. The output mode of MapReduce was chosen as Reduce. For more options and their behavior, visit http://docs.mongodb.org/manual/reference/command/mapReduce/#mapreduce-out-cmd.
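The following JavaScript sketch emulates the same map/reduce job in memory (Node.js, no server) to show how emitted values are grouped per key and then reduced. It is not MongoDB's implementation: Array.sum exists only in the mongo shell, so a plain reducer is used, emit is passed in as an argument rather than provided as a global, and the sample documents are hypothetical. Map-reduce was later deprecated (from MongoDB 5.0) in favor of the aggregation pipeline.

```javascript
// Same logic as the Reduce function above, without the shell-only Array.sum.
function reduceFn(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Same logic as the Map function above; the server provides emit as a global,
// here it is passed in explicitly. (Naming it Map would shadow the JS Map class.)
function mapFn(emit) {
  emit(this.state, 1);
}

function mapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // Call map with `this` bound to each document, as the server does.
  for (const doc of docs) map.call(doc, emit);
  // MongoDB does not call reduce for a key that has a single emitted value.
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Hypothetical sample documents.
const postalCodes = [
  { city: "Mumbai", state: "Maharashtra" },
  { city: "Pune", state: "Maharashtra" },
  { city: "Chennai", state: "Tamil Nadu" },
  { city: "Madurai", state: "Tamil Nadu" },
  { city: "Nagpur", state: "Maharashtra" },
  { city: "Kochi", state: "Kerala" },
];

// Equivalent of db.mongoVue_mr.find().sort({value: -1}).limit(5).
const top = mapReduce(postalCodes, mapFn, reduceFn)
  .sort((a, b) => b.value - a.value)
  .slice(0, 5);
console.log(top);
```

Because the server may call reduce again on its own earlier output, a reduce function must give the same result either way; summing does, since reduceFn(k, [reduceFn(k, [1, 1]), 1]) equals reduceFn(k, [1, 1, 1]).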
Monitoring server instances is also easy with MongoVUE. To do this, perform the following steps:
1. To monitor an instance, navigate to Tools | Monitoring in the top menu.
2. By default, no server will be added, and we will have to click on + Add Server to add a server instance.
3. Select the local instance added, or any server you want to monitor, and click on Connect.
4. We will see quite a lot of monitoring details. MongoVUE uses the db.serverStatus command to serve these stats. Thus, to limit the frequency at which this command is executed on busy server instances, we can choose a Refresh Interval at the top of the screen, as shown in the following screenshot:
How it works…
What we covered in the previous sections was pretty straightforward; using this information, we can perform the majority of our activities as a developer and administrator.
See also
Chapter 4, Administration
Chapter 6, Monitoring and Backups
http://www.mongovue.com/tutorials/ for various tutorials on MongoVUE
Appendix A. Concepts for Reference
This appendix contains some additional information that will help you understand the recipes better. We will discuss write concern and read preference in as much detail as possible.
Write concern and its significance
Write concern is the minimum guarantee that the MongoDB server provides with respect to the write operation done by the client. There are various levels of write concern that are set by the client application to get a guarantee from the server that a certain stage will be reached in the write process on the server side.
The stronger the requirement for a guarantee, the greater the time taken (potentially) to get a response from the server. With write concern, we don’t always need to get an acknowledgement from the server about the write operation being completely successful. For some less crucial data, such as logs, we might be more interested in sending more writes per second over a connection. On the other hand, when we are looking to update sensitive information, such as customer details, we want to be sure of the write being successful (consistent and durable); data integrity is crucial and takes precedence over the speed of the writes.
An extremely useful feature of write concern is the ability to trade off between two factors, the speed of write operations and the consistency of the data written, on a case-by-case basis. However, it needs a deep understanding of the implications of setting up a particular write concern. The following diagram, read from left to right, shows the increasing level of write guarantees:
As we move from I to IV, the guarantee for the performed write gets stronger and stronger, but the time taken to execute the write operation is longer from the client’s perspective. All write concerns are expressed here as JSON objects, using three different keys, namely, w, j, and fsync. Additionally, another key called wtimeout is used to provide timeout values for the write operation. Let’s see these keys in detail:
w: This is used to indicate whether or not to wait for the server’s acknowledgement, whether or not to report write errors due to data issues, and whether the data must be replicated to secondaries. Its value is usually a number; a special case is the value majority, which we will see later.
j: This is related to journaling, and its value can be a Boolean (true/false or 1/0).
fsync: This is a Boolean value and is related to whether or not the write should wait till the data is flushed to disk before responding.
wtimeout: This specifies the timeout for write operations, in milliseconds; the driver throws an exception to the client if the server doesn’t respond within the provided time. We will see this option in some detail soon.
In part I, which we have demarcated up to the driver, we have two write concerns, namely, {w:-1} and {w:0}. These two write concerns are similar in the sense that they neither wait for the server’s acknowledgement upon receiving the write operation nor report any exception on the server side, such as one caused by a unique index violation. The client will get an ok response and will discover the write failure only when it queries the database at some later point of time and finds the data missing. The difference lies in the way they respond to a network error. When we set {w:-1}, the operation doesn’t fail, and a write response is received by the user. However, it will contain a response stating that a network error prevented the write operation from succeeding and that no retries for the write must be attempted. On the other hand, with {w:0}, if a network error occurs, the driver might choose to retry the operation, and it throws an exception to the client if the write fails due to the network error. Both these write concerns give a quick response back to the invoking client at the cost of data consistency. These write concerns are okay for use cases such as logging, where occasional missed log writes are fine. In older versions of MongoDB, {w:0} was the default write concern if none was mentioned by the invoking client. At the time of writing this book, this has changed to {w:1} by default, and the option {w:0} is deprecated.
In part II of the diagram, which falls between the driver and the server, the write concern we are talking about is {w:1}. The driver waits for an acknowledgement from the server for the write operation to complete. Note that the server responding doesn’t mean that the write operation was made durable. It means that the change just got applied in memory, all the constraints were checked, and any exception will be reported to the client, unlike with the previous two write concerns we saw. This is a relatively safe write concern mode, which will be fast, but there is still a slim chance of the data being lost if the server crashes in the few milliseconds before the data is written from memory to the journal. For most use cases, this is a good option to set. Hence, this is the default write concern mode.
Moving on, we come to part III of the diagram, which spans from the entry point into the server as far as the journal. The write concern we are looking for here is {j:1} or {j:true}. This write concern ensures a response to the invoking client only when the write operation is written to the journal. What is a journal, though? This is something that we saw in depth in Chapter 4, Administration, but for now, think of it as a mechanism that ensures that writes are made durable and that the data on disk doesn’t get corrupted in the event of a server crash.
Finally, let’s come to part IV of the diagram; the write concern we are talking about is {fsync:true}. This requires the data to be flushed to disk before the response is sent back to the client. In my opinion, when journaling is enabled, this operation doesn’t really add any value, as journaling ensures data persistence even on a server crash. Only when journaling is disabled does this option ensure that the write operation is successful when the client receives a success response. If the data is really important, journaling should never be disabled in the first place, as it also ensures that the data on disk doesn’t get corrupted.
We have seen some basic write concerns for a single-node server, or those relevant only to the primary node in a replica set.
Note
An interesting case to discuss is, what if we have a write concern such as {w:0, j:true}? It asks us not to wait for the server’s acknowledgement while also ensuring that the write has been made to the journal. In this case, the journaling flag takes precedence, and the client waits for the acknowledgement of the write operation. One should avoid setting such ambiguous write concerns to avoid unpleasant surprises.
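The single-node levels I to IV, together with the precedence rule from the preceding note, can be summarized as a small JavaScript reading aid. It is a sketch that encodes this appendix's description only, not the decision logic of any MongoDB driver or server. Combinations the text does not discuss, such as {w:0, fsync:true}, are treated as unacknowledged here, which is an assumption; replica set values of w are deliberately rejected.

```javascript
// Reading aid for the single-node levels I to IV described in this appendix.
// It encodes the text's description, not the logic of any MongoDB driver or server.
function guaranteeLevel({ w = 1, j = false, fsync = false } = {}) {
  if (w === "majority" || (typeof w === "number" && w > 1)) {
    throw new RangeError("replica set write concerns are not modeled in this sketch");
  }
  const journaled = j === true || j === 1;
  const flushed = fsync === true || fsync === 1;
  // As the note explains, j:true overrides w:0 or w:-1, so the client waits.
  // Assumption: other unacknowledged combinations, e.g. {w: 0, fsync: true}, stay at level I.
  if (w <= 0 && !journaled) return "I"; // no acknowledgement from the server
  if (flushed) return "IV"; // wait until the data is flushed to disk
  if (journaled) return "III"; // wait until the write is in the journal
  return "II"; // acknowledged once applied in memory
}

console.log(guaranteeLevel({ w: 0, j: true })); // III
```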
We will now talk about write concern when it also involves the secondary nodes of a replica set. Let’s take a look at the following diagram:
Any write concern with a w value greater than one indicates that secondary nodes too need to acknowledge the write before a response is sent back. As seen in the preceding diagram, when a primary node gets a write operation, it propagates that operation to all secondary nodes. As soon as it gets a response from a predetermined number of secondary nodes, it acknowledges to the client that the write has been successful. For example, when we have the write concern {w:3}, it means that the client should be sent a response only when three nodes in the cluster acknowledge the write. These three nodes include the primary node. Hence, it is down to two secondary nodes to respond for a successful write operation.
However, there is a problem with providing a number for the write concern. We need to know the number of nodes in the cluster and set the value of w accordingly. A low value means that the acknowledgement is sent once only a few nodes have replicated the data. A value that is too high may unnecessarily slow the response back to the client or, in some cases, might not send a response at all. Suppose you have a three-node replica set and {w:4} as the write concern; the server will not send an acknowledgement till the data is replicated to three secondary nodes, which do not exist, as we have just two secondary nodes. Thus, the client waits for a very long time to hear from the server about the write operation. There are a couple of ways to address this problem:
1. Use the wtimeout key and specify the timeout for the write concern. This ensures that a write operation will not block for longer than the time specified (in milliseconds) in the wtimeout field of the write concern. For example, {w:3, wtimeout:10000} ensures that the write operation will not block for more than 10 seconds (10,000 ms), after which an exception will be thrown to the client. In the case of Java, a WriteConcernException will be thrown, with the root cause message stating the reason as timeout. Note that this exception does not roll back the write operation. It just informs the client that the operation did not complete in the specified amount of time. It might be completed later on the server side, some time after the client receives the timeout exception. It is up to the application program to deal with the exception and programmatically take corrective steps. The message of the timeout exception does convey some interesting details, which we will see on executing the test program for the write concern.
2. A better way to specify the value of w, in the case of replica sets, is to specify the value as majority. This write concern automatically identifies the number of nodes in a replica set and sends an acknowledgement back to the client when the data is replicated to a majority of nodes. For example, if the write concern is {w:"majority"} and the number of nodes in a replica set is three, then the majority will be 2. If, at a later point in time, we change the number of nodes to five, the majority will be 3 nodes. The number of nodes that forms a majority is computed automatically when the write concern’s value is given as majority.
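The majority arithmetic and the problem with a fixed w can be checked with a few lines of JavaScript. This sketch assumes every member is a voting, data-bearing member, as in the three-node set used in this appendix; arbiters and non-voting members change the real calculation. The helper names are hypothetical.

```javascript
// Assumption: every member is a voting, data-bearing member.
function majorityOf(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// Number of acknowledging nodes (the primary included) that a w value
// requires, and whether a set of this size can ever satisfy it.
function requiredAcks(w, memberCount) {
  const needed = w === "majority" ? majorityOf(memberCount) : w;
  return { needed, satisfiable: needed <= memberCount };
}

console.log(requiredAcks("majority", 3)); // { needed: 2, satisfiable: true }
console.log(requiredAcks("majority", 5)); // { needed: 3, satisfiable: true }
console.log(requiredAcks(4, 3)); // { needed: 4, satisfiable: false }
```

An unsatisfiable w, such as {w:4} on three nodes, is exactly the case in which only a wtimeout stops the client from waiting indefinitely.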
Now, let us put the concepts we discussed into use and execute a test program that will demonstrate some of the concepts we just saw.
Setting up a replica set
To set up a replica set, you should know how to start a basic replica set with three nodes. Refer to the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server. This recipe builds on that one, but it needs an additional configuration while starting the replica set, which we will discuss in the next section. Note that the replica set used here has a slight change in configuration compared to the one you used earlier.
Here, we will use a Java program to demonstrate various write concerns and their behavior. The Connecting to a single node from a Java client recipe in Chapter 1, Installing and Starting the MongoDB Server, should be followed up to the point where Maven is set up. This can be a bit inconvenient if you are coming from a non-Java background.
Note
The Java project named MongoJava is available for download at the book’s website. If the setup is complete, you can test the project just by executing the following command:
mvn compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.FirstMongoClient
The code for this project is available for download at the book’s website. Download the project named WriteConcernTest and keep it on a local drive, ready for execution.
So, let’s get started:
1. Prepare the following configuration for the replica set. It is identical to the config that we saw in the Starting multiple instances as part of a replica set recipe in Chapter 1, Installing and Starting the MongoDB Server, where we set up the replica set, with just one difference: the third member has slaveDelay: 5, priority: 0:
cfg = {
  _id: 'repSetTest',
  members: [
    {_id: 0, host: 'localhost:27000'},
    {_id: 1, host: 'localhost:27001'},
    {_id: 2, host: 'localhost:27002', slaveDelay: 5, priority: 0}
  ]
}
2. Use this config to start a three-node replica set, with one node listening on port 27000. The others can be any ports of your choice, but stick to 27001 and 27002 if possible (we need to update the config accordingly if we decide to use different port numbers). Also, remember to set the name of the replica set to repSetTest, matching the _id in the config, for the --replSet command-line option while starting the replica set. Give the replica set some time to come up before going ahead with the next step.
3. At this point, the replica set with the specifications mentioned earlier should be up and running. We will now execute the test code provided in Java to observe some interesting facts and behaviors of different write concerns. Note that this program also tries to connect to a port where no Mongo process is listening for connections. The port chosen is 20000; before running the code, ensure that no server is up and listening on port 20000.
4. Go to the root directory of the WriteConcernTest project and execute the following command:
mvn compile exec:java -Dexec.mainClass=com.packtpub.mongo.cookbook.WriteConcernTests
It should take some time to execute completely, depending on your hardware configuration. It took roughly 35 to 40 seconds on my machine, which has a 7200 RPM spinning disk drive.
Before we continue analyzing the logs, let us see what the two fields added to the replica set configuration do. The slaveDelay field indicates that the particular secondary (the one listening on port 27002 in this case) will lag behind the primary by 5 seconds. That is, the data currently being replicated on this node is the data that was added to the primary 5 seconds ago. Secondly, such a delayed node must never become the primary and, hence, the priority field has to be added with the value 0. We have already seen this in detail in Chapter 4, Administration.
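To make the rule about the delayed member concrete, here is a small JavaScript check over the configuration document from step 1. The helper name delayedButElectable is hypothetical; the server performs its own validation when the configuration is applied. Newer MongoDB releases (5.0 onward) rename slaveDelay to secondaryDelaySecs; the field name here matches the version used in this book.

```javascript
// The replica set configuration from step 1, as a plain JavaScript object.
const cfg = {
  _id: "repSetTest",
  members: [
    { _id: 0, host: "localhost:27000" },
    { _id: 1, host: "localhost:27001" },
    { _id: 2, host: "localhost:27002", slaveDelay: 5, priority: 0 },
  ],
};

// Hypothetical helper: hosts of delayed members that could still be elected
// primary. A member without a priority field defaults to priority 1.
function delayedButElectable(config) {
  return config.members
    .filter((m) => (m.slaveDelay || 0) > 0)
    .filter((m) => (m.priority === undefined ? 1 : m.priority) !== 0)
    .map((m) => m.host);
}

console.log(delayedButElectable(cfg)); // [] because the delayed member has priority 0
```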
Let us now analyze the output from the preceding command’s execution. The Java class provided need not be looked at here; the output on the console is sufficient. Some of the relevant portions of the console output are as follows:
[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ mongo-cookbook-wctest ---
Trying to connect to server running on port 20000
Trying to write data in the collection with write concern {w:-1}
Error returned in the WriteResult is NETWORK ERROR
Trying to write data in the collection with write concern {w:0}
Caught MongoException.Network trying to write to collection, message is
Write operation to server localhost/127.0.0.1:20000 failed on database test
Connected to replica set with one node listening on port 27000 locally
Inserting duplicate keys with {w:0}
No exception caught while inserting data with duplicate _id
Now inserting the same data with {w:1}
Caught DuplicateException, exception message is { "serverUsed" :
"localhost/127.0.0.1:27000" , "err" : "E11000 duplicate key error index:
test.writeConcernTest.$_id_ dup key: { : \"a\" }" , "code" : 11000 , "n" :
0 , "lastOp" : { "$ts" : 1386009990 , "$inc" : 2} , "connectionId" : 157 ,
"ok" : 1.0}
Average running time with WriteConcern {w:1, fsync:false, j:false} is 0 ms
Average running time with WriteConcern {w:2, fsync:false, j:false} is 12 ms
Average running time with WriteConcern {w:1, fsync:false, j:true} is 40 ms
Average running time with WriteConcern {w:1, fsync:true, j:false} is 44 ms
Average running time with WriteConcern {w:3, fsync:false, j:false} is 5128 ms
Caught WriteConcern exception for {w:5}, with following message {
"serverUsed" : "localhost/127.0.0.1:27000" , "n" : 0 , "lastOp" : { "$ts" :
1386009991 , "$inc" : 18} , "connectionId" : 157 , "wtimeout" : true ,
"waited" : 1004 , "writtenTo" : [ { "_id" : 0 , "host" : "localhost:27000"}
, { "_id" : 1 , "host" : "localhost:27001"}] , "err" : "timeout" , "ok" :
1.0}
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 36.671s
[INFO] Finished at: Tue Dec 03 00:16:57 IST 2013
[INFO] Final Memory: 13M/33M
[INFO] ------------------------------------------------------------------------
The first statement in the logs states that we try to connect to a Mongo process listening on port 20000. As there should not be a Mongo server running and listening on this port for client connections, all our write operations to this server should fail, and this gives us a chance to see what happens when we use the write concerns {w:-1} and {w:0} and write to this nonexistent server.
The next lines in the output show that when we have the write concern {w:-1}, we do get a write result back, but it contains the error flag set to indicate a network error. However, no exception is thrown. In the case of the write concern {w:0}, we do get an exception in the client application for any network errors. Of course, all other write concerns ensuring a stricter guarantee will throw an exception in this case too.
Now we come to the portion of the code that connects to the replica set where one of the nodes is listening on port 27000 (if not, the code will show the error on the console and terminate). We attempt to insert a document with a duplicate _id field ({'_id':'a'}) into a collection, once with the write concern {w:0} and once with {w:1}. As we see in the console, the former ({w:0}) didn’t throw an exception, and the insert went through successfully from the client’s perspective, whereas the latter ({w:1}) threw an exception to the client, indicating a duplicate key. The exception contains a lot of information: the server’s hostname and port, the index whose unique constraint failed, the client connection ID, the error code, and the value that was not unique and caused the exception. In fact, the insert performed with {w:0} as the write concern failed too. However, as the driver didn’t wait for the server’s acknowledgement, the failure was never communicated to the client.
Moving on, we now try to compute the time taken for the write operation to complete. The time shown here is the average time taken to execute the same operation five times with a given write concern. Note that these times will vary between executions of the program, and this method is just meant to give some rough estimates for our study. We can conclude from the output that the time taken for the write concern {w:1} is less than that for {w:2} (asking for an acknowledgement from one secondary node), and the time taken for {w:2} is less than that for {j:true}, which in turn is less than that for {fsync:true}. The next line of the output shows us that the average time taken for the write operation to complete is roughly 5 seconds when the write concern is {w:3}. Any guesses on why that is the case? Why does it take so long? The reason is that, when w is 3, we send an acknowledgement to the client only when two secondary nodes acknowledge the write operation. In our case, one of the nodes is delayed from the primary by about 5 seconds and can thus acknowledge the write only after 5 seconds; hence, the client receives a response from the server in roughly 5 seconds.
Let us do a quick exercise here. What do you think would be the approximate response time when we have the write concern {w:'majority'}? The hint here is that, for a replica set of three nodes, two is the majority.
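To check your answer, here is a back-of-the-envelope model in JavaScript, and only a model: it assumes that a write with {w: k} returns once the k fastest nodes (the primary counted as one) have acknowledged, so the response time is the k-th smallest acknowledgement time. The acknowledgement times are assumptions loosely based on the averages printed earlier, and estimatedLatency is a hypothetical helper.

```javascript
// Rough model, not how the server schedules replication: a write with {w: k}
// returns once k nodes (the primary included) have acknowledged it.
function estimatedLatency(ackTimesMs, w) {
  const sorted = [...ackTimesMs].sort((a, b) => a - b);
  const k = w === "majority" ? Math.floor(sorted.length / 2) + 1 : w;
  // More acknowledgements than nodes: without wtimeout the client would wait forever.
  if (k > sorted.length) return Infinity;
  return sorted[k - 1];
}

// Assumed acknowledgement times in ms: primary, normal secondary,
// and the secondary configured with slaveDelay: 5.
const ackTimesMs = [0, 12, 5000];

console.log(estimatedLatency(ackTimesMs, 3)); // 5000
console.log(estimatedLatency(ackTimesMs, "majority")); // 12
```

Under these assumptions, {w:'majority'} behaves like {w:2} on this three-node set: the delayed member is not needed to form a majority, so the response comes back in milliseconds rather than seconds.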
Finally, we see a timeout exception. The timeout is set using the wtimeout field of the write concern and is specified in milliseconds. In our case, we gave a timeout of 1000 ms, that is, 1 second, and the number of nodes in the replica set to get an acknowledgement from before sending the response back to the client is 5 (four secondary instances). Thus, we have the write concern {w:5, wtimeout:1000}. As our replica set has only three nodes, an operation with w set to 5 would wait indefinitely, until two more secondary instances are added to the cluster. With the timeout set, the driver gives up waiting and throws an error to the client, conveying some interesting details. The following is the JSON sent as the exception message:
{ "serverUsed" : "localhost/127.0.0.1:27000" , "n" : 0 , "lastOp" : { "$ts"
: 1386015030 , "$inc" : 1} , "connectionId" : 507 , "wtimeout" : true ,
"waited" : 1000 , "writtenTo" : [ { "_id" : 0 , "host" : "localhost:27000"}
, { "_id" : 1 , "host" : "localhost:27001"}] , "err" : "timeout" , "ok" :
1.0}
Let us look at the interesting fields. We start with the n field. This indicates the number of documents updated. As in this case it is an insert and not an update, it stays 0. The wtimeout and waited fields tell us whether the operation timed out and the amount of time for which the client waited for a response; in this case, 1000 ms. The most interesting field is writtenTo. In this case, the insert had succeeded on these two nodes of the replica set when the operation timed out, and hence, they are seen in the array. The third node has a slaveDelay value of 5 seconds and, hence, the data is not yet written to it. This proves that the timeout doesn’t roll back the insert, and the insert does go through successfully. In fact, the node with slaveDelay will also have the data after 5 seconds, even though the operation timed out, and this makes perfect sense, as it keeps the primary and secondary instances in sync. It is the responsibility of the application to detect such timeouts and handle them.
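An application that handles such timeouts can use the writtenTo array to find out which members already have the write. The following JavaScript sketch parses the error document shown above and lists the configured members that are missing from writtenTo. The helper name pendingMembers is hypothetical, and the member list is taken from the configuration used in this appendix.

```javascript
// The wtimeout error document printed above, as a JSON string.
const errorJson = `{"serverUsed": "localhost/127.0.0.1:27000", "n": 0,
  "lastOp": {"$ts": 1386015030, "$inc": 1}, "connectionId": 507,
  "wtimeout": true, "waited": 1000,
  "writtenTo": [{"_id": 0, "host": "localhost:27000"},
                {"_id": 1, "host": "localhost:27001"}],
  "err": "timeout", "ok": 1.0}`;

// Members of the replica set configured earlier in this appendix.
const memberHosts = ["localhost:27000", "localhost:27001", "localhost:27002"];

// Hypothetical helper: hosts that had not yet received the write when
// wtimeout expired. The write is not rolled back on the hosts that have it.
function pendingMembers(errorDoc, allHosts) {
  if (errorDoc.wtimeout !== true) return [];
  const written = new Set(errorDoc.writtenTo.map((member) => member.host));
  return allHosts.filter((host) => !written.has(host));
}

const errorDoc = JSON.parse(errorJson);
console.log(`waited ${errorDoc.waited} ms; still pending:`, pendingMembers(errorDoc, memberHosts));
```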
Read preference for querying
In the previous section, we saw what a write concern is and how it affects write operations (insert, update, and delete). In this section, we will see what a read preference is and how it affects query operations. We’ll discuss how to use a read preference with specific programming language drivers in separate recipes.
When connected to an individual node, query operations are allowed by default if it is a primary; if it is a secondary node, we need to explicitly state that it is okay to query secondary instances by executing rs.slaveOk() from the shell.
However, consider connecting to a Mongo replica set from an application. The application connects to the replica set and not to a single instance. Depending on its nature, the application might always want to connect to a primary; always to a secondary; prefer connecting to a primary node but be okay with connecting to a secondary node in some scenarios, or vice versa; and finally, it might want to connect to the instance geographically closest to it (well, most of the time).
Thus, the read preference plays an important role when connected to a replica set and not to a single instance. In the following table, we will see the various read preferences that are available and how they behave in terms of querying a replica set. There are five of them, and the names are self-explanatory:
Read preference | Description
primary | This is the default mode, and it allows queries to be executed only on primary instances. It is the only mode that guarantees the most recent data, as all writes have to go through a primary instance. Read operations, however, will fail if no primary is available, which happens for a few moments when a primary goes down and lasts until a new primary is chosen.
primaryPreferred | This is identical to the preceding primary read preference, except that during a failover, when no primary is available, it will read data from a secondary; those are the times when it possibly doesn’t read the most recent data.
secondary | This is exactly the opposite of the default primary read preference. This mode ensures that read operations never go to a primary and that a secondary is always chosen. The chances of reading inconsistent data that does not reflect the latest write operations are highest in this mode. It is, however, okay (in fact, preferred) for applications that do not face end users, such as jobs that compute hourly statistics and analytics for in-house monitoring, where the accuracy of the data is least important but keeping load off the primary instance is key. If no secondary instance is available or reachable, and only a primary instance is, the read operation will fail.
secondaryPreferred | This is similar to the preceding secondary read preference in all aspects, except that if no secondary is available, the read operations will go to the primary instance.
nearest | Unlike all the preceding read preferences, this one has no preference for either a primary or a secondary; it can connect to either. Its main objective is minimum latency between the client and an instance of the replica set. In the majority of cases, owing to network latency, and with a similar network between the client and all instances, the instance chosen will be the one that is geographically closest.
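The behavior described in this table can be sketched as a small JavaScript selection function. It is a simplification under stated assumptions: it ignores tag sets and data staleness, uses a hypothetical member shape {host, isPrimary, available, pingMs}, and deterministically picks the lowest-ping candidate, whereas real drivers pick at random among members within a latency window, as the internals section later in this appendix explains. The host names are hypothetical.

```javascript
// Returns the member with the lowest ping time, or null for an empty list.
function lowestPing(list) {
  return list.reduce((best, m) => (best === null || m.pingMs < best.pingMs ? m : best), null);
}

// Simplified selection for the five read preference modes; returns null when
// the mode cannot be satisfied, which corresponds to a failed read.
function selectMember(mode, members) {
  const up = members.filter((m) => m.available);
  const primary = up.find((m) => m.isPrimary) || null;
  const secondary = lowestPing(up.filter((m) => !m.isPrimary));
  switch (mode) {
    case "primary": return primary;
    case "primaryPreferred": return primary || secondary;
    case "secondary": return secondary;
    case "secondaryPreferred": return secondary || primary;
    case "nearest": return lowestPing(up);
    default: throw new Error(`unknown read preference: ${mode}`);
  }
}

// Hypothetical three-member set spread over two data centers.
const members = [
  { host: "dc1-a:27017", isPrimary: true, available: true, pingMs: 2 },
  { host: "dc1-b:27017", isPrimary: false, available: true, pingMs: 5 },
  { host: "dc2-a:27017", isPrimary: false, available: true, pingMs: 150 },
];

console.log(selectMember("secondary", members).host); // dc1-b:27017
```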
Similar to how write concerns can be coupled with replica set tags, read preferences can also be used along with tags. As the concept of tags has already been introduced in Chapter 4, Administration, you can refer to it for more details.
We just saw the different types of read preferences (except for those using tags), but the question is, how do we use them? We have covered Python and Java clients in this book and will see how to use read preferences in their respective recipes. We can set read preferences at various levels: the client level, the collection level, and the query level, with the one specified at the query level overriding any other read preference set previously.
Let us see what the nearest read preference means. Conceptually, it can be visualized as something like the following diagram:
A Mongo replica set is set up with one secondary, which can never be a primary, in a separate data center, and two members (one primary and one secondary) in another data center. An identical application is deployed in both data centers; with a primary read preference, both copies will always connect to the primary instance in Data Center I. This means that, for the application in Data Center II, the traffic goes over the public network, which will have high latency. However, if the application is okay with slightly stale data, it can set the read preference to nearest, which will automatically let the application in Data Center I connect to an instance in Data Center I and allow the application in Data Center II to connect to the secondary instance in Data Center II.
But then the next question is, how does the driver know which one is the nearest? The term “geographically close” is misleading; it is actually the one with the minimum network latency. The instance we query might be geographically further away than another instance in the replica set, but it can be chosen just because it has an acceptable response time. Generally, a better response time means geographically closer.
The following section is for those interested in the internal details of how the driver chooses the nearest node. If you are happy with just the concepts and not the internal details, you can safely skip the rest of this appendix.
Knowing the internals
Let us look at some pieces of code from a Java client (driver 2.11.3 is used for this purpose) and make some sense of them. If we look at the com.mongodb.TaggableReadPreference.NearestReadPreference.getNode method, we see the following implementation:
@Override
ReplicaSetStatus.ReplicaSetNode getNode(ReplicaSetStatus.ReplicaSet set) {
    if (_tags.isEmpty())
        return set.getAMember();
    for (DBObject curTagSet : _tags) {
        List<ReplicaSetStatus.Tag> tagList = getTagListFromDBObject(curTagSet);
        ReplicaSetStatus.ReplicaSetNode node = set.getAMember(tagList);
        if (node != null) {
            return node;
        }
    }
    return null;
}
For now, if we ignore the part where tags are specified, all it does is execute set.getAMember().
The name of this method tells us that there is a set of replica set members and that it returns one of them. Then what decides whether the set contains a member or not? If we dig a bit further into this method, we see the following lines of code in the com.mongodb.ReplicaSetStatus.ReplicaSet class:
public ReplicaSetNode getAMember() {
    checkStatus();
    if (acceptableMembers.isEmpty()) {
        return null;
    }
    return acceptableMembers.get(random.nextInt(acceptableMembers.size()));
}
Okay, so all it does is pick one node at random from a list of replica set nodes maintained internally. The random pick can be a secondary, even if a primary could be chosen (because it is present in the list). Thus, we can now say that when nearest is chosen as the read preference, a primary in the list of contenders will not necessarily be chosen, because the pick is random.
The question now is, how is the acceptableMembers list initialized? We see that it is done in the constructor of the com.mongodb.ReplicaSetStatus.ReplicaSet class, as follows:
this.acceptableMembers =
    Collections.unmodifiableList(calculateGoodMembers(all,
        calculateBestPingTime(all, true), acceptableLatencyMS, true));
The calculateBestPingTime call just finds the best ping time across all members (we will see what this ping time is later).
Another parameter worth mentioning is acceptableLatencyMS. This gets initialized in com.mongodb.ReplicaSetStatus.Updater (this is actually a background thread that updates the status of the replica set continuously), and the value for acceptableLatencyMS is initialized as follows:
slaveAcceptableLatencyMS = Integer.parseInt(
    System.getProperty("com.mongodb.slaveAcceptableLatencyMS", "15"));
As we can see, this code looks up the system property called com.mongodb.slaveAcceptableLatencyMS and, if none is found, uses the default value 15, that is, 15 ms.
This com.mongodb.ReplicaSetStatus.Updater class also has a run method that periodically updates the replica set stats. Without getting too much into it, we can see that it calls updateAll, which eventually reaches the update method in com.mongodb.ConnectionStatus.UpdatableNode:
longstart=System.nanoTime();
CommandResultres=_port.runCommand(_mongo.getDB("admin"),isMasterCmd);
longend=System.nanoTime()
All it does is execute the {isMaster: 1} command and record the response time in nanoseconds. This response time is converted to milliseconds and stored as the ping time. So, coming back to the com.mongodb.ReplicaSetStatus.ReplicaSet class, all calculateGoodMembers does is find and add the members of the replica set whose ping time is no more than acceptableLatencyMS milliseconds above the best ping time found in the replica set.
For example, in a replica set with three nodes, the ping times from the client to the three nodes (node1, node2, and node3) are 2 ms, 5 ms, and 150 ms respectively. The best time is 2 ms, and hence node1 goes into the set of good members. Of the remaining nodes, every node whose latency is no more than acceptableLatencyMS above the best also qualifies; with the default of 15 ms, the cutoff is 2 + 15 = 17 ms. Thus, node2 is also a contender, while node3 is left out. We now have two nodes in the list of good members (good in terms of latency).
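The filtering rule can be sketched in a few lines of Java using the ping times from the example above. This is a simplified, hypothetical re-creation rather than the driver's calculateGoodMembers(): it ignores member state and works on a plain map from member name to ping time in milliseconds.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical re-creation of the latency filter; not the driver's code.
public class GoodMembers {

    // Lowest ping time among all members (the role of calculateBestPingTime)
    static float bestPingTime(Map<String, Float> pings) {
        float best = Float.MAX_VALUE;
        for (float ping : pings.values()) {
            best = Math.min(best, ping);
        }
        return best;
    }

    // Keep every member whose ping is at most acceptableLatencyMS above the best
    static List<String> goodMembers(Map<String, Float> pings, int acceptableLatencyMS) {
        float best = bestPingTime(pings);
        List<String> good = new ArrayList<>();
        for (Map.Entry<String, Float> e : pings.entrySet()) {
            if (e.getValue() <= best + acceptableLatencyMS) {
                good.add(e.getKey());
            }
        }
        return good;
    }

    public static void main(String[] args) {
        Map<String, Float> pings = new LinkedHashMap<>();
        pings.put("node1", 2f);
        pings.put("node2", 5f);
        pings.put("node3", 150f);
        // best = 2 ms, cutoff = 2 + 15 = 17 ms
        System.out.println(goodMembers(pings, 15)); // prints [node1, node2]
    }
}
```

Feeding the resulting list into a random pick, as getAMember() does, gives the nearest behavior: node1 and node2 each serve about half of the reads, and node3 serves none.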
Now, putting together everything we have seen and applying it to the scenario in the preceding diagram, the lowest response time (from the driver's perspective in each of the two data centers) will come from one of the instances in the same data center, as the instances in the other data center are unlikely to respond within 15 ms (the default acceptable value) of the best response time because of public network latency. Thus, the acceptable nodes in Data Center I will be two of the replica set nodes in that data center, and one of them will be chosen at random. In Data Center II, only one instance is present, so it is the only option and will be chosen by the application running in that data center.