# MongoDB Basic Cluster Administration<br />ch3 Sharding
###### tags: `MongoDB University M103`

## What is Sharding?
* Shards
    * store distributed collections
* Config Server
    * stores metadata about each shard
* Mongos
    * routes queries to shards

## When to Shard
* When we reach the most powerful servers available, maximizing our vertical scale options.
    * Sharding can provide an alternative to vertical scaling.
* When data sovereignty laws require data to be located in a specific geography.
    * Sharding allows us to store different pieces of data in specific countries or regions.
* When we hold more than 5TB per server and operational costs increase dramatically.
    * Generally, when our deployment reaches 2-5TB per server, we should consider sharding.

## Setting Up a Sharded Cluster
### Config Server config file
```yaml=
sharding:
  clusterRole: configsvr
replication:
  replSetName: m103-csrs
security:
  keyFile: /var/mongodb/pki/m103-keyfile
net:
  bindIp: 127.0.0.1,192.168.103.100
  # changed from 27000 so that it matches the first host in the mongos configDB string
  port: 26001
systemLog:
  destination: file
  path: /var/mongodb/db/csrs1.log
  logAppend: true
processManagement:
  fork: true
storage:
  dbPath: /var/mongodb/db/csrs1
```

### Mongos config file
```yaml=
sharding:
  # <config replica set name>/<host:port list of the config servers>
  configDB: m103-csrs/192.168.103.100:26001,192.168.103.100:26002,192.168.103.100:26003
security:
  keyFile: /var/mongodb/pki/m103-keyfile
net:
  bindIp: 127.0.0.1,192.168.103.100
  port: 26000
systemLog:
  destination: file
  path: /var/mongodb/db/mongos.log
  logAppend: true
processManagement:
  fork: true
```

## Config DB
* Never write any data to it
* Holds information about
    * shards
    * chunks
    * mongos

## Shard Keys
* The shard key field must be indexed
* The shard key field must be present in every document in the collection
* Shard keys are immutable
    * You cannot change the shard key field after sharding
    * Before MongoDB 4.2, you cannot update the value of a document's shard key field after sharding
* Shard keys are permanent
    * You cannot unshard a sharded collection
* If the collection is empty, sh.shardCollection() creates the index on the shard key if such an index does not already exist.
* If the collection is not empty, you must create the index before using sh.shardCollection().

### How to shard
1. Use ==sh.enableSharding("<database>")== to enable sharding for the specified database
2. Use ==db.collection.createIndex()== to create the index for your shard key fields
3. Use ==sh.shardCollection( "<database>.<collection>", { shard key } )== to shard the collection

## Picking a Good Shard Key
### Cardinality
* High cardinality: many possible unique values allow more chunks

### Frequency
* Low frequency: values repeat rarely, so the distribution is more even

### Monotonic Change
* Avoid shard keys whose values change monotonically

## Hashed Shard Keys
* A shard key backed by a hashed index
* Gives an even distribution of shard keys on monotonically changing fields like dates
* Trade-off: we lose fast sorts, targeted queries on ranges of shard key values, and geographically isolated workloads
* Hashed indexes are single field, non-array

### Using hashed shard key
1. Enable sharding for the specified database ==sh.enableSharding("<database>")==
2. Create the index for the shard key field ==db.collection.createIndex({ "<field>": "hashed" })==
3. Shard the collection ==sh.shardCollection("<database>.<collection>", { <shard key field>: "hashed" })==

## Chunks
* ChunkSize = 64MB by default
* 1MB <= ChunkSize <= 1024MB

### Jumbo Chunks
* The shard key increases or decreases monotonically
* A chunk that cannot be split and has grown too large

## Balancing
* sh.startBalancer(timeout, interval)
* sh.stopBalancer(timeout, interval)
* sh.setBalancerState(boolean)

### Config Server
:::info
Starting in MongoDB 3.4, the balancer runs on the primary of the config server replica set (CSRS).
:::
* Must have zero arbiters.
* Must have no delayed members.
* Must build indexes (no member may have `buildIndexes` set to false).
* Users **should avoid writing directly to the config database** in the course of normal operation or maintenance.
* How to split chunks manually

```javascript=
// Split a chunk of myapp.users at the given shard key value.
// `prefix` must already hold the email value to split at.
db.adminCommand( { split: "myapp.users", middle: { email: prefix } } );
```

* How to merge chunks manually

```javascript=
// Merge the contiguous chunks that lie between the two bounds.
db.adminCommand( {
   mergeChunks: "test.members",
   bounds: [ { "username": "user69816" }, { "username": "user96401" } ]
} )
```

:::warning
If an aggregation operation does not need to run on the database's primary shard, its results are routed to a **random shard** for merging, which avoids overloading the primary shard for that database. The **$out** stage and the ==$lookup== stage require running on the database's **primary shard**.
:::

### Remove an existing shard
```javascript=
// Start draining the shard named "mongodb0".
db.adminCommand( { removeShard: "mongodb0" } )
```
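
Removing a shard is not instantaneous: the first `removeShard` call only starts draining chunks, and the same command is run again to check progress until the reply reports `state: "completed"`. The following is a minimal sketch of that polling loop, not an official helper. The names `drainShard`, `runAdminCommand`, `maxPolls`, `wait`, and `makeFakeCluster` are hypothetical, and the fake cluster only imitates the `state` and `ok` fields of a reply so the sketch can run without a real deployment.

```javascript
// Sketch only: all function and parameter names here are hypothetical.
// `runAdminCommand` is any function that sends a command document to the
// admin database and returns the reply document.
function drainShard(runAdminCommand, shardName, maxPolls = 10, wait = () => {}) {
  let lastState = null;
  for (let poll = 1; poll <= maxPolls; poll++) {
    // Re-running removeShard on a draining shard reports its current status.
    const reply = runAdminCommand({ removeShard: shardName });
    if (reply.ok !== 1) {
      throw new Error("removeShard failed: " + JSON.stringify(reply));
    }
    lastState = reply.state;
    if (lastState === "completed") {
      return { polls: poll, state: lastState };
    }
    wait(); // pause between polls (a no-op unless the caller supplies one)
  }
  // Gave up before draining finished; report the last state seen.
  return { polls: maxPolls, state: lastState };
}

// Stand-in for a real cluster: replies with the given states in order,
// then keeps repeating the last one.
function makeFakeCluster(states) {
  let i = 0;
  return (cmd) => ({
    shard: cmd.removeShard,
    state: states[Math.min(i++, states.length - 1)],
    ok: 1,
  });
}

const result = drainShard(
  makeFakeCluster(["started", "ongoing", "completed"]),
  "mongodb0"
);
console.log(result);
```

In the shell, the same function could be called as `drainShard((cmd) => db.adminCommand(cmd), "mongodb0", 60, () => sleep(5000))`, where the poll count and the 5 second pause are arbitrary choices. If the shard being removed is the primary shard for a database, draining will not complete until that database has been moved to another shard.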