Try   HackMD

MongoDB Basic Cluster Administration
ch3 Sharding

tags: MongoDB University M103

What is Sharding?

  • Shards
    • store distributed collections
  • Config Server
    • store metadata about each shards
  • Mongos
    • routes queries to shards

When to Shard

  • When we reach the most powerful servers available, maximizing our vertical scale options.

    • Sharding can provide an alternative to vertical scaling.
  • Data sovereignty laws require data to be located in a specific geography.

    • Sharding allows us to store different pieces of data in specific countries or regions.
  • When holding more than 5TB per server and operational costs increase dramatically.

    • Generally, when our deployment reaches 2-5TB per server, we should consider sharding.

Setting Up a Sharded Cluster

Config Server config file

sharding: clusterRole: configsvr replication: replSetName: m103-csrs security: keyFile: /var/mongodb/pki/m103-keyfile net: bindIp: 127.0.0.1,192.168.103.100 port: 27000 systemLog: destination: file path: /var/mongodb/db/csrs1.log logAppend: true processManagement: fork: true storage: dbPath: /var/mongodb/db/csrs1

Mongos config file

sharding: configDB: m103-csrs/192.168.103.100:26001,192.168.103.100:26002,192.168.103.100:26003 security: keyFile: /var/mongodb/pki/m103-keyfile net: bindIp: 127.0.0.1,192.168.103.100 port: 26000 systemLog: destination: file path: /var/mongodb/db/mongos.log logAppend: true processManagement: fork: true

Config DB

  • Never write any data to it
  • informations
    • shards
    • chunks
    • mongos

Shard Keys

  • Shard key field must be indexed
  • must be present in every document in the collection
  • Shard key are immutable
    • You cannot change the shard key field post-sharding
    • before 4.2, You cannot change or update the value of the shard key field value post-sharding
  • Shard key are permanent
    • You cannot unshard a sharded collection
  • If the collection is empty, sh.shardCollection() creates the index on the shard key if such an index does not already exists.
  • If the collection is not empty, you must create the index first before using sh.shardCollection().

How to shard

  1. Use sh.enableSharding("<database>") to enable sharding for the specified database
  2. Use db.collection.createIndex() to create the index for your shard key fields
  3. Use sh.shardCollection( "<database>.<collection>", { shard key } ) to shard the collection

Picking a Good Shard Key

Cardinality

  • High cardinality - many possible unique value gives more chunk

Frequency

  • Low frequency - low repetition of values, distribution is going to be more even.

Monotonic Change

  • Avoid shard key that change monotonically

Hashed Shard Keys

  • shard key with hashed index
  • Even distribution of shard keys on monotonically changing fields like dates
  • Lost fast sorts, targeted queries on ranges of shard key values, or geographically isolated workloads.
  • Hashed indexes are single field, non-array

Using hashed shard key

  1. Enable sharding for the specified database sh.enableSharding("<database>")
  2. Create the index for shard key field db.collection.createIndex("<field>": "hashed")
  3. Shard the collection sh.shardCollection("<database>.<collection>", { <shard key field>: "hashed" })

Chunks

  • ChunkSize = 64MB by default
  • 1MB <= ChunkSize <= 1024MB

Jumbo Chunks

  • shard key 單向遞增或遞減
  • 無法分出而過於肥大的 chunk

Balancing

  • sh.startBalancer(timeout, interval)
  • sh.stopBalancer(timeout, interval)
  • sh.setBalncerState(boolean)

Config Server

Starting in MongoDB 3.4, the balancer runs on the primary of the config server replica set (CSRS)

  • Must have zero arbiters.

  • Must have no delayed members.

  • Must build indexes

  • Users should avoid writing directly to the config database in the course of normal operation or maintenance.

  • How to split chunks manually

db.adminCommand( { split: "myapp.users", middle: { email : prefix } } );
  • How to merge chunks manually
db.adminCommand( { mergeChunks: "test.members", bounds: [ { "username" : "user69816" }, { "username" : "user96401" } ] } )

if the operations do not require running on the database’s primary shard, these operations will route the results to a random shard to merge the results to avoid overloading the primary shard for that database. The **

outstageandthe==lookup== stage require running on the database’s primary shard.

Remove an exist shard

db.adminCommand( { removeShard: "mongodb0" } )