MongoDB Basic Cluster Administration
ch3 Sharding
What is Sharding?
- Shards
- store distributed collections
- Config Server
- store metadata about each shards
- Mongos
When to Shard
-
When we reach the most powerful servers available, maximizing our vertical scale options.
- Sharding can provide an alternative to vertical scaling.
-
Data sovereignty laws require data to be located in a specific geography.
- Sharding allows us to store different pieces of data in specific countries or regions.
-
When holding more than 5TB per server and operational costs increase dramatically.
- Generally, when our deployment reaches 2-5TB per server, we should consider sharding.
Setting Up a Sharded Cluster
Config Server config file
Mongos config file
Config DB
- Never write any data to it
- informations
Shard Keys
- Shard key field must be indexed
- must be present in every document in the collection
- Shard key are immutable
- You cannot change the shard key field post-sharding
- before 4.2, You cannot change or update the value of the shard key field value post-sharding
- Shard key are permanent
- You cannot unshard a sharded collection
- If the collection is empty, sh.shardCollection() creates the index on the shard key if such an index does not already exists.
- If the collection is not empty, you must create the index first before using sh.shardCollection().
How to shard
- Use sh.enableSharding("<database>") to enable sharding for the specified database
- Use db.collection.createIndex() to create the index for your shard key fields
- Use sh.shardCollection( "<database>.<collection>", { shard key } ) to shard the collection
Picking a Good Shard Key
Cardinality
- High cardinality - many possible unique value gives more chunk
Frequency
- Low frequency - low repetition of values, distribution is going to be more even.
Monotonic Change
- Avoid shard key that change monotonically
Hashed Shard Keys
- shard key with hashed index
- Even distribution of shard keys on monotonically changing fields like dates
- Lost fast sorts, targeted queries on ranges of shard key values, or geographically isolated workloads.
- Hashed indexes are single field, non-array
Using hashed shard key
- Enable sharding for the specified database sh.enableSharding("<database>")
- Create the index for shard key field db.collection.createIndex("<field>": "hashed")
- Shard the collection sh.shardCollection("<database>.<collection>", { <shard key field>: "hashed" })
Chunks
- ChunkSize = 64MB by default
- 1MB <= ChunkSize <= 1024MB
Jumbo Chunks
- shard key 單向遞增或遞減
- 無法分出而過於肥大的 chunk
Balancing
- sh.startBalancer(timeout, interval)
- sh.stopBalancer(timeout, interval)
- sh.setBalncerState(boolean)
Config Server
Starting in MongoDB 3.4, the balancer runs on the primary of the config server replica set (CSRS)
-
Must have zero arbiters.
-
Must have no delayed members.
-
Must build indexes
-
Users should avoid writing directly to the config database in the course of normal operation or maintenance.
-
How to split chunks manually
- How to merge chunks manually
if the operations do not require running on the database’s primary shard, these operations will route the results to a random shard to merge the results to avoid overloading the primary shard for that database. The **lookup== stage require running on the database’s primary shard.
Remove an exist shard