# <center><i class="fa fa-edit"></i> Navigating MongoDB: Indexing and Aggregation Pipeline </center> ###### tags: `Internship` :::info **Goal:** To gain a basic understanding of MongoDB fundamentals. Focus on indexing and aggregation pipeline. - [x] Aggregation Framework - [x] `sort()` and `limit()` - [x] Introduction to Indexes - [x] Introduction to Data Modeling - [x] Upsert - Update or Insert? **Resources:** [MongoDB Basics Online Course](https://university.mongodb.com/courses/M001/about) [MongoDB 常用Query指令](/b2Y7lgR1RaeQyMHp9RGHdQ) [Database Commands](https://www.mongodb.com/docs/manual/reference/command/) ::: ### Aggregation Framework - Exceeds filtering capabilities of MQL - compute - reshape - reoganize - Non-filtering stages do not modify the data. Work with the data in the cursor - Syntax is in form of a pipleine where stages are excuted in the order they are listed in **Operators** - `$group`: takes the incoming stream of data, and siphons it into multiple distinct reservoirs **Procedures** - Connect to Atlas Cluster ``` mongo "mongodb+srv://<username>:<password>@<cluster>.mongodb.net/admin" ``` - Navigate to database ``` use sample_airbnb ``` - Find all documents that have `Wifi` as one of the amenities. Only include `price` and `address` in the resulting cursor: ``` db.listingsAndReviews.find({ "amenities": "Wifi" }, { "price": 1, "address": 1, "_id": 0 }).pretty() ``` - **SYNTAX**: Using the aggregation framework find all documents that have `Wifi` as one of the amenities. *Only include* `price` and `address` in the resulting cursor: ``` db.listingsAndReviews.aggregate([ { "$match": { "amenities": "Wifi" } }, { "$project": { "price": 1, "address": 1, "_id": 0 }}]).pretty() ``` - Find one document in the collection and only include the `address` field in the resulting cursor ``` db.listingsAndReviews.findOne({ },{ "address": 1, "_id": 0 }) ``` - Project only the address field value for each document, then group all documents into one document per `address.country` value ``` db.listingsAndReviews.aggregate([ { "$project": { "address": 1, "_id": 0 }}, { "$group": { "_id": "$address.country" }}]) ``` - Project only the `address` field value for each document, then group all documents into one document per `address.country` value, and count one for each document in each group ``` db.listingsAndReviews.aggregate([ { "$project": { "address": 1, "_id": 0 }}, { "$group": { "_id": "$address.country", "count": { "$sum": 1 } } } ]) ``` ### `sort()` and `limit()` - Both cursor methods (applied to the results set that lives in the cursor) - MongoDB assumes that you first sort then limit regardless of the order of these methods typed **Procedures** - Connect to Atlas Cluster ``` mongo "mongodb+srv://<username>:<password>@<cluster>.mongodb.net/admin" ``` - Commands ``` use sample_training db.zips.find().sort({ "pop": 1 }).limit(1) db.zips.find({ "pop": 0 }).count() db.zips.find().sort({ "pop": -1 }).limit(1) db.zips.find().sort({ "pop": -1 }).limit(10) db.zips.find().sort({ "pop": 1, "city": -1 }) ``` ### Introduction to Indexes - Analogous to the index of a book - Special data structure that stores a small portion of the collection's data set in an easy to traverse form - Optimizes queries -> avoids sorting **Procedures** - Connect to Atlas Cluster ``` mongo "mongodb+srv://<username>:<password>@<cluster>.mongodb.net/admin" ``` - Commands ``` use sample_training db.trips.find({ "birth year": 1989 }) db.trips.find({ "start station id": 476 }).sort( { "birth year": 1 } ) db.trips.createIndex({ "birth year": 1 }) db.trips.createIndex({ "start station id": 1, "birth year": 1 }) ``` ### Introduction to Data Modeling - A way to organize fields in a document to support your application performance and querying capabilities - Evolving application implies an evolving data model ::: warning Rule of Thumb: Data is stored in the way that it is used ::: Example: two patients. Patient 1 has medical records and prescriptions, prefer to be contacted by email. Patient 2 does not. Want to optimize collection for docotors to use. What doctors need: - current prescriptions - diagnoses - patient contact info - ability to cross reference medication info - side effects - allergies - additional info Solution: - Patient Records collection: structure data that is most helpful for our application - Medications collections: for various medications - Everything that is reguarly queried together is stores together for faster retieval ### Upsert - Update or Insert? - Hybrid of `update` and `insert` commands - Syntax: `db.<collection>.updateOne({<query>}, {<update>}, {"upsert": true})` - By default, upsert is false - When set to true it will either `update` or `insert` - `update`: Documents that match the criteria for the update operation - `insert`: No documents that match the same criteria **Procedures** - Common Commands ``` db.iot.updateOne({ "sensor": r.sensor, "date": r.date, "valcount": { "$lt": 48 } }, { "$push": { "readings": { "v": r.value, "t": r.time } }, "$inc": { "valcount": 1, "total": r.value } }, { "upsert": true }) ```