
# M220P: MongoDB for Python Developers

#### MongoDB Atlas Cluster

MFlix uses MongoDB to persist all of its data.

> First create an Atlas account, then create a free cluster.
> Add a database user with read and write access.
> Add the IP addresses that are allowed to connect.
> Choose the shell as the connection method.

Go to your cluster Overview -> Connect -> Connect Your Application. Select the option corresponding to your local MongoDB version and copy the mongo connection command.

Go to the directory that contains the data:

> `$ cd mflix-python  # navigate to the mflix-python directory`

Import the data into the Atlas cluster:

```
$ mongorestore --drop --gzip --uri "mongodb+srv://renny:1234@mflix-vc4fl.gcp.mongodb.net/test?retryWrites=true&w=majority" data
```

Rename the `dotini_unix` (or `dotini_win`) file to `.ini` with the following command:

> `$ mv dotini_unix .ini  # on Unix`
> `$ ren dotini_win .ini  # on Windows`

You need to edit `.ini`; you can open it with:

> `$ vi .ini  # on Unix`
> `$ notepad .ini  # on Windows`

To start MFlix, run the following commands:

> `$ pipenv install -r requirements.txt`
> `$ python run.py`

Run `python run.py` inside the Pipenv environment (for example after `pipenv shell`, or as `pipenv run python run.py`).

To open and run the notebooks in this course, navigate to the notebooks directory and launch the Jupyter notebook server:

> `$ cd mflix-python/notebooks`
> `$ jupyter notebook`

----

> Good tool :+1:
> [Managing Jupyter packages with Pipenv](https://blog.thecodingday.com/2019/02/jupyter%E5%A5%97%E4%BB%B6%E7%AE%A1%E7%90%86-%E6%90%AD%E9%85%8Dpipenv/)

### First write

> http://api.mongodb.com/python/current/tutorial.html

```python=
import pymongo

uri = "mongodb+srv://renny:1234@mflix-vc4fl.gcp.mongodb.net/test?retryWrites=true&w=majority"
client = pymongo.MongoClient(uri)
```

`client.stats` is attribute-style database access, so it returns a `Database` named `stats`; its repr is a quick way to inspect the client's connection settings:

```python=
client.stats
```

> Database(MongoClient(host=['mflix-shard-00-02-vc4fl.gcp.mongodb.net:27017', 'mflix-shard-00-00-vc4fl.gcp.mongodb.net:27017', 'mflix-shard-00-01-vc4fl.gcp.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, authsource='admin', replicaset='mflix-shard-0', ssl=True, retrywrites=True, w='majority'), 'stats')

```python=
client.list_database_names()
```

> ['mflix', 'admin', 'local']

```python=
mflix = client.mflix
mflix.list_collection_names()
```

> ['theaters', 'comments', 'users', 'sessions', 'movies']

```python=
movies = mflix.movies
movies.count_documents({})
```

> 45993

Client options can also be passed as keyword arguments. Here the connection timeout is set to 200 ms, and it now appears in the client settings:

```python=
client = pymongo.MongoClient(uri, connectTimeoutMS=200, retryWrites=True)
client.stats
```

> Database(MongoClient(host=['mflix-shard-00-01-vc4fl.gcp.mongodb.net:27017', 'mflix-shard-00-02-vc4fl.gcp.mongodb.net:27017', 'mflix-shard-00-00-vc4fl.gcp.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, authsource='admin', replicaset='mflix-shard-0', ssl=True, retrywrites=True, w='majority', connecttimeoutms=200), 'stats')

### First read

```python=
import pymongo

uri = "mongodb+srv://renny:1234@mflix-vc4fl.gcp.mongodb.net/test?retryWrites=true&w=majority"
client = pymongo.MongoClient(uri)
mflix = client.mflix
movies = mflix.movies
```

Comparing `find()` and `find_one()`:

- **Correct:** They are used to query documents in a collection. Both `find()` and `find_one()` serve this purpose.
- **Correct:** They accept a query predicate. Both methods accept a predicate to target specific documents.
- **Correct:** They accept a field projection. Both methods accept a projection to suppress certain fields.
- **Incorrect:** They return a cursor. `find()` always returns a cursor, but `find_one()` never does: it returns a document (as a Python dictionary) if one is found, or `None` if no document is found.

In PyMongo, `find()` returns a `Cursor`. In the MongoDB shell, by contrast, `find()` prints the matching documents directly:

> `db.test.find()`
> `{ "_id" : ObjectId("5838531e0f3577fc9178b834"), "name" : "zhangsan" }`

To get a document back directly in PyMongo, use `find_one()` instead of `find()`:

> `db.test.find_one()`
> `{'_id': ObjectId('5838531e0f3577fc9178b834'), 'name': 'zhangsan'}`
>
> `db.test.find()`
> `<pymongo.cursor.Cursor at 0x7f4ac789e450>`

To get every matching document, iterate over the cursor:

```python=
# db is a pymongo Database (assumed, e.g. db = client.test)
result = []
for x in db.test.find():  # iterating the cursor fetches the documents
    result.append(x)
print(result)
# [{'_id': ObjectId('5838531e0f3577fc9178b834'), 'name': 'zhangsan'}, ...]
```

Another way:

```python=
import pandas as pd

# posts is my collection; list() drains the cursor into a list of dicts
array = list(posts.find())
type(array)
# list

# convert the list of documents into a pandas DataFrame
df = pd.DataFrame(array)
```

`bson.json_util.dumps` (`from bson.json_util import dumps`) works like the standard library's `json.dumps`, but it can also serialize BSON types such as `ObjectId`. For reference, this is how the standard `json.dumps` options change the size of the output:

```python=
import json

# hypothetical sample data
data = [{"a": "A", "b": (2, 4), "c": 3.0}]

print("DATA:", repr(data))
print("repr(data)             :", len(repr(data)))
print("dumps(data)            :", len(json.dumps(data)))
print("dumps(data, indent=2)  :", len(json.dumps(data, indent=2)))
print("dumps(data, separators):", len(json.dumps(data, separators=(",", ":"))))
```

```python=
from bson.json_util import dumps

# every movie with Salma Hayek in the cast
cursor = movies.find( { "cast": "Salma Hayek" } )
print(dumps(cursor, indent=2))

# same query, but project only the title and suppress _id
cursor = movies.find( { "cast": "Salma Hayek" }, { "title": 1, "_id": 0 } )
print(dumps(cursor, indent=2))
```

#### Limiting

> The find() method always returns a cursor. Before assigning that cursor to a variable, we transform it with the limit() method so that the cursor returns no more than 2 documents. The $limit stage does the same in an aggregation pipeline.

```python=
import pymongo
from bson.json_util import dumps

uri = "mongodb+srv://renny:1234@mflix-vc4fl.gcp.mongodb.net/test?retryWrites=true&w=majority"
client = pymongo.MongoClient(uri)
mflix = client.mflix
movies = mflix.movies

# cursor method: at most 2 Sam Raimi movies, projecting title and cast
limited_cursor = movies.find(
    { "directors": "Sam Raimi" },
    { "_id": 0, "title": 1, "cast": 1 }
).limit(2)

print(dumps(limited_cursor, indent=2))

# aggregation equivalent: $match -> $project -> $limit
pipeline = [
    { "$match": { "directors": "Sam Raimi" } },
    { "$project": { "_id": 0, "title": 1, "cast": 1 } },
    { "$limit": 2 }
]

limited_aggregation = movies.aggregate( pipeline )

print(dumps(limited_aggregation, indent=2))
```

#### Sorting

> This is an example of the sort() cursor method. sort() takes two parameters: the key we're sorting on and the sort order. In this example we're sorting on year, in increasing order.
>
> ASCENDING and DESCENDING are values from the pymongo library to specify sort direction, but they're really just the integers 1 and -1.
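The multi-key sort rule can be checked offline. Below is a minimal pure-Python sketch (not PyMongo, and not how the server sorts) that applies a list of `(key, direction)` tuples to in-memory dicts. `sort_documents` and the sample movies are hypothetical; `ASCENDING` and `DESCENDING` are redefined locally as 1 and -1 to mirror the PyMongo constants.

```python
"""Pure-Python sketch of how a list of (key, direction) pairs orders documents.

This does NOT talk to MongoDB; it only mimics the ordering that a
cursor.sort([...]) specification describes, using in-memory dicts.
"""

# Mirrors pymongo.ASCENDING / pymongo.DESCENDING, which are just 1 and -1.
ASCENDING = 1
DESCENDING = -1


def sort_documents(docs, spec):
    """Return docs ordered by spec, a list of (key, direction) tuples.

    The first tuple has the highest priority. Python's sort is stable, so
    sorting by the lowest-priority key first and the highest-priority key
    last yields the combined multi-key order.
    """
    result = list(docs)
    for key, direction in reversed(spec):
        result.sort(key=lambda d: d[key], reverse=(direction == DESCENDING))
    return result


# Hypothetical sample documents, made up for illustration.
movies = [
    {"title": "Big", "year": 1988},
    {"title": "Splash", "year": 1984},
    {"title": "Bachelor Party", "year": 1984},
    {"title": "Philadelphia", "year": 1993},
]

ordered = sort_documents(movies, [("year", ASCENDING), ("title", DESCENDING)])
for doc in ordered:
    print(doc["year"], doc["title"])
```

Within 1984 the titles come out in descending order because `title` is the second key. Real MongoDB sorting also has to handle missing fields and mixed BSON types, which this sketch ignores.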
```python=
from pymongo import DESCENDING, ASCENDING

# cursor method: Sam Raimi movies sorted by year, increasing
sorted_cursor = movies.find(
    { "directors": "Sam Raimi" },
    { "_id": 0, "year": 1, "title": 1, "cast": 1 }
).sort("year", ASCENDING)

print(dumps(sorted_cursor, indent=2))

# aggregation equivalent
pipeline = [
    { "$match": { "directors": "Sam Raimi" } },
    { "$project": { "_id": 0, "year": 1, "title": 1, "cast": 1 } },
    { "$sort": { "year": ASCENDING } }
]

sorted_aggregation = movies.aggregate( pipeline )

print(dumps(sorted_aggregation, indent=2))

# When sorting on two or more keys, sort() takes a single argument:
# a list of tuples, each holding a key and a sort order.
sorted_cursor = movies.find(
    { "cast": "Tom Hanks" },
    { "_id": 0, "year": 1, "title": 1, "cast": 1 }
).sort([("year", ASCENDING), ("title", DESCENDING)])

print(dumps(sorted_cursor, indent=2))

# aggregation equivalent; the key order in the dict sets the sort priority
pipeline = [
    { "$match": { "cast": "Tom Hanks" } },
    { "$project": { "_id": 0, "year": 1, "title": 1, "cast": 1 } },
    { "$sort": { "year": ASCENDING, "title": DESCENDING } }
]

sorted_aggregation = movies.aggregate( pipeline )

print(dumps(sorted_aggregation, indent=2))
```

#### Skipping

> The skip() method lets us skip documents in a collection, so only the documents we did not skip appear in the cursor. Because this query matches only 15 documents (see the count below), skipping 14 of them leaves just 1.
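A minimal pure-Python sketch of the skip/limit arithmetic, assuming an already-sorted result set held in a plain list (no server involved). `apply_skip_limit`, `page`, and the placeholder titles are hypothetical names for illustration; they show why skipping 14 of 15 documents leaves one, and how skip and limit combine for paging.

```python
"""Pure-Python sketch of cursor.skip(n) and cursor.limit(n) semantics.

No MongoDB connection is involved: a list stands in for the (already
sorted) result set, and slicing mimics what the server would return.
"""


def apply_skip_limit(docs, skip=0, limit=0):
    """Drop the first `skip` docs, then keep at most `limit` of the rest.

    As with MongoDB's limit(), a limit of 0 is treated as "no limit".
    """
    remaining = docs[skip:]
    return remaining[:limit] if limit else remaining


def page(docs, page_number, page_size):
    """Hypothetical pagination helper: 1-based page -> skip/limit values."""
    return apply_skip_limit(docs, skip=(page_number - 1) * page_size, limit=page_size)


# Stand-in for the 15 Sam Raimi movies counted in this section.
results = [f"movie {i}" for i in range(1, 16)]

print(apply_skip_limit(results, skip=14))  # only the last document survives
print(page(results, page_number=2, page_size=5))
```

On a real cursor, the server applies sort, then skip, then limit no matter in which order you chain the methods. In an aggregation pipeline, `$skip` and `$limit` run in the order the stages are listed, so there the order matters.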
```python=
# count how many movies Sam Raimi directed
pipeline = [
    { "$match": { "directors": "Sam Raimi" } },
    { "$project": { "_id": 0, "title": 1, "cast": 1 } },
    { "$count": "num_movies" }
]

count_aggregation = movies.aggregate( pipeline )

print(dumps(count_aggregation, indent=2))
# [
#   {
#     "num_movies": 15
#   }
# ]
```

```python=
# cursor method: sort by year, then skip the first 14 documents
skipped_sorted_cursor = movies.find(
    { "directors": "Sam Raimi" },
    { "_id": 0, "title": 1, "year": 1, "cast": 1 }
).sort("year", ASCENDING).skip(14)

print(dumps(skipped_sorted_cursor, indent=2))

# aggregation equivalent: $sort, then $skip the same 14 documents
pipeline = [
    { "$match": { "directors": "Sam Raimi" } },
    { "$project": { "_id": 0, "year": 1, "title": 1, "cast": 1 } },
    { "$sort": { "year": ASCENDING } },
    { "$skip": 14 }
]

sorted_skipped_aggregation = movies.aggregate( pipeline )

print(dumps(sorted_skipped_aggregation, indent=2))
```

#### Summary

> `.limit()` == `$limit`
> `.sort()` == `$sort`
> `.skip()` == `$skip`

> To recap, this lesson covered some cursor methods and their aggregation equivalents. There won't always be a one-to-one mapping, because the aggregation framework can do a lot more than cursors can.
>
> But these three methods exist both as cursor methods and as aggregation stages.

#### Basic Aggregation

![](https://i.imgur.com/RXyt6w8.png)

Using the Aggregation Pipeline Builder in Compass:

![](https://i.imgur.com/R6lkzyY.png)

> `$project`: selects the fields you want from each document, and can also modify them.
> `$match`: filters documents, reducing the data that later stages have to process.
> `$group`: groups documents by a field.
> `$unwind`: splits an array field, producing one document per array element.
> `$sort`: sorts documents by one or more fields.
> `$limit`: limits the number of documents returned.
> `$skip`: skips the first n documents, then returns the rest.

#### Write Concerns

> https://medium.com/@sj82516/mongodb-isolation-%E8%88%87-transaction-132ab29731c2

> **w: 0**: does not ask for an acknowledgement from any node in the replica set.
>
> **w: 1**: asks for an acknowledgement from only one node, the primary.
>
> **w: majority**: asks for an acknowledgement from a majority of the nodes in the replica set.

#### Basic Updates

Use Faker to create fake data:

> https://github.com/joke2k/faker

###### tags: `mongodb` `python`
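To make the stage descriptions under Basic Aggregation concrete, here is a pure-Python sketch that mimics `$match`, `$unwind`, `$group`, `$sort`, and `$limit` on a few in-memory documents. It is only an illustration, not how MongoDB executes a pipeline; the sample documents and actor names are made up. The `pipeline` list at the end is the equivalent stage specification, which you could pass to `movies.aggregate(pipeline)` with the collection used earlier.

```python
"""Pure-Python sketch of what a few aggregation stages do to documents.

This is not how MongoDB executes a pipeline; it only mimics, on in-memory
dicts, the effect of $match -> $unwind -> $group -> $sort -> $limit.
"""
from collections import Counter

# Hypothetical sample documents, made up for illustration.
docs = [
    {"title": "Movie A", "directors": ["Sam Raimi"], "cast": ["Actor X", "Actor Y"]},
    {"title": "Movie B", "directors": ["Sam Raimi"], "cast": ["Actor X"]},
    {"title": "Movie C", "directors": ["Someone Else"], "cast": ["Actor Y"]},
]

# $match: keep documents whose directors array contains "Sam Raimi"
matched = [d for d in docs if "Sam Raimi" in d["directors"]]

# $unwind: "$cast" -> one output document per element of the cast array
unwound = [{**d, "cast": member} for d in matched for member in d["cast"]]

# $group: { _id: "$cast", count: { $sum: 1 } }
counts = Counter(d["cast"] for d in unwound)
grouped = [{"_id": name, "count": n} for name, n in counts.items()]

# $sort: { count: -1, _id: 1 }, then $limit: 1
grouped.sort(key=lambda g: (-g["count"], g["_id"]))
top = grouped[:1]

print(top)

# The same stages written as a real pipeline specification (plain data).
pipeline = [
    {"$match": {"directors": "Sam Raimi"}},
    {"$unwind": "$cast"},
    {"$group": {"_id": "$cast", "count": {"$sum": 1}}},
    {"$sort": {"count": -1, "_id": 1}},
    {"$limit": 1},
]
```

Placing `$match` first mirrors the note above: filtering early means `$unwind` and `$group` have fewer documents to process.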