# MongoDB Notes

### embedded vs reference

- Embedding works well for 1:1 and 1:N relationships.
- Referencing works well for M:N relationships.

### embedded

```
{
  _id: "joe",
  name: "Joe Bookreader",
  address: {
    street: "123 Fake Street",
    city: "Faketon",
    state: "MA",
    zip: "12345"
  }
}

// Address as a separate document (for comparison)
{
  patron_id: "joe",
  street: "123 Fake Street",
  city: "Faketon",
  state: "MA",
  zip: "12345"
}
```

### reference

```
// Publisher
{
  _id: "oreilly",
  name: "O'Reilly Media",
  founded: 1980,
  location: "CA"
}

// Book
{
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly" // <- Publisher._id
}
```

### Modeling patterns

1. Attribute pattern

```
// Before: one field per release
{
  title: "Star Wars",
  director: "George Lucas",
  ...
  release_US: ISODate("1977-05-20T01:00:00+01:00"),
  release_France: ISODate("1977-10-19T01:00:00+01:00"),
  release_Italy: ISODate("1977-10-20T01:00:00+01:00"),
  release_UK: ISODate("1977-12-27T01:00:00+01:00"),
  ...
}

// After: Attribute pattern
{
  title: "Star Wars",
  director: "George Lucas",
  …
  releases: [
    { location: "USA", date: ISODate("1977-05-20T01:00:00+01:00") },
    { location: "France", date: ISODate("1977-10-19T01:00:00+01:00") },
    { location: "Italy", date: ISODate("1977-10-20T01:00:00+01:00") },
    { location: "UK", date: ISODate("1977-12-27T01:00:00+01:00") },
    …
  ],
  …
}
```

In the first document above, searching by release date means querying several fields (`release_US`, `release_France`, ...), each of which would need its own index. With the Attribute pattern the dates live in a single `releases` array, so one index on the array covers them all and indexing is easier to manage.

2. Extended Reference pattern

Splitting data across several collections makes reads pay the cost of joins. To avoid that, copy only the frequently accessed (high-priority) fields into the referencing document.

```
/// customer
{
  _id: 1,
  name: "abc",
  street: "123 main st",
  city: "some where",
  country: "nation",
  ...
}

/// order
{
  _id: 2,
  date: "2019-02-18",
  customer_id: 1,
  shipping_addr: {
    name: "abc",
    street: "123 main st",
    city: "some where",
    country: "nation",
  },
  ...
}
```

Watch out for the duplicated data. This pattern works best with fields that rarely change.

3. Subset pattern

Separates frequently used information from rarely used information to reduce the overall size of the working set.
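The Subset pattern described above can be sketched in plain JavaScript, without a database. This is only a minimal illustration: the `HOT_FIELDS` list, the `splitBySubset` helper, and the sample user are hypothetical, and which fields count as "frequently used" depends on the application's access pattern.

```javascript
// Hypothetical list of fields that most requests need (an assumption).
const HOT_FIELDS = ["_id", "name", "level", "image"];

// Split one full document into a small "hot" document and a "cold" document
// that keeps the remaining fields plus a reference back to the hot one.
function splitBySubset(doc, hotFields) {
  const hot = {};
  const cold = { user_id: doc._id }; // reference to the hot document
  for (const [key, value] of Object.entries(doc)) {
    if (hotFields.includes(key)) {
      hot[key] = value;
    } else {
      cold[key] = value;
    }
  }
  return { hot, cold };
}

// Hypothetical sample user.
const user = {
  _id: 1,
  name: "abc",
  level: 3,
  image: "https://example.com/a.png",
  address: "123 main st",
  saleList: [10, 11, 12],
};

const { hot, cold } = splitBySubset(user, HOT_FIELDS);
console.log(hot);  // small document read on most requests
console.log(cold); // rarely needed fields, fetched only on demand
```

Most reads then touch only the small document, which is what shrinks the working set. The cost is a second lookup whenever the rarely used fields are needed.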
```
user {
  name,
  id,
  level,
  address,
  ratings: [
    {
      user_id,
      rating: "Responds quickly", // 8 rating items
    }
  ],
  image, // url
  // sale list
  saleList: [ post_id, ],
  // buy list
  buyList: [ post_id, ],
}

// buy request list
buyRequestList: { user_id, post_id }

categories { name: string }

ratings { rating: string }

post {
  writer: user_id,
  title: string,
  content: string,
  images: [url: string],
  category: string,
  buyers: [ user_id ],
  state: string, // purchase state; a bool cannot express a "reserved" state, so use a string for extensibility
  cost: number,
  uploadTime, // not needed if createdAt is generated automatically
}

comment { user_id, content, post_id }
```

### Replica Set

- Used for reliability.
- Several DB instances run as one Primary and one or more Secondaries. If the Primary dies, a Secondary becomes the Primary and the service keeps working.
- Start several DB instances (create three in the way shown below, with a different dbpath and port for each).

```
mongod --replSet replica --dbpath C:\Users\user\Desktop\test\db1 --port 30001
```

- Then connect to one of them and run the commands below to configure the three DBs as a replica set.

```
config = {
  _id: "replica", // the name passed to the --replSet option
  members: [
    { _id: 0, host: "localhost:30001" },
    { _id: 1, host: "localhost:30002" },
    { _id: 2, host: "localhost:30003" }
  ]
}
rs.initiate(config) // this completes the setup
```

# MongoDB Modeling by Scenario

## 1:1 relationships

### Embedded Document Pattern

Use this when one document contains the other.

```javascript=
{
  _id: "joe",
  name: "Joe Bookreader",
  address: {
    street: "123 Fake Street",
    city: "Faketon",
    state: "MA",
    zip: "12345"
  }
}
```

### Subset Pattern

A potential problem with the embedded document pattern is that documents grow large because they include fields the application does not need. The subset pattern addresses this by letting a single database call fetch only the subset of data that is accessed most often.

Suppose the movie collection looks like this:

```javascript=
{
  "_id": 1,
  "title": "The Arrival of a Train",
  "year": 1896,
  "runtime": 1,
  "released": ISODate("1896-01-25"),
  "poster": "http://ia.media-imdb.com/images/M/MV5BMjEyNDk5MDYzOV5BMl5BanBnXkFtZTgwNjIxMTEwMzE@._V1_SX300.jpg",
  "plot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. 
When the train stops at the platform, ...",
  "fullplot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, the line dissolves. The doors of the railway-cars open, and people on the platform help passengers to get off.",
  "lastupdated": ISODate("2015-08-15T10:06:53"),
  "type": "movie",
  "directors": [ "Auguste Lumière", "Louis Lumière" ],
  "imdb": {
    "rating": 7.3,
    "votes": 5043,
    "id": 12
  },
  "countries": [ "France" ],
  "genres": [ "Documentary", "Short" ],
  "tomatoes": {
    "viewer": {
      "rating": 3.7,
      "numReviews": 59
    },
    "lastUpdated": ISODate("2020-01-09T00:02:53")
  }
}
```

This document contains information that a simple overview does not need. Rather than keeping all of it in one collection, the data can be split across two collections.

```javascript=
// movie collection
{
  "_id": 1,
  "title": "The Arrival of a Train",
  "year": 1896,
  "runtime": 1,
  "released": ISODate("1896-01-25"),
  "type": "movie",
  "directors": [ "Auguste Lumière", "Louis Lumière" ],
  "countries": [ "France" ],
  "genres": [ "Documentary", "Short" ]
}

// movie_details collection
{
  "_id": 156,
  "movie_id": 1, // reference to the movie collection
  "poster": "http://ia.media-imdb.com/images/M/MV5BMjEyNDk5MDYzOV5BMl5BanBnXkFtZTgwNjIxMTEwMzE@._V1_SX300.jpg",
  "plot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, ...",
  "fullplot": "A group of people are standing in a straight line along the platform of a railway station, waiting for a train, which is seen coming at some distance. When the train stops at the platform, the line dissolves. 
The doors of the railway-cars open, and people on the platform help passengers to get off.",
  "lastupdated": ISODate("2015-08-15T10:06:53"),
  "imdb": {
    "rating": 7.3,
    "votes": 5043,
    "id": 12
  },
  "tomatoes": {
    "viewer": {
      "rating": 3.7,
      "numReviews": 59
    },
    "lastUpdated": ISODate("2020-01-29T00:02:53")
  }
}
```

Splitting the data this way improves read performance, because the data needed to satisfy most requests becomes smaller.

### Trade-offs of the subset pattern

The subset pattern shrinks the overall working set and improves read performance. However, if the data is split across collections poorly, the application may need multiple round trips to fetch it.

## 1:many relationships

[Reference link](https://docs.mongodb.com/manual/tutorial/model-embedded-one-to-many-relationships-between-documents/)

Consider one patron who is related to several addresses. Embedding is a good fit when one entity needs to be viewed in the context of the other.

```javascript=
{
  "_id": "joe",
  "name": "Joe Bookreader",
  "addresses": [
    {
      "street": "123 Fake Street",
      "city": "Faketon",
      "state": "MA",
      "zip": "12345"
    },
    {
      "street": "1 Some Other Street",
      "city": "Boston",
      "state": "MA",
      "zip": "12345"
    }
  ]
}
```

### Subset pattern

Suppose an e-commerce site stores review data for its products.

```javascript=
{
  "_id": 1,
  "name": "Super Widget",
  "description": "This is the most useful item in your toolbox.",
  "price": { "value": NumberDecimal("119.99"), "currency": "USD" },
  "reviews": [
    {
      "review_id": 786,
      "review_author": "Kristina",
      "review_text": "This is indeed an amazing widget.",
      "published_date": ISODate("2019-02-18")
    },
    {
      "review_id": 785,
      "review_author": "Trina",
      "review_text": "Nice product. Slow shipping.",
      "published_date": ISODate("2019-02-17")
    },
    ...
    {
      "review_id": 1,
      "review_author": "Hans",
      "review_text": "Meh, it's okay.",
      "published_date": ISODate("2017-12-06")
    }
  ]
}
```

The reviews are sorted in reverse chronological order. When a user visits a product page, the 10 most recent reviews are loaded. Instead of storing every review together with the product, the data can be split into two collections. The product collection keeps only the 10 most recent reviews.
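Keeping only the newest reviews embedded can be sketched in plain JavaScript as follows. This is an in-memory illustration under assumptions: `addReview`, `MAX_RECENT`, and the generated sample reviews are hypothetical names and data. In MongoDB itself, a `$push` update with the `$each`, `$sort`, and `$slice` modifiers is one way to get a similar effect on the server side.

```javascript
const MAX_RECENT = 10; // the "10 most recent reviews" from the text

// Return a new product whose embedded reviews are the newest `limit` ones,
// sorted newest first. The input product is not mutated.
function addReview(product, review, limit = MAX_RECENT) {
  const reviews = [...product.reviews, review]
    .sort((a, b) => b.published_date - a.published_date)
    .slice(0, limit);
  return { ...product, reviews };
}

let product = { _id: 1, name: "Super Widget", reviews: [] };
const allReviews = []; // stands in for the separate reviews collection

// Write 12 hypothetical reviews, one per day.
for (let id = 1; id <= 12; id++) {
  const review = {
    review_id: id,
    product_id: 1,
    published_date: new Date(Date.UTC(2019, 0, id)),
  };
  allReviews.push(review);              // full history
  product = addReview(product, review); // capped, newest-first subset
}

console.log(product.reviews.map((r) => r.review_id));
console.log(allReviews.length);
```

Every write touches both places, which is the consistency cost of this pattern: the full history grows without bound while the embedded array stays capped.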
```javascript=
{
  "_id": 1,
  "name": "Super Widget",
  "description": "This is the most useful item in your toolbox.",
  "price": { "value": NumberDecimal("119.99"), "currency": "USD" },
  "reviews": [
    {
      "review_id": 786,
      "review_author": "Kristina",
      "review_text": "This is indeed an amazing widget.",
      "published_date": ISODate("2019-02-18")
    },
    ...
    {
      "review_id": 776,
      "review_author": "Pablo",
      "review_text": "Amazing!",
      "published_date": ISODate("2019-02-16")
    }
  ]
}
```

The review collection stores all reviews.

```
{
  "review_id": 786,
  "product_id": 1,
  "review_author": "Kristina",
  "review_text": "This is indeed an amazing widget.",
  "published_date": ISODate("2019-02-18")
}
{
  "review_id": 785,
  "product_id": 1,
  "review_author": "Trina",
  "review_text": "Nice product. Slow shipping.",
  "published_date": ISODate("2019-02-17")
}
...
{
  "review_id": 1,
  "product_id": 1,
  "review_author": "Hans",
  "review_text": "Meh, it's okay.",
  "published_date": ISODate("2017-12-06")
}
```

### Trade-offs of the subset pattern

Data gets duplicated. For example, a review can exist in both the product collection and the reviews collection, so the two collections must be kept consistent. The reviews embedded in the product collection must also be maintained so that they are always the most recent ones.

## 1:many with references

[Reference link](https://docs.mongodb.com/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/)

Consider the relationship between publishers and books. Referencing is a good fit here because it avoids repeating the publisher information. With embedding, the documents look like this:

```javascript=
{
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    founded: 1980,
    location: "CA"
  }
}
{
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    founded: 1980,
    location: "CA"
  }
}
```

References are used to reduce this duplication of publisher data. When using references, how the relationship grows determines where the reference should be stored.
If the number of books per publisher is small and its growth is bounded, storing the book references inside the publisher document is efficient. But if the number of books per publisher is large and unbounded, this produces a mutable, growing array like the following.

```javascript=
{
  name: "O'Reilly Media",
  founded: 1980,
  location: "CA",
  books: [123456789, 234567890, ...]
}
{
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
{
  _id: 234567890,
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English"
}
```

> Why is an unbounded array bad?
> The document can grow to an unexpected size. As the array keeps growing, an index on the array can perform worse and worse. Eventually the document may exceed the BSON document size limit (16 MB).
> [Reference](https://docs.atlas.mongodb.com/schema-suggestions/avoid-unbounded-arrays/)
> [Reference: blog and growing comments](https://stackoverflow.com/questions/9306815/mongodb-performance-with-growing-data-structure)

To avoid a mutable, growing array, store the publisher reference inside each book document.

```javascript=
// publisher
{
  _id: "oreilly",
  name: "O'Reilly Media",
  founded: 1980,
  location: "CA"
}

// book
{
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}
{
  _id: 234567890,
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English",
  publisher_id: "oreilly"
}
```

## Data model for keyword search

[Reference link](https://docs.mongodb.com/manual/tutorial/model-data-for-keyword-search/)

Use this when the keywords are stored in the document itself. To support keyword-based queries, add an array field to the document and put the keywords into it as strings. A multi-key index can then be created on the array.
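As a rough mental model of why an index over an array field helps keyword lookups, the plain JavaScript sketch below builds one index entry per array element. It is a conceptual illustration only, not how MongoDB stores indexes internally; `buildKeywordIndex` and the second sample volume are hypothetical.

```javascript
// Build a keyword -> [document ids] map with one entry per array element.
function buildKeywordIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const keyword of doc[field] ?? []) {
      if (!index.has(keyword)) index.set(keyword, []);
      index.get(keyword).push(doc._id);
    }
  }
  return index;
}

// Hypothetical sample data.
const volumes = [
  { _id: 1, title: "Moby-Dick", topics: ["whaling", "voyage", "novel"] },
  { _id: 2, title: "Hypothetical Sea Stories", topics: ["voyage", "nautical"] },
];

const topicIndex = buildKeywordIndex(volumes, "topics");
console.log(topicIndex.get("voyage"));  // ids of documents tagged "voyage"
console.log(topicIndex.get("whaling")); // ids of documents tagged "whaling"
```

A lookup by keyword then becomes a single map access instead of a scan over every document's array.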
```javascript=
{
  title : "Moby-Dick" ,
  author : "Herman Melville" ,
  published : 1851 ,
  ISBN : 0451526996 ,
  topics : [ "whaling" , "allegory" , "revenge" , "American" ,
             "novel" , "nautical" , "voyage" , "Cape Cod" ]
}
```

```javascript=
db.volumes.createIndex( { topics: 1 } )
db.volumes.findOne( { topics : "voyage" }, { title: 1 } )
```

```
users {
  name,
  id,
  level,
  address,
  image, // url
  ratings: [1,23,4,5,3,2,65,3] // ratings
}

~~sale_list~~ { user_id, post_id }
~~buy_list~~ { user_id, post_id }
// sale list and buy list can be represented as a single completed-transactions collection.

buy_request_list { user_id, post_id }

~~user_ratings:~~ {
  user_id, // the user being rated
  user_id, // the user giving the rating
  post_id,
}
// Only one rating is allowed per post, so a flag for whether the rating was given could go inside post.

~~user_rating_histories~~: {
  user_id, // id of the user being rated
  histories: [1,23,4,5,3,2,65,3] // ratings
}
// The rating information could go inside the user, since it is small and a fixed-size array.

=====

user_post: {
  user_id, // seller
  user_id, // buyer
  post_id  // post
}
// Requests are often made separately, so splitting this into its own collection seems better.
// If the buyer request list exists separately, writer, buyer, and post are in a 1:1 relationship, so there seems to be no need to split them out; putting them inside post seems better.

categories { name: string }

post {
  writer: user_id,
  buyer_id: user_id,
  title: string,
  content: string,
  images: [url: string], // limited count
  category: string,
  ~~recent_comments~~: [], // latest 10. Problem: keeping it updated with the latest comments. Problem: when several people write comments at the same time, a lock is taken and post reads slow down.
  ~~state~~: string, // purchase state; a bool cannot express a "reserved" state, so use a string for extensibility
  cost: number,
  uploadTime, // not needed if createdAt is generated automatically
}

post_comments { post_id, writer_id, content }
```

## Open questions

#### 1. Should buy list and sale list go inside users?

```
users {
  name,
  id,
  level,
  address,
  image, // url
  ratings: [1,23,4,5,3,2,65,3] // ratings
}

sale_list { user_id, post_id }
buy_list { user_id, post_id }
```

- Pros: removing the join can improve read performance.
- Cons: sale list and buy list keep growing, so the document can become large.

Answer: How about a separate table to which only completed transactions are appended?
They say NoSQL is slow, but I am not sure how much removing the join really improves read performance... haha

- Merging sale list and buy list into one collection and indexing it could make reads faster.
- However, buy request list needs frequent deletes, and an index hurts delete performance, so it seems better to keep buy request list separate.

#### 2. Should recent comments go inside post?

```
post {
  writer: user_id,
  title: string,
  content: string,
  images: [url: string], // limited count
  category: string,
  recent_comments: [], // latest 10. Problem: keeping it updated with the latest comments. Problem: when several people write comments at the same time, a lock is taken and post reads slow down.
  state: string, // purchase state; a bool cannot express a "reserved" state, so use a string for extensibility
  cost: number,
  uploadTime, // not needed if createdAt is generated automatically
}

post_comments { post_id, writer_id, content }
```

- Pros: the recent comments can be read quickly together with the post.
- Cons:
  - When many users write comments at the same time, the document gets locked and reads of that document slow down.
  - The post data returned by a search does not need to include comment data.
  - Keeping the latest 10 comments up to date is an extra problem to solve.

###### tags: `tech sharing`