Machine Learning Newsletter

MongoDB Notes

Some Properties:

  • A MongoDB instance holds zero or more databases, each acting as a high-level container.
  • A collection is the counterpart of a table in SQL.
  • Collections are made up of documents. (document => row)
  • A document is made up of fields. (field => column)
  • Indexes work much like those in SQL databases.
  • A cursor can count or skip ahead without actually pulling down data.

A collection does not have a schema to follow. Therefore, fields are tracked with each individual document.

The _id field is generated automatically by MongoDB when we do not supply one, and every document in a collection must have a unique _id.

Adding Documents

db.matrix.insert({name : 'Agent Smith', gender : 'm', age : 44})

Calling db.matrix.findOne() then returns:

{
    "_id" : ObjectId("4fcf8b5321f61bf6f63ddd78"),
    "name" : "Agent Smith",
    "gender" : "m",
    "age" : 44
}

If we add another person:

db.matrix.insert({name : 'Neo', gender : 'm', age : 40})

and call db.matrix.find(), we get:

{ "_id" : ObjectId("4fcf8b5321f61bf6f63ddd78"), "name" : "Agent Smith", "gender" : "m", "age" : 44 }
{ "_id" : ObjectId("4fcf8ce921f61bf6f63ddd79"), "name" : "Neo", "gender" : "m", "age" : 40 }

Similarly add two more people:

db.matrix.insert({name : 'Trinity', gender : 'f', age : 35})
db.matrix.insert({name : 'Morpheus', gender : 'm', age : 45})

We now have a total of four people in the collection, and db.matrix.find() returns the list of documents:

{ "_id" : ObjectId("4fcf8b5321f61bf6f63ddd78"), "name" : "Agent Smith", "gender" : "m", "age" : 44 }
{ "_id" : ObjectId("4fcf8ce921f61bf6f63ddd79"), "name" : "Neo", "gender" : "m", "age" : 40 }
{ "_id" : ObjectId("4fcf8df921f61bf6f63ddd7a"), "name" : "Trinity", "gender" : "f", "age" : 35 }
{ "_id" : ObjectId("4fcf8e0821f61bf6f63ddd7b"), "name" : "Morpheus", "gender" : "m", "age" : 45 }

Architecture appears to be male but is considered ageless. So when we insert his document, we can simply leave out the age field:

db.matrix.insert({name : 'Architecture', gender : 'm'})

When we list the documents again, MongoDB returns the new one without complaint, even though it has no age field:

{ "_id" : ObjectId("4fcf8b5321f61bf6f63ddd78"), "name" : "Agent Smith", "gender" : "m", "age" : 44 }
{ "_id" : ObjectId("4fcf8ce921f61bf6f63ddd79"), "name" : "Neo", "gender" : "m", "age" : 40 }
{ "_id" : ObjectId("4fcf8df921f61bf6f63ddd7a"), "name" : "Trinity", "gender" : "f", "age" : 35 }
{ "_id" : ObjectId("4fcf8e0821f61bf6f63ddd7b"), "name" : "Morpheus", "gender" : "m", "age" : 45 }
{ "_id" : ObjectId("4fcfa18821f61bf6f63ddd7c"), "name" : "Architecture", "gender" : "m" }

Once the collection is populated, we need to select documents based on their fields. For this we use selectors, which are very similar to the WHERE clause of an SQL statement. The simplest one is {}, which matches every document in the collection; null does the same:

db.matrix.find({})
db.matrix.find(null)

Both return all the documents in the collection. An AND condition is expressed in the form:

{field1 : value1, field2 : value2}

which reads naturally. For example, to return males aged 44 or older, we write the selector:

db.matrix.find({gender: 'm', age: {$gte: 44}})

Some common operators are:

  • $lt => less than
  • $lte => less than or equal
  • $gt => greater than
  • $gte => greater than or equal
  • $ne => not equal
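
To make the selector semantics concrete, here is a minimal plain-JavaScript sketch of how a selector could be checked against a document. It is only a model for illustration, not how MongoDB implements matching; the names matches and OPERATORS are made up for this example, and only equality and the five comparison operators are handled.

```javascript
// Plain-JavaScript model of selector matching; NOT MongoDB's implementation.
// matches() and OPERATORS are hypothetical names used only for illustration.
const OPERATORS = {
  $lt:  (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $gt:  (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $ne:  (a, b) => a !== b,
};

// A document matches when every field in the selector matches (implicit AND).
function matches(doc, selector) {
  return Object.entries(selector).every(([field, cond]) => {
    const value = doc[field];
    if (cond !== null && typeof cond === 'object') {
      // Operator form, e.g. {age: {$gte: 44}}
      return Object.entries(cond).every(([op, arg]) => OPERATORS[op](value, arg));
    }
    return value === cond; // plain equality, e.g. {gender: 'm'}
  });
}

const matrix = [
  {name: 'Agent Smith', gender: 'm', age: 44},
  {name: 'Neo', gender: 'm', age: 40},
  {name: 'Trinity', gender: 'f', age: 35},
  {name: 'Morpheus', gender: 'm', age: 45},
  {name: 'Architecture', gender: 'm'}, // no age field, so $gte cannot match
];

const olderMen = matrix
  .filter(doc => matches(doc, {gender: 'm', age: {$gte: 44}}))
  .map(doc => doc.name);
console.log(olderMen); // [ 'Agent Smith', 'Morpheus' ]
```

Note that the document without an age field is simply skipped by the $gte condition rather than causing an error.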

For a harder example, consider a field that holds an array. Suppose a students collection records the sports each student likes:

{ "_id" : ObjectId("4fcfce79fd6230c28d817740"), "name" : "john", "likes" : [ "basketball", "football", "baseball", "swimming" ] }
{ "_id" : ObjectId("4fcfcf02fd6230c28d817742"), "name" : "mike", "likes" : [ "basketball", "football", "tennis", "rugby" ] }
{ "_id" : ObjectId("4fcfcf08fd6230c28d817743"), "name" : "cassandra", "likes" : [ "golf", "table-tennis", "tennis" ] }
{ "_id" : ObjectId("4fcfcf11fd6230c28d817744"), "name" : "paul", "likes" : [ "golf", "table-tennis", "tennis", "rugby" ] }

To retrieve the students who like golf or tennis:

db.students.find({$or : [{likes : 'tennis'},{likes : 'golf'}]})

This returns:

{ "_id" : ObjectId("4fcfcf02fd6230c28d817742"), "name" : "mike", "likes" : [ "basketball", "football", "tennis", "rugby" ] }
{ "_id" : ObjectId("4fcfcf08fd6230c28d817743"), "name" : "cassandra", "likes" : [ "golf", "table-tennis", "tennis" ] }
{ "_id" : ObjectId("4fcfcf11fd6230c28d817744"), "name" : "paul", "likes" : [ "golf", "table-tennis", "tennis", "rugby" ] }

as expected. Array fields make it easy to match on any one of several values, which becomes extremely useful over time. One such operator is $in, which checks whether the field's value (or, for an array field, any of its elements) is in the given list:

db.scores.find({a:{'$in':[2,3,4]}})
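
The way equality and $in behave on array fields can be modelled in a few lines of plain JavaScript. This is a sketch under the assumption that a scalar field must equal the wanted value while an array field must contain it; fieldEquals and fieldIn are hypothetical helper names, and the data is the students collection from above.

```javascript
// Hypothetical helpers modelling array-field matching; not MongoDB's code.
// A scalar field must equal the value; an array field must contain it.
function fieldEquals(fieldValue, wanted) {
  return Array.isArray(fieldValue) ? fieldValue.includes(wanted) : fieldValue === wanted;
}

// Models $in: true when the field matches any of the candidate values.
function fieldIn(fieldValue, candidates) {
  return candidates.some(c => fieldEquals(fieldValue, c));
}

const students = [
  {name: 'john', likes: ['basketball', 'football', 'baseball', 'swimming']},
  {name: 'mike', likes: ['basketball', 'football', 'tennis', 'rugby']},
  {name: 'cassandra', likes: ['golf', 'table-tennis', 'tennis']},
  {name: 'paul', likes: ['golf', 'table-tennis', 'tennis', 'rugby']},
];

// Models {$or: [{likes: 'tennis'}, {likes: 'golf'}]}
const viaOr = students
  .filter(s => fieldEquals(s.likes, 'tennis') || fieldEquals(s.likes, 'golf'))
  .map(s => s.name);

// Models {likes: {$in: ['tennis', 'golf']}}
const viaIn = students
  .filter(s => fieldIn(s.likes, ['tennis', 'golf']))
  .map(s => s.name);

console.log(viaOr); // [ 'mike', 'cassandra', 'paul' ]
console.log(viaIn); // [ 'mike', 'cassandra', 'paul' ]
```

In this model the $or of two equality tests and a single $in select the same students, which is why $in is usually the shorter way to write such a query.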

Another useful operator is $exists, which matches documents according to whether a given field is present; for example, {age: {$exists: false}} matches documents that have no age field. Documents can also be selected by their _id field:

db.matrix.find({_id: ObjectId("4fd0685b4e0fa619963db3b3")})

and it results in the respective document:

{ "_id" : ObjectId("4fd0685b4e0fa619963db3b3"), "name" : "Morpheus", "gender" : "m", "age" : 45 }

If several documents share a field value and we want to count them, we can use the count operation:

db.matrix.count({name: "Morpheus"})

This returns 2 (I added the same document twice). If nothing matches, it returns 0, as expected.

Removing Documents

We can also delete documents based on their fields:

db.matrix.remove({name: "Morpheus"})

To remove every document in the collection, pass an empty selector. Very old shells also accepted remove() with no argument, but later versions require a selector:

db.matrix.remove({})

For updates, let's first create a collection using a different method, save():

db.users.save({name: 'John', languages: ['ruby', 'c','java', 'javascript']});
db.users.save({name: 'Sue', languages: ['haskell', 'lisp','python','lush']});

To update a document, we first select it by one of its fields (here, the person's name) and pass the new contents to update(). With a plain document as the second argument, the matched document is replaced entirely:

db.users.update({name: 'John'}, {name: 'Johnny', languages: ['scala','java','python']});

Instead of replacing the entire document, we can change only specific fields with $set:

db.users.update({name:'Sue'},{'$set': {age:25}})

Printing the collection now gives:

{ "_id" : ObjectId("4fd7d5aba46929bd0bbd56f7"), "name" : "Johnny", "languages" : [ "scala", "java", "python" ] }
{ "_id" : ObjectId("4fd7d5ada46929bd0bbd56f8"), "age" : 25, "languages" : [ "haskell", "lisp", "python", "lush" ], "name" : "Sue" }

To modify array elements, we can use the $pull and $push operators. For example, to remove haskell from Sue's languages:

db.users.update({name: 'Sue'}, {'$pull': {'languages': 'haskell'} });

and to add a language, say java:

db.users.update({name: 'Sue'}, {'$push': {'languages': 'java'} });
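
The effect of these two operators on the array can be sketched in plain JavaScript. The functions pull and push below are hypothetical stand-ins, assuming that $pull removes every element equal to the value and $push appends even when the value is already present.

```javascript
// Hypothetical in-memory versions of $pull and $push on a single document.
function pull(doc, field, value) {
  if (!Array.isArray(doc[field])) return doc; // nothing to remove from
  // remove every element equal to the value
  return {...doc, [field]: doc[field].filter(v => v !== value)};
}

function push(doc, field, value) {
  // append the value, creating the array if the field is missing
  return {...doc, [field]: [...(doc[field] || []), value]};
}

let sue = {name: 'Sue', languages: ['haskell', 'lisp', 'python', 'lush']};
sue = pull(sue, 'languages', 'haskell');
sue = push(sue, 'languages', 'java');
console.log(sue.languages); // [ 'lisp', 'python', 'lush', 'java' ]
```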

Upsert (Update + Insert)

MongoDB supports so-called upserts, a combination of update and insert. If the document we want to update exists in the collection, it is updated as usual; if it does not, it is created automatically. To enable this behavior, we set the third parameter of update() to true. Say we are building a website hit counter that increments a page's hit count on every visit. We do not have to create the page's document beforehand; we can just use an upsert:

db.hits.update({page: 'yahoo'},{$inc: {hits: 1}},true)

If we omit the third parameter or set it to false and no document matches, the statement above changes nothing in the collection.
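
The hit-counter behavior can be sketched with an in-memory array standing in for the collection. This is only a model: incUpdate is a hypothetical function, it handles equality selectors and $inc only, and it assumes a new document is seeded from the selector's fields when an upsert finds no match.

```javascript
// Hypothetical in-memory model of update(selector, {$inc: ...}, upsert).
function incUpdate(collection, selector, inc, upsert = false) {
  const isMatch = doc => Object.entries(selector).every(([k, v]) => doc[k] === v);
  let target = collection.find(isMatch);
  if (!target) {
    if (!upsert) return collection; // nothing matched, nothing changes
    target = {...selector};         // new document seeded from the selector
    collection.push(target);
  }
  for (const [field, amount] of Object.entries(inc)) {
    target[field] = (target[field] || 0) + amount; // a missing field starts at 0
  }
  return collection;
}

const hitCounts = [];
incUpdate(hitCounts, {page: 'yahoo'}, {hits: 1});       // no upsert: still empty
console.log(hitCounts.length);                          // 0
incUpdate(hitCounts, {page: 'yahoo'}, {hits: 1}, true); // creates the document
incUpdate(hitCounts, {page: 'yahoo'}, {hits: 1}, true); // increments it
console.log(hitCounts); // [ { page: 'yahoo', hits: 2 } ]
```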

Multiple Updates

To update multiple documents at once, we need to enable the fourth parameter of update(). For example, to reset every page's counter:

db.hits.update({},{$set: {hits: 0}},false,true)

This updates all the documents in the collection. If the fourth parameter is not set to true, only the first matching document is updated.

A Deeper Look at find()

To retrieve only specific fields of the documents, we can pass a second parameter to find(), e.g. to get only the page names:
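
The difference the fourth parameter makes can be shown with a small in-memory model. The function setUpdate is hypothetical and covers only equality selectors and $set; it returns the number of documents it changed.

```javascript
// Hypothetical model of update(selector, {$set: ...}, false, multi).
function setUpdate(collection, selector, fields, multi = false) {
  const isMatch = doc => Object.entries(selector).every(([k, v]) => doc[k] === v);
  let updated = 0;
  for (const doc of collection) {
    if (!isMatch(doc)) continue;
    Object.assign(doc, fields); // apply the $set fields
    updated += 1;
    if (!multi) break;          // default: stop after the first match
  }
  return updated;
}

const pages = [
  {page: 'yahoo', hits: 7},
  {page: 'google', hits: 5},
  {page: 'apple', hits: 30},
];

const withoutMulti = setUpdate(pages, {}, {hits: 0});    // only yahoo is reset
const withMulti = setUpdate(pages, {}, {hits: 0}, true); // every page is reset
console.log(withoutMulti, withMulti); // 1 3
```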

db.hits.find(null,{page:1})

This returns only the page field of each document (here google and yahoo), plus _id, which is included unless explicitly excluded with _id: 0.

Ordering

Say we have a collection like this:

{ "_id" : ObjectId("4fd8cf5bbd6be5d371385b9a"), "hits" : 7, "page" : "yahoo" }
{ "_id" : ObjectId("4fd8d0fabd6be5d371385b9b"), "hits" : 5, "page" : "google" }
{ "_id" : ObjectId("4fd91925da57fdfbb68d7848"), "page" : "microsoft", "hits" : 10 }
{ "_id" : ObjectId("4fd91930da57fdfbb68d7849"), "page" : "facebook", "hits" : 15 }
{ "_id" : ObjectId("4fd91938da57fdfbb68d784a"), "page" : "apple", "hits" : 30 }

and we want to order it by number of hits, descending. We can do this with the sort() operation:

db.hits.find().sort({hits: -1})

To return only particular ranks from the sorted result, we can add limit() and skip(). For example, to return only the second and third documents (skip is applied before limit, regardless of the order in which they are chained):

db.hits.find().sort({hits: -1}).limit(2).skip(1)
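
The same sort, skip, limit sequence can be reproduced on a plain array to see which documents come back. This sketch uses the five pages listed above (without their _id values) and assumes the order of operations is sort first, then skip, then limit.

```javascript
// In-memory stand-in for the hits collection shown above.
const hitsDocs = [
  {page: 'yahoo', hits: 7},
  {page: 'google', hits: 5},
  {page: 'microsoft', hits: 10},
  {page: 'facebook', hits: 15},
  {page: 'apple', hits: 30},
];

const skip = 1;
const limit = 2;

// sort({hits: -1}): descending by hits, on a copy so the input is untouched
const sorted = [...hitsDocs].sort((a, b) => b.hits - a.hits);

// skip(1) then limit(2): drop the top page, keep the next two
const secondAndThird = sorted.slice(skip, skip + limit).map(d => d.page);
console.log(secondAndThird); // [ 'facebook', 'microsoft' ]
```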

Counting with count()

count() can be called directly on a collection, like update and insert. Underneath, however, it is a cursor method, so db.hits.count(selector) is just shorthand for db.hits.find(selector).count(). To count the pages that have more than 9 hits:

db.hits.find({hits: {$gt: 9}}).count()

MongoDB Differences from and Similarities to RDBMS

Unlike an RDBMS, MongoDB's find() has no join operation (newer versions offer $lookup in the aggregation pipeline), but we can link documents by storing the _id of one document as a foreign key in another. To show both the relational and the embedded style, we start with an employees example:

{ "_id" : ObjectId("4d85c7039ab0fd70a117d730"), "name" : "Paul" }
{ "_id" : ObjectId("4d85c7039ab0fd70a117d731"), "name" : "Duncan", "manager" : ObjectId("4d85c7039ab0fd70a117d730") }
{ "_id" : ObjectId("4d85c7039ab0fd70a117d732"), "name" : "Moneo", "manager" : ObjectId("4d85c7039ab0fd70a117d730") }

Here Paul is the manager of Duncan and Moneo. To find all employees managed by Paul:

db.employees.find({manager: ObjectId("4d85c7039ab0fd70a117d730")})

If an employee has two managers, we can store them as an array:

db.employees.insert({_id: ObjectId("4d85c7039ab0fd70a117d733"),
name: 'Siona',manager: [ObjectId("4d85c7039ab0fd70a117d730"), ObjectId("4d85c7039ab0fd70a117d732")] })

The same query still works, because matching against an array field checks each element, so Siona is now returned along with Duncan and Moneo:

db.employees.find({manager: ObjectId("4d85c7039ab0fd70a117d730")})

We can also embed a nested document inside a document:

db.employees.insert({_id: ObjectId("4d85c7039ab0fd70a117d734"), name: 'Ghanima',
family: {mother: 'Chani', father: 'Paul', brother: ObjectId("4d85c7039ab0fd70a117d730")}})

We can query on a field of the nested document using dot notation:

db.employees.find({'family.mother': 'Chani'})
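
Dot notation can be pictured as walking the document one key at a time. The helper getPath below is a hypothetical illustration of that idea in plain JavaScript; it returns undefined as soon as a step along the path is missing, so documents without a family field simply do not match.

```javascript
// Hypothetical helper: resolve a dotted path such as 'family.mother'.
function getPath(doc, path) {
  return path.split('.').reduce(
    (value, key) => (value !== null && typeof value === 'object' ? value[key] : undefined),
    doc
  );
}

const employees = [
  {name: 'Paul'}, // no family field
  {name: 'Ghanima', family: {mother: 'Chani', father: 'Paul'}},
];

// Models find({'family.mother': 'Chani'})
const found = employees
  .filter(e => getPath(e, 'family.mother') === 'Chani')
  .map(e => e.name);
console.log(found); // [ 'Ghanima' ]
```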

MongoDB Repair in Ubuntu

First, remove the stale lock file, fix the ownership of the data directory, and run a repair:

sudo rm /var/lib/mongodb/mongod.lock
sudo chown -R mongodb:mongodb /var/lib/mongodb/
sudo -u mongodb mongod -f /etc/mongodb.conf --repair
sudo service mongodb start

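Then, if the service is not running yet (it is named mongodb or mongod depending on the installed package), start it and open the shell: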

sudo service mongod start
mongo