Hive + MongoDB

In this article I'll show you how to connect MongoDB and Hive using MongoStorageHandler.

Suppose you want to make MongoDB data available in Hive so that you can run any Hive query (HQL) over it using Hadoop MapReduce.

First, set up Hadoop, Hive and MongoDB.

Then download the mongo-hadoop release that matches your Hadoop version.

Extract it and copy all 3 jar files to the hadoop/lib and hive/lib folders.
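The copy step can be scripted. Below is a minimal sketch, not an official install procedure: the copy_jars helper is a hypothetical name introduced here, and it assumes the extracted jar files sit directly inside a single source directory.

```shell
#!/bin/sh
# copy_jars SRC_DIR DEST_DIR...
# Hypothetical helper: copies every .jar file found directly in SRC_DIR
# into each DEST_DIR given after it.
copy_jars() {
  src="$1"
  shift
  for dest in "$@"; do
    # Stop if a destination folder does not exist instead of creating it silently.
    if [ ! -d "$dest" ]; then
      echo "missing directory: $dest" >&2
      return 1
    fi
    # cp fails (and we return 1) if SRC_DIR contains no .jar files.
    cp "$src"/*.jar "$dest"/ || return 1
  done
}
```

For example, copy_jars ~/mongo-hadoop "$HADOOP_HOME/lib" "$HIVE_HOME/lib" would cover both folders in one call, assuming ~/mongo-hadoop is where you extracted the release and that HADOOP_HOME and HIVE_HOME point at your installations.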

Next, create some sample records in MongoDB.

 use hive-test  
 db.books.insert({ "_id": 1, "name": "Java 7", "author": "author1" });  
 db.books.insert({ "_id": 2, "name": "Hadoop", "author": "author2" });  
 db.books.insert({ "_id": 3, "name": "Hive", "author": "author3" });  

Then start the Hive console and run the following commands.

 create schema books;  
 use books;  
 CREATE TABLE book (id int, name string, author string)  
 STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"  
 WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"name","author":"author"}')  
 TBLPROPERTIES ( "mongo.uri" = "mongodb://localhost:27017/hive-test.books");  

Make sure the data is visible in Hive by running:

 select * from book;   


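If a command fails with an error like the following (most likely the CREATE TABLE statement, since the error comes from DDLTask):

 Failed with exception org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage  
 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask  

make sure you have added the 3 jars mentioned above to both hadoop/lib and hive/lib.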
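A quick way to check both folders is sketched below. The check_jars helper is a hypothetical name, and the 'mongo-*.jar' pattern used in the example call is an assumption about how the release names its jar files, so adjust it to match what you actually downloaded.

```shell
#!/bin/sh
# check_jars PATTERN EXPECTED DIR...
# Hypothetical helper: for each DIR, counts the files directly inside it
# whose names match PATTERN and reports any DIR holding fewer than EXPECTED.
# Returns 0 if every DIR has enough matching jars, 1 otherwise.
check_jars() {
  pattern="$1"
  expected="$2"
  shift 2
  status=0
  for dir in "$@"; do
    # A missing directory simply counts as zero matching files.
    found=$(find "$dir" -maxdepth 1 -type f -name "$pattern" 2>/dev/null | wc -l | tr -d ' ')
    if [ "$found" -lt "$expected" ]; then
      echo "$dir: found $found of $expected jars matching $pattern"
      status=1
    fi
  done
  return $status
}
```

For example, check_jars 'mongo-*.jar' 3 "$HADOOP_HOME/lib" "$HIVE_HOME/lib" prints one line for each folder that is short of jars and prints nothing when both folders are complete.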