Suppose you want to load MongoDB data into Hive so that you can run Hive queries (HQL) on it using Hadoop MapReduce.
First you have to set up Hadoop, Hive and MongoDB.
Here is another article on how to set up Hadoop: https://www.guru99.com/how-to-install-hadoop.html
Then download the mongo-hadoop release that matches your Hadoop version.
Extract it and copy all 3 jar files to both the hadoop/lib and hive/lib folders.
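The copy step can also be scripted. The following is a minimal sketch, not part of mongo-hadoop: the function name `copy_jars` and the directory names in the usage comment are assumptions, so adjust them to your own installation.

```python
import shutil
from pathlib import Path


def copy_jars(src_dir, dest_dirs):
    """Copy every .jar file in src_dir into each directory in dest_dirs.

    Returns the sorted list of jar file names that were copied.
    """
    jars = sorted(Path(src_dir).glob("*.jar"))
    for dest in dest_dirs:
        dest_path = Path(dest)
        # Fail early instead of silently creating a wrong lib folder.
        if not dest_path.is_dir():
            raise NotADirectoryError(str(dest_path))
        for jar in jars:
            shutil.copy2(jar, dest_path / jar.name)
    return [jar.name for jar in jars]


# Example call (hypothetical paths):
# copy_jars("mongo-hadoop", ["hadoop/lib", "hive/lib"])
```

The function only looks at the top level of the extracted folder, so point it at the directory that actually contains the jars.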
Next, create some sample records in MongoDB from the mongo shell.
use hive-test
db.books.insert({ "_id": 1, "name": "Java 7", "author": "author1" });
db.books.insert({ "_id": 2, "name": "Hadoop", "author": "author2" });
db.books.insert({ "_id": 3, "name": "Hive", "author": "author3" });
Then start the Hive console and run the following commands.
create schema books;
use books;
CREATE TABLE book (id int, name string, author string)
STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"name","author":"author"}')
TBLPROPERTIES ( "mongo.uri" = "mongodb://localhost:27017/hive-test.books");
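The value of 'mongo.columns.mapping' is a JSON object that maps each Hive column to a MongoDB field, which is easy to mistype by hand for wider tables. As a rough sketch, the statement above could be generated from a column list. The helper name `mongo_table_ddl` is hypothetical, and the code does no escaping of its inputs.

```python
import json


def mongo_table_ddl(table, columns, mongo_uri):
    """Build a CREATE TABLE statement for a MongoDB-backed Hive table.

    columns is a list of (hive_column, hive_type, mongo_field) tuples.
    """
    column_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    # Compact JSON, keeping the columns in the order they were given.
    mapping = json.dumps({name: field for name, _, field in columns},
                         separators=(",", ":"))
    return (
        f"CREATE TABLE {table} ({column_defs})\n"
        'STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"\n'
        f"WITH SERDEPROPERTIES('mongo.columns.mapping'='{mapping}')\n"
        f'TBLPROPERTIES ("mongo.uri" = "{mongo_uri}");'
    )


print(mongo_table_ddl(
    "book",
    [("id", "int", "_id"), ("name", "string", "name"), ("author", "string", "author")],
    "mongodb://localhost:27017/hive-test.books",
))
```

Paste the printed statement into the Hive console in place of the hand-written one.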
Check that Hive can read the MongoDB data by running:
select * from book;
Troubleshooting
If the CREATE TABLE statement fails with an error like this:
Failed with exception org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.com.mongodb.hadoop.hive.MongoStorageHandler
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Make sure you have added the 3 mongo-hadoop jars to both hadoop/lib and hive/lib, then restart the Hive console so that they are picked up.
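A quick way to check this is to scan both lib folders for the jars. The sketch below is only an illustration: the helper `missing_jars` is hypothetical, and the three jar name prefixes are an assumption about what the release contains, so compare them with the files you actually extracted.

```python
from pathlib import Path

# Assumed jar name prefixes; verify them against your mongo-hadoop download.
EXPECTED_JARS = ("mongo-hadoop-core", "mongo-hadoop-hive", "mongo-java-driver")


def missing_jars(lib_dirs, expected=EXPECTED_JARS):
    """Return {lib_dir: [prefixes with no matching .jar]} for incomplete folders.

    A folder that does not exist is reported as missing every jar.
    """
    report = {}
    for lib_dir in lib_dirs:
        path = Path(lib_dir)
        names = [p.name for p in path.glob("*.jar")] if path.is_dir() else []
        missing = [prefix for prefix in expected
                   if not any(name.startswith(prefix) for name in names)]
        if missing:
            report[str(lib_dir)] = missing
    return report


# Example call (hypothetical paths):
# print(missing_jars(["hadoop/lib", "hive/lib"]))
```

An empty result means every expected jar was found in every folder.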