HOW TO INGEST THE DATA INTO ELASTICSEARCH AND VIEW IN KIBANA THROUGH HIVE
Objective:
The purpose of this tutorial is to walk you through the process of creating Hive tables, pushing that data to Elasticsearch, and viewing it in Kibana.
Prerequisites:
- You should already have installed the Hortonworks Sandbox (HDP 2.5 Tech Preview).
- You should already have installed Elasticsearch and Kibana.
Scope:
This tutorial was tested using the following environment and components:
- Ubuntu 14.04
- HDP 2.4 Tech Preview on Hortonworks Sandbox
- Elasticsearch 2.4.0
- Kibana 4.6.1
- ES-Hadoop 5.1.1 jar
Steps followed to implement:
1. HIVE TABLE CREATION:
   1.1 HIVE TABLE WITH LOCATION DATA
   1.2 HIVE TABLE WITH PHYSICIAN DATA
   1.3 HIVE TABLE WITH ONCOLOGY DATA (JOIN OF TABLES 1.1 AND 1.2)
2. INDEX FILE:
3. External table to hold the Hive data for the Elasticsearch index:
4. Ingest the Hive data into Elasticsearch:
1. HIVE TABLE CREATION:
~~~~~~~~~~~~~~~~~~~~~
1.1 HIVE TABLE WITH LOCATION DATA
-----------------------------------
-- The geo database must exist before the tables below can be created
CREATE DATABASE IF NOT EXISTS geo;

CREATE TABLE IF NOT EXISTS geo.location (location string, city string, state string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

LOAD DATA INPATH '/project/location.csv' OVERWRITE INTO TABLE geo.location;
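Note that LOAD DATA INPATH moves (rather than copies) the file from its HDFS location into the table's warehouse directory. Once the load completes, you can spot-check the table from the Hive shell; the row shown in the comment is illustrative only:

```sql
-- Confirm the location data landed in the table
SELECT * FROM geo.location LIMIT 5;
-- Expect tab-delimited rows along the lines of:
-- 37.775, -122.418333    SAN FRANCISCO    CA
```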
***********************************************************************************************************
1.2 HIVE TABLE WITH PHYSICIAN DATA
-----------------------------------
CREATE TABLE IF NOT EXISTS geo.oncology (id String,firstname string,secondname string,lastname string,city string,state string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
LOAD DATA INPATH '/project/ongphy.csv' OVERWRITE INTO TABLE geo.oncology;
*******************************************************
1.3 HIVE TABLE WITH ONCOLOGY DATA (JOIN OF TABLES 1.1 AND 1.2)
---------------------------------------------------------------
CREATE TABLE IF NOT EXISTS geo.ongylocation (id int, firstname string, secondname string, lastname string, location string, city string, state string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- id is a string in geo.oncology, so cast it explicitly into the int column
INSERT OVERWRITE TABLE geo.ongylocation
SELECT CAST(t2.id AS int), t2.firstname, t2.secondname, t2.lastname, t1.location, t1.city, t1.state
FROM geo.location t1
JOIN geo.oncology t2
  ON t1.city = t2.city AND t1.state = t2.state;
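Before wiring anything up to Elasticsearch, it is worth confirming that the join actually produced rows (a city/state mismatch between the two source files would silently yield an empty table). A quick sanity check:

```sql
-- Row counts of the sources and the joined result
SELECT COUNT(*) FROM geo.location;
SELECT COUNT(*) FROM geo.oncology;
SELECT COUNT(*) FROM geo.ongylocation;

-- Spot-check a few joined rows
SELECT id, firstname, lastname, location, city, state
FROM geo.ongylocation LIMIT 5;
```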
***********************************************
2. INDEX FILE:
~~~~~~~~~~~
curl -XPOST sandbox.hortonworks.com:9200/oncology -d '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "data" : {
      "properties" : {
        "key" : { "type" : "string" },
        "location" : { "type" : "geo_point" }
      }
    }
  }
}'
******************************************************
3. External table to hold the Hive data for the Elasticsearch index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ADD JAR /root/elasticsearch-hadoop-5.1.1.jar;
CREATE EXTERNAL TABLE geo.oncologylocation (id int, firstname string, secondname string, lastname string, location string, city string, state string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource'='oncology/data','es.nodes' = 'sandbox.hortonworks.com:9200','es.mapping.id'='id');
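Because the index and its geo_point mapping were created by hand in step 2, you may optionally tell ES-Hadoop not to auto-create indices, so that a typo in es.resource fails fast instead of silently creating a fresh index with default (non-geo) mappings. A variant of the same table definition with that extra property (es.index.auto.create is a standard ES-Hadoop setting, shown here as an optional hardening step, not something this tutorial requires):

```sql
CREATE EXTERNAL TABLE geo.oncologylocation (id int, firstname string, secondname string, lastname string, location string, city string, state string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource'   = 'oncology/data',
  'es.nodes'      = 'sandbox.hortonworks.com:9200',
  'es.mapping.id' = 'id',
  -- fail instead of auto-creating the index if 'oncology' does not exist yet
  'es.index.auto.create' = 'false');
```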
***************************************************************
4. Ingest the Hive data into Elasticsearch:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ADD JAR /usr/hdp/current/hive-client/lib/commons-httpclient-3.0.1.jar;
INSERT OVERWRITE TABLE geo.oncologylocation SELECT * FROM geo.ongylocation;
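Since the external table is backed by the Elasticsearch index, you can also read it back through Hive to confirm the ingest; the count should match geo.ongylocation:

```sql
-- ES-Hadoop supports reads as well as writes, so this query hits the index itself
SELECT COUNT(*) FROM geo.oncologylocation;
```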
You can verify that records are being written to Elasticsearch by opening the following URL in a browser window:
- http://sandbox.hortonworks.com:9200/oncology/_search
- You should see something similar to this:
{"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":5030,"max_score":1.0,"hits":[{"_index":"oncology","_type":"data","_id":"1003870007","_score":1.0,"_source":{"id":1003870007,"firstname":"SANG-MO","secondname":"","lastname":"KANG","location":"37.775, -122.418333","city":"SAN FRANCISCO","state":"CA"}},{"_index":"oncology","_type":"data","_id":"1962650119","_score":1.0,"_source":{"id":1962650119,"firstname":"REBECCA","secondname":"PAMELA","lastname":"YOUNG","location":"37.775, -122.418333","city":"SAN FRANCISCO","state":"CA"}},{"_index":"oncology","_type":"data","_id":"1952499519","_score":1.0,"_source":{"id":1952499519,"firstname":"JAMES","secondname":"","lastname":"HUANG","location":"37.775, -122.418333","city":"SAN FRANCISCO","state":"CA"}},{"_index":"oncology","_type":"data","_id":"1952365900","_score":1.0,"_source":{"id":1952365900,"firstname":"SCOT","secondname":"H","lastname":"MERRICK","location":"37.775, -122.418333","city":"SAN FRANCISCO","state":"CA"}},{"_index":"oncology","_type":"data","_id":"1932287562","_score":1.0,"_source":{"id":1932287562,"firstname":"LISA","secondname":"G.","lastname":"SANDLES","location":"37.775, -122.418333","city":"SAN FRANCISCO","state":"CA"}},{"_index":"oncology","_type":"data","_id":"1932142296","_score":1.0,"_source":{"id":1932142296,"firstname":"MICHELLE","secondname":"L.","lastname":"HERMISTON","location":"37.775, -122.418333","city":"SAN FRANCISCO","state":"CA"}},{"_index":"oncology","_type":"data","_id":"1922242932","_score":1.0,"_source":{"id":1922242932,"firstname":"AMY","secondname":"JO","lastname":"CHIEN","location":"37.775, -122.418333","city":"SAN FRANCISCO","state":"CA"}},{"_index":"oncology","_type":"data","_id":"1912165440","_score":1.0,"_source":{"id":1912165440,"firstname":"NIHARIKA","secondname":"","lastname":"DIXIT","location":"37.775, -122.418333","city":"SAN 
FRANCISCO","state":"CA"}},{"_index":"oncology","_type":"data","_id":"1912165358","_score":1.0,"_source":{"id":1912165358,"firstname":"JUSTIN","secondname":"","lastname":"WAHLSTROM","location":"37.775, -122.418333","city":"SAN FRANCISCO","state":"CA"}},{"_index":"oncology","_type":"data","_id":"1902897515","_score":1.0,"_source":{"id":1902897515,"firstname":"BARRY","secondname":"MAURICE","lastname":"CHAUSER","location":"37.775, -122.418333","city":"SAN FRANCISCO","state":"CA"}}]}}
Access Kibana Web UI
You can access the Kibana web user interface via your browser:
- http://sandbox.hortonworks.com:5601
When you first start Kibana, it will create a new Elasticsearch index called .kibana where it stores the visualizations and dashboards. Because this is the first time starting it, you should be prompted to configure an index pattern. You should see something similar to:
We are going to use our physician location data, which is stored in the oncology index. Uncheck the Index contains time-based events option. Replace logstash-* with oncology.
You should see something similar to this:
If everything looks correct, click the green Create button.
Kibana will now show you the index definition for the oncology index. You can see all of the field names, their data types, and whether the fields are analyzed and indexed. This provides a good high-level overview of the data configuration in the index. You can also filter fields in the filter box.
Discover the Oncology Data
Click on the Discover link in the top navigation bar. This opens the Discover view which is a very helpful way to dive into your raw data. You should see something similar to this:
Create Kibana Visualizations
Now we are going to create a map visualization. Click the Visualize link in the navigation Bar. You should see something like this:
Now we are going to click on the Tile map option. You want to choose From a new search and use the oncology index pattern.
Under the Select buckets type, click on Geo Coordinates. Kibana should auto-populate the values with the only geo-enabled field we have, location. You should see something similar to this:
Click the green run arrow. You should see something similar to this:
Create Kibana Dashboard
Now we are ready to create a Kibana dashboard. Click on Dashboard in the navigation bar. You should see something similar to this:
Now we can save this dashboard.