Insert a PostgreSQL Database into Elasticsearch Using Logstash
Ever want to search your database entities from Elasticsearch? Now you can use Logstash to do just that! In this blog we introduce the JDBC input, which was created to import data from any database that supports the JDBC interface. Below, we show you a few examples of using this plugin.
Getting Started
Installation
bin/plugin install logstash-input-jdbc
Driver Support
Popular databases like Oracle, PostgreSQL, and MySQL have compatible JDBC drivers that can be used with this input. This plugin does not come packaged with any of these JDBC drivers out of the box, but they are straightforward to download. You can then configure the plugin to use the desired JDBC driver library: the settings jdbc_driver_library and jdbc_driver_class specify the library path and the driver's class name.
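As a quick sketch of how those two settings fit together, here is a minimal configuration; the jar path and connection details are hypothetical placeholders, so substitute your own:
input {
    jdbc {
        # Hypothetical connection details; substitute your own database
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
        jdbc_user => "postgres"
        # Path to the driver jar you downloaded (hypothetical location)
        jdbc_driver_library => "/path/to/postgresql-42.1.1.jre6.jar"
        # Fully qualified class name of the PostgreSQL driver
        jdbc_driver_class => "org.postgresql.Driver"
        # A trivial statement just to prove the driver loads and connects
        statement => "SELECT 1"
    }
}
output { stdout {} }
If the driver jar cannot be found or loaded, Logstash should fail at startup, so a tiny configuration like this is a handy way to validate the two settings before wiring up a real query.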
Let's get started with the examples!
Example 1: Simple Postgres Input
Here is an example of how to get started reading from a local PostgreSQL database. As a prerequisite, download the PostgreSQL JDBC driver to use with the plugin.
Setting Up The Database
Before we get started, let's create a table called contacts and populate it with some contacts!
create table contacts (
    uid serial,
    email VARCHAR(80),
    first_name VARCHAR(80) NOT NULL,
    last_name VARCHAR(80)
);
INSERT INTO contacts(email, first_name, last_name) VALUES('jim@example.com', 'Jim', 'Smith');
INSERT INTO contacts(email, first_name, last_name) VALUES(null, 'John', 'Smith');
INSERT INTO contacts(email, first_name, last_name) VALUES('carol@example.com', 'Carol', 'Smith');
INSERT INTO contacts(email, first_name, last_name) VALUES('sam@example.com', 'Sam', null);
After this runs, here are the contents of the contacts table:
Email             | First Name | Last Name
------------------+------------+----------
null              | John       | Smith
carol@example.com | Carol      | Smith
sam@example.com   | Sam        | null
jim@example.com   | Jim        | Smith
Logstash Configuration
We can go ahead and output all these events to the console with this sample Logstash configuration:
# file: simple-out.conf
input {
    jdbc {
        # Postgres JDBC connection string to our database, hello
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/hello"
        # The user we wish to execute our statement as
        jdbc_user => "rajesh"
        jdbc_password => "supranimbus"
        # The path to our downloaded JDBC driver
        jdbc_driver_library => "/opt/postgresql-42.1.1.jre6.jar"
        # The name of the driver class for PostgreSQL
        jdbc_driver_class => "org.postgresql.Driver"
        # Our query
        statement => "SELECT * from contacts"
    }
}
output {
    stdout { codec => json_lines }
}
Now we can run Logstash and see the results!
$ logstash-1.5.3/bin/logstash -f simple-out.conf
Logstash startup completed
{"uid":1,"email":null,"first_name":"hello","last_name":null,"@version":"1","@timestamp":"2015-07-29T21:03:18.958Z"}
{"uid":2,"email":"jim@example.com","first_name":"Jim","last_name":"Smith","@version":"1","@timestamp":"2015-07-29T21:03:18.959Z"}
{"uid":3,"email":null,"first_name":"John","last_name":"Smith","@version":"1","@timestamp":"2015-07-29T21:03:18.959Z"}
{"uid":4,"email":"carol@example.com","first_name":"Carol","last_name":"Smith","@version":"1","@timestamp":"2015-07-29T21:03:18.959Z"}
{"uid":5,"email":"sam@example.com","first_name":"Sam","last_name":null,"@version":"1","@timestamp":"2015-07-29T21:03:18.961Z"}
Logstash shutdown completed
Awesome, we read data from PostgreSQL!
Up next, we will demonstrate two examples of how you may use this plugin in the context of Elasticsearch.
Example 2: Synchronizing Data In Your Table To Elasticsearch
In the case that we are using our database as an input source for Elasticsearch, we may be interested in keeping our existing documents in-sync with our data as the database undergoes updates. In this case, we can simply index our rows in Elasticsearch with unique ids such that any time we re-index them, they will just update. This way, we prevent Elasticsearch from assigning a new ID for each record and generating duplicates!
# file: contacts-index-logstash.conf
input {
    jdbc {
        # Postgres JDBC connection string to our database, hello
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/hello"
        # The user we wish to execute our statement as
        jdbc_user => "rajesh"
        jdbc_password => "supranimbus"
        # The path to our downloaded JDBC driver
        jdbc_driver_library => "/opt/postgresql-42.1.1.jre6.jar"
        # The name of the driver class for PostgreSQL
        jdbc_driver_class => "org.postgresql.Driver"
        # Our query
        statement => "SELECT * from contacts"
    }
}
output {
    elasticsearch {
        # Key each document on the row's uid so that re-running the
        # pipeline updates existing documents instead of duplicating them
        index => "contacts"
        document_type => "contact"
        document_id => "%{uid}"
        host => "ES_NODE_HOST"
        protocol => "http"
    }
}
Let's do a quick check to see that "Sam" was indexed into Elasticsearch:
curl ES_NODE_HOST:9200/contacts/contact/5?pretty
{
"_index" : "contacts",
"_type" : "contact",
"_id" : "5",
"_version" : 1,
"found" : true,
"_source":{"uid":5,"email":"sam@example.com","first_name":"Sam","last_name":null,"@version":"1","@timestamp":"2015-07-29T22:12:20.146Z"}
}
So far we have seen how to fetch results from a database with a query, but what if we want to update our index with new changes? What if some of our contacts changed emails, or we want to update someone's last name? Here is a sequence of changes that we can apply to our table and later verify the behavior we want in the resulting Elasticsearch index.
UPDATE contacts SET last_name = 'Smith' WHERE email = 'sam@example.com';
UPDATE contacts SET email = 'john@example.com' WHERE uid = 3;
INSERT INTO contacts(email, first_name, last_name) VALUES('new@example.com', 'New', 'Smith');
Now we can run Logstash again with the same configuration. When we repeat the earlier query, we will notice that the document containing Sam has been updated and its _version is now 2:
curl ES_NODE_HOST:9200/contacts/contact/5?pretty
{
"_index" : "contacts",
"_type" : "contact",
"_id" : "5",
"_version" : 2,
"found" : true,
"_source":{"uid":5,"email":"sam@example.com","first_name":"Sam","last_name":"Smith","@version":"1","@timestamp":"2015-07-29T22:12:56.980Z"}
}
Using this method, we can re-index our table into Elasticsearch without ending up with duplicates. One thing to note is that we are not able to capture deletes to documents under this scheme.
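One refinement worth mentioning: instead of re-running Logstash by hand after every change, the jdbc input can poll the database on a schedule and fetch only the rows it has not seen yet. Below is a minimal sketch, assuming a version of logstash-input-jdbc recent enough to support the schedule, use_column_value, tracking_column, and :sql_last_value features; the connection details are the same ones used above, and the file name is hypothetical.
# file: contacts-sync-logstash.conf
input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/hello"
        jdbc_user => "rajesh"
        jdbc_password => "supranimbus"
        jdbc_driver_library => "/opt/postgresql-42.1.1.jre6.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        # Run the statement once a minute (cron-style schedule)
        schedule => "* * * * *"
        # Remember the highest uid seen across runs, so each run
        # only fetches rows inserted since the previous one
        use_column_value => true
        tracking_column => "uid"
        statement => "SELECT * FROM contacts WHERE uid > :sql_last_value"
    }
}
output {
    elasticsearch {
        index => "contacts"
        document_type => "contact"
        document_id => "%{uid}"
        host => "ES_NODE_HOST"
        protocol => "http"
    }
}
Note that tracking uid only picks up newly inserted rows; to catch updates to existing rows you would track a last-modified timestamp column instead (also hypothetical for our contacts table), and deletes are still not captured either way.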