Insert a PostgreSQL Database into Elasticsearch Using Logstash
Ever want to search your database entities from Elasticsearch? Now you can use Logstash to do just that! In this blog we introduce the JDBC input, which was created to import data from any database that supports the JDBC interface. Below, we show you a few examples of using this plugin.
Getting Started
Installation
bin/plugin install logstash-input-jdbc
Driver Support
Popular databases like Oracle, PostgreSQL, and MySQL have compatible JDBC drivers that can be used with this input. This plugin does not come packaged with any of these JDBC drivers out of the box, but they are straightforward to download. You can then configure the plugin to use the desired JDBC driver library: the settings jdbc_driver_library and jdbc_driver_class specify the library path and the driver's class name.
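As a quick sketch of how those two settings fit together, here is a minimal configuration; the jar path and connection details are hypothetical placeholders, so substitute your own:
input {
    jdbc {
        # Hypothetical connection details; substitute your own database
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
        jdbc_user => "postgres"
        # Path to the driver jar you downloaded (hypothetical location)
        jdbc_driver_library => "/path/to/postgresql-42.1.1.jre6.jar"
        # Fully qualified class name of the PostgreSQL driver
        jdbc_driver_class => "org.postgresql.Driver"
        # A trivial statement just to prove the driver loads and connects
        statement => "SELECT 1"
    }
}
output { stdout {} }
If the driver jar cannot be found or loaded, Logstash should fail at startup, so a tiny configuration like this is a handy way to validate the two settings before wiring up a real query.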
Let's get started with the examples!
Example 1: Simple Postgres Input
Here is an example of how to get started reading from a local PostgreSQL database. As a prerequisite, download the PostgreSQL JDBC driver to use with the plugin.
Setting Up The Database
Before we get started, let's create a table called contacts and populate it with some contacts!
create table contacts (
    uid serial,
    email VARCHAR(80),
    first_name VARCHAR(80) NOT NULL,
    last_name VARCHAR(80)
);
INSERT INTO contacts(email, first_name, last_name) VALUES('jim@example.com', 'Jim', 'Smith');
INSERT INTO contacts(email, first_name, last_name) VALUES(null, 'John', 'Smith');
INSERT INTO contacts(email, first_name, last_name) VALUES('carol@example.com', 'Carol', 'Smith');
INSERT INTO contacts(email, first_name, last_name) VALUES('sam@example.com', 'Sam', null);
After this runs, here are the contents of the contacts table:
Email             | First Name | Last Name
------------------+------------+----------
null              | John       | Smith
carol@example.com | Carol      | Smith
sam@example.com   | Sam        | null
jim@example.com   | Jim        | Smith
Logstash Configuration
We can go ahead and output all these events to the console with this sample Logstash configuration:
# file: simple-out.conf
input {
    jdbc {
        # Postgres JDBC connection string to our database, hello
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/hello"
        # The user we wish to execute our statement as
        jdbc_user => "rajesh"
        jdbc_password => "supranimbus"
        # The path to our downloaded JDBC driver
        jdbc_driver_library => "/opt/postgresql-42.1.1.jre6.jar"
        # The name of the driver class for PostgreSQL
        jdbc_driver_class => "org.postgresql.Driver"
        # Our query
        statement => "SELECT * from contacts"
    }
}
output {
    stdout { codec => json_lines }
}
Now we can run Logstash and see the results!
$ logstash-1.5.3/bin/logstash -f simple-out.conf
Logstash startup completed
{"uid":1,"email":null,"first_name":"hello","last_name":null,"@version":"1","@timestamp":"2015-07-29T21:03:18.958Z"}
{"uid":2,"email":"jim@example.com","first_name":"Jim","last_name":"Smith","@version":"1","@timestamp":"2015-07-29T21:03:18.959Z"}
{"uid":3,"email":null,"first_name":"John","last_name":"Smith","@version":"1","@timestamp":"2015-07-29T21:03:18.959Z"}
{"uid":4,"email":"carol@example.com","first_name":"Carol","last_name":"Smith","@version":"1","@timestamp":"2015-07-29T21:03:18.959Z"}
{"uid":5,"email":"sam@example.com","first_name":"Sam","last_name":null,"@version":"1","@timestamp":"2015-07-29T21:03:18.961Z"}
Logstash shutdown completed
Awesome, we read data from PostgreSQL!
Up next, we will demonstrate two examples of how you may use this plugin in the context of Elasticsearch.
Example 2: Synchronizing Data In Your Table To Elasticsearch
In the case that we are using our database as an input source for Elasticsearch, we may be interested in keeping our existing documents in-sync with our data as the database undergoes updates. In this case, we can simply index our rows in Elasticsearch with unique ids such that any time we re-index them, they will just update. This way, we prevent Elasticsearch from assigning a new ID for each record and generating duplicates!
# file: contacts-index-logstash.conf
input {
    jdbc {
        # Postgres JDBC connection string to our database, hello
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/hello"
        # The user we wish to execute our statement as
        jdbc_user => "rajesh"
        jdbc_password => "supranimbus"
        # The path to our downloaded JDBC driver
        jdbc_driver_library => "/opt/postgresql-42.1.1.jre6.jar"
        # The name of the driver class for PostgreSQL
        jdbc_driver_class => "org.postgresql.Driver"
        # Our query
        statement => "SELECT * from contacts"
    }
}
output {
    elasticsearch {
        # Key each document on the row's uid so that re-running the
        # pipeline updates existing documents instead of duplicating them
        index => "contacts"
        document_type => "contact"
        document_id => "%{uid}"
        host => "ES_NODE_HOST"
        protocol => "http"
    }
}
Let's do a quick check to see that "Sam" was indexed into Elasticsearch:
curl ES_NODE_HOST:9200/contacts/contact/5?pretty
{
"_index" : "contacts",
"_type" : "contact",
"_id" : "5",
"_version" : 1,
"found" : true,
"_source":{"uid":5,"email":"sam@example.com","first_name":"Sam","last_name":null,"@version":"1","@timestamp":"2015-07-29T22:12:20.146Z"}
}
So far we have seen how to fetch results from a database with a query, but what if we want to update our index with new changes? What if some of our contacts changed emails, or we want to update someone's last name? Here is a sequence of changes that we can apply to our table and later verify the behavior we want in the resulting Elasticsearch index.
UPDATE contacts SET last_name = 'Smith' WHERE email = 'sam@example.com';
UPDATE contacts SET email = 'john@example.com' WHERE uid = 3;
INSERT INTO contacts(email, first_name, last_name) VALUES('new@example.com', 'New', 'Smith');
Now we can run Logstash again with the same configuration. When we repeat the earlier query, we will notice that the document containing Sam has been updated and its _version is now 2:
curl ES_NODE_HOST:9200/contacts/contact/5?pretty
{
"_index" : "contacts",
"_type" : "contact",
"_id" : "5",
"_version" : 2,
"found" : true,
"_source":{"uid":5,"email":"sam@example.com","first_name":"Sam","last_name":"Smith","@version":"1","@timestamp":"2015-07-29T22:12:56.980Z"}
}
Using this method, we can re-index our table into Elasticsearch without ending up with duplicates. One thing to note is that we are not able to capture deletes to documents under this scheme.
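One refinement worth mentioning: instead of re-running Logstash by hand after every change, the jdbc input can poll the database on a schedule and fetch only the rows it has not seen yet. Below is a minimal sketch, assuming a version of logstash-input-jdbc recent enough to support the schedule, use_column_value, tracking_column, and :sql_last_value features; the connection details are the same ones used above, and the file name is hypothetical.
# file: contacts-sync-logstash.conf
input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/hello"
        jdbc_user => "rajesh"
        jdbc_password => "supranimbus"
        jdbc_driver_library => "/opt/postgresql-42.1.1.jre6.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        # Run the statement once a minute (cron-style schedule)
        schedule => "* * * * *"
        # Remember the highest uid seen across runs, so each run
        # only fetches rows inserted since the previous one
        use_column_value => true
        tracking_column => "uid"
        statement => "SELECT * FROM contacts WHERE uid > :sql_last_value"
    }
}
output {
    elasticsearch {
        index => "contacts"
        document_type => "contact"
        document_id => "%{uid}"
        host => "ES_NODE_HOST"
        protocol => "http"
    }
}
Note that tracking uid only picks up newly inserted rows; to catch updates to existing rows you would track a last-modified timestamp column instead (also hypothetical for our contacts table), and deletes are still not captured either way.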