What is Elasticsearch?
Elasticsearch is a distributed, RESTful, open-source search and analytics engine. It handles many types of data, including numerical, textual, structured, unstructured, and geospatial data. It is built on Apache Lucene, was first released in 2010, and is developed by Elastic. Thanks to its distributed design, simple REST APIs, scalability, and speed, Elasticsearch quickly became one of the most popular search engines.
Note: The ELK Stack is a set of open-source tools (Elasticsearch, Logstash, and Kibana) used for data ingestion, enrichment, storage, analysis, and visualization. Elasticsearch is the core component of the ELK Stack, responsible for storing and searching the data.
Need for Elasticsearch
For applications with huge databases, search often suffers from slow data retrieval. In a relational database (RDBMS), data is usually normalized and spread across multiple tables, so assembling meaningful results requires joining several tables, which becomes slow as the data grows. That is why distributed NoSQL databases are often preferred over an RDBMS for search workloads nowadays. Elasticsearch is one such distributed NoSQL data store: it stores self-contained JSON documents, so a search can return a complete record without joins.
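To make the join-versus-document contrast concrete, here is a minimal, hypothetical Python sketch (plain dictionaries, no database or Elasticsearch involved). The `customers`/`orders` data and the function names are illustrative assumptions: the normalized version needs a join step to build a result, while the denormalized JSON-style documents can be filtered directly.

```python
# Normalized, RDBMS-style data: customer details live in a separate "table"
customers = {1: {"name": "Asha", "city": "Pune"}, 2: {"name": "Ben", "city": "Leeds"}}
orders = [
    {"order_id": 101, "customer_id": 1, "item": "laptop"},
    {"order_id": 102, "customer_id": 2, "item": "monitor"},
]


def orders_with_customers(orders, customers):
    # Query-time join: each order row is matched with its customer row
    return [{**o, "customer": customers[o["customer_id"]]} for o in orders]


# Denormalized, document-style data: each JSON document is self-contained,
# so a search hit already carries everything needed to display the result
order_documents = [
    {"order_id": 101, "item": "laptop", "customer": {"name": "Asha", "city": "Pune"}},
    {"order_id": 102, "item": "monitor", "customer": {"name": "Ben", "city": "Leeds"}},
]


def find_orders_by_city(documents, city):
    # No join needed: filter directly on the embedded customer fields
    return [d["order_id"] for d in documents if d["customer"]["city"] == city]


print(find_orders_by_city(order_documents, "Pune"))
```

The trade-off is duplication: the customer's details are copied into every order document, so updating a customer means updating each document that embeds it.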
Why use Elasticsearch?
- Fast: Elasticsearch is built on top of Lucene and is much faster than an RDBMS for search queries. For example, a query that Elasticsearch answers in roughly 10 ms might take up to 10 s in an RDBMS, depending on the data and the query.
- Distributed by nature: Elasticsearch splits the documents of each index into multiple partitions called shards, which are spread across the nodes of a cluster. Because of this distributed design, it can handle huge amounts of data.
- Scalability: Because Elasticsearch is distributed, it can scale out by adding servers, up to thousands of them, and accommodate huge amounts of data.
- Analysis: Analytical workloads involve counting things and summarizing data. Elasticsearch supports this through aggregations (for example, counting documents per distinct value or computing an average), and tools such as Kibana often generate these aggregation queries for you.
- Scraping and Combining Public Data: The Elastic Stack provides a number of tools to fetch and index data from external sources such as Twitter. One example is a Twitter connector that lets you configure hashtags to track; it fetches the tweets containing those hashtags so they can then be analyzed in Kibana.
- Visualizing Data and Reporting: There are a number of good tools for visualizing and reporting on document data, and Kibana is the most powerful of them. It provides Timelion for time-series data, a tile service for geographic data, and much more.
- Full-text search: Elasticsearch can search large volumes of text and rank the results by relevance, returning the best matches first.
- JSON document storage: Elasticsearch stores data as JSON documents, and nested JSON objects can be stored as fields within a document.
- Auto Suggest and Auto Complete: Elasticsearch can suggest and complete search terms as the user types, based on partially typed words.
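Much of the speed and full-text search ability described above comes from Lucene's inverted index, which maps each term to the documents that contain it, so a query looks up matching documents directly instead of scanning every row. The following is a minimal Python sketch of the idea, not Lucene's implementation: the naive lowercase-and-split tokenizer and the "count matching terms" ranking are simplifying assumptions (Elasticsearch uses configurable analyzers and BM25 scoring by default).

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analyzer: lowercase and split on non-word characters
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each term to the set of document IDs that contain it
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query: str) -> list[int]:
    # Score each document by how many distinct query terms it contains
    # (a stand-in for real relevance scoring such as BM25)
    scores: dict[int, int] = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Best matches first; ties broken by document ID for deterministic output
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    1: "Elasticsearch is built on Apache Lucene",
    2: "Kibana visualizes data stored in Elasticsearch",
    3: "Logstash ingests and enriches data",
}
index = build_inverted_index(docs)
print(search(index, "elasticsearch data"))  # [2, 1, 3]
```

Document 2 ranks first because it contains both query terms; documents 1 and 3 each match one term.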
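The distributed design described above relies on a routing rule that decides which shard stores each document. Conceptually, Elasticsearch computes `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document's `_id`. The sketch below illustrates only that simplified rule: it substitutes Python's `zlib.crc32` for the Murmur3 hash Elasticsearch actually uses, and `route_to_shard` is a hypothetical helper, so the shard numbers will not match a real cluster.

```python
import zlib
from collections import Counter


def route_to_shard(routing: str, num_primary_shards: int) -> int:
    # Hypothetical stand-in for Elasticsearch's routing: hash the routing value
    # (by default the document _id) and take it modulo the primary shard count.
    # crc32 replaces Elasticsearch's Murmur3 hash purely for illustration.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 3
doc_ids = [f"doc-{i}" for i in range(1000)]

# Count how many of the 1000 documents land on each shard
placement = Counter(route_to_shard(doc_id, NUM_SHARDS) for doc_id in doc_ids)
print(dict(sorted(placement.items())))
```

Because the rule is deterministic, the same ID always maps to the same shard, which lets a lookup by ID go straight to one shard instead of asking all of them. It also explains why the number of primary shards matters: changing it would change where existing documents belong.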
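To illustrate the Analysis point, the sketch below builds a search request body that uses two real Elasticsearch aggregation types, `terms` and `avg`, and then computes locally what those aggregations would summarize over a handful of documents. The log documents, the field names `status` and `latency_ms`, and the aggregation names `by_status` and `avg_latency` are hypothetical; nothing is sent to a cluster.

```python
import json
from collections import Counter
from statistics import mean

# Hypothetical log documents; the field names are illustrative only
docs = [
    {"status": 200, "latency_ms": 12},
    {"status": 200, "latency_ms": 8},
    {"status": 404, "latency_ms": 3},
    {"status": 500, "latency_ms": 40},
    {"status": 200, "latency_ms": 17},
]

# Search request body with a terms aggregation and an avg aggregation;
# "size": 0 skips returning the matching documents themselves
agg_request = {
    "size": 0,
    "aggs": {
        "by_status": {"terms": {"field": "status"}},
        "avg_latency": {"avg": {"field": "latency_ms"}},
    },
}
print(json.dumps(agg_request, indent=2))

# Local equivalent of what the two aggregations compute over these documents
status_counts = Counter(d["status"] for d in docs)
avg_latency = mean(d["latency_ms"] for d in docs)
print(status_counts.most_common(), avg_latency)  # [(200, 3), (404, 1), (500, 1)] 16
```

In a real cluster, Elasticsearch would compute these buckets and averages across all shards and return them in the `aggregations` section of the search response.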
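Elasticsearch offers several mechanisms for auto-complete (such as the completion suggester, the `search_as_you_type` field type, and the `edge_ngram` token filter). The sketch below shows the idea behind the edge n-gram approach in plain Python: every prefix of each word is indexed, so a partially typed word is matched with a single lookup. The helper names and the `min_gram`/`max_gram` defaults here are assumptions for illustration, not Elasticsearch's settings.

```python
from collections import defaultdict


def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 10) -> list[str]:
    # Prefixes of the word from min_gram to max_gram characters long,
    # similar in spirit to Elasticsearch's edge_ngram token filter
    word = word.lower()
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def build_prefix_index(terms: list[str]) -> dict[str, set[str]]:
    # Map every prefix to the full terms that start with it
    index: dict[str, set[str]] = defaultdict(set)
    for term in terms:
        for gram in edge_ngrams(term):
            index[gram].add(term)
    return index


def suggest(index, partial: str, limit: int = 5) -> list[str]:
    # One dictionary lookup finds every term starting with the typed prefix
    return sorted(index.get(partial.lower(), set()))[:limit]


vocabulary = ["elastic", "elasticsearch", "elk", "kibana", "logstash", "lucene"]
prefix_index = build_prefix_index(vocabulary)
print(suggest(prefix_index, "el"))    # ['elastic', 'elasticsearch', 'elk']
print(suggest(prefix_index, "elas"))  # ['elastic', 'elasticsearch']
```

The design trades index size for query speed: storing many prefixes makes the index larger, but suggestions no longer require scanning the vocabulary on each keystroke.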
How does Elasticsearch work?
Elasticsearch provides a REST API that accepts data as JSON documents. Ingestion tools such as Logstash and Amazon Kinesis Firehose can also send data to Elasticsearch in JSON document format. Elasticsearch then stores each document and adds a searchable reference to it in the index. Once a document is indexed, it can be searched and retrieved through the same APIs. Tools such as Kibana can then be used to visualize and analyze the data and to build interactive reports and dashboards.
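The index-then-search flow described above maps onto two REST calls: `PUT /<index>/_doc/<id>` to store a JSON document and `POST /<index>/_search` with a query body to find it. The sketch below only builds these requests with Python's standard `urllib.request`; it assumes a local node at `http://localhost:9200` with security disabled (recent Elasticsearch versions enable TLS and authentication by default), and the index name `articles` and field `title` are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node with security disabled


def index_request(index: str, doc_id: str, document: dict) -> urllib.request.Request:
    # PUT /<index>/_doc/<id> stores the JSON document under the given ID
    return urllib.request.Request(
        f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index: str, field: str, text: str) -> urllib.request.Request:
    # POST /<index>/_search with a match query runs a full-text search on one field
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


put_req = index_request("articles", "1", {"title": "What is Elasticsearch?", "tags": ["search", "lucene"]})
search_req = search_request("articles", "title", "elasticsearch")
print(put_req.get_method(), put_req.full_url)
print(search_req.get_method(), search_req.full_url, search_req.data.decode("utf-8"))

# To actually send a request (requires a running cluster):
# with urllib.request.urlopen(put_req) as resp:
#     print(json.load(resp))
```

In practice, an official client library or an ingestion tool such as Logstash would usually issue these calls, but the underlying requests are plain HTTP with JSON bodies.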