Posts

⚙️ Is Apache Spark Being Overtaken by ClickHouse, Snowflake, and StarRocks?

Over the last few years, Apache Spark has been the de facto choice for large-scale ETL, batch processing, and ML workloads across data platforms. However, the emergence of modern analytical engines (ClickHouse, Snowflake, and StarRocks) is redefining how teams think about data pipelines and performance optimization. Let’s break down what’s really happening 👇

🧩 Spark: Still the Heavy-Lift Engine

Spark remains unmatched for:
- Complex multi-source ETL and data lake transformations
- Large-scale joins and machine learning workloads
- Distributed compute flexibility (Scala, PySpark, SQL, MLlib, Delta, etc.)

But Spark’s batch-oriented execution model introduces startup overhead and cost inefficiencies when it is used for near-real-time analytics or small-scale transformations.

⚡ ClickHouse & StarRocks: Redefining Real-Time OLAP

Both ClickHouse and StarRocks are built for low-latency analytical workloads. ClickHouse leverages a columnar MergeTree engin...

Add ports to the HDP 2.5 VirtualBox Sandbox

Short Description: This tutorial will guide you through the process of adding additional ports to the VirtualBox version of the HDP 2.5 sandbox.

Article Objective: The Hortonworks Sandbox for HDP 2.5 now uses Docker containers, even in the VirtualBox version. On older versions of the sandbox, exposing extra ports was as simple as setting up additional port forwarding rules in VirtualBox. The new container-based sandbox requires additional steps: you have to do more than just set up port forwarding rules in VirtualBox.

Prerequisites: You should have already downloaded and installed the VirtualBox version of the Hortonworks Sandbox for HDP 2.5.

Scope:
- Mac OS X 10.11.6 (El Capitan)
- VirtualBox 5.1.6
- HDP 2.5 VirtualBox Sandbox

Steps: Start up the Sandb...
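The VirtualBox half of the job (forwarding a host port into the VM) can be scripted with `VBoxManage`. The following is a minimal sketch, not the tutorial's own procedure. The function name `add_sandbox_port` and the `DRY_RUN` switch are hypothetical, and it assumes the sandbox VM uses a NAT network adapter in slot 1 (`natpf1`) and that the same port number is used on host and guest. It does not cover the Docker side: the container inside the VM must also publish the port, which is the extra work the tutorial refers to.

```shell
#!/bin/sh
# Hypothetical helper (not from the tutorial): add a VirtualBox NAT
# port-forwarding rule for the sandbox VM. This covers only the host -> VM hop;
# the Docker container inside the VM must also publish the same port.
add_sandbox_port() {
  vm="$1"     # VM name as shown by `VBoxManage list vms` (assumed, check yours)
  port="$2"   # port to forward; the same number is used on host and guest
  # Rule format: name,protocol,hostip,hostport,guestip,guestport
  # (empty host/guest IPs mean "any")
  rule="sandbox-${port},tcp,,${port},,${port}"
  if [ -n "$DRY_RUN" ]; then
    # Only print the command that would be run
    echo "VBoxManage controlvm \"$vm\" natpf1 \"$rule\""
  else
    # controlvm adds the rule to a running VM's NAT adapter 1 (assumed NAT)
    VBoxManage controlvm "$vm" natpf1 "$rule"
  fi
}
```

For example, `DRY_RUN=1 add_sandbox_port "My Sandbox VM" 9200` prints the command without touching VirtualBox. If the VM is powered off, `VBoxManage modifyvm "$vm" --natpf1 "$rule"` is the equivalent form.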

BASIC ELASTICSEARCH-KIBANA SETUP

Prerequisite: HDP 2.4 or 2.5

Elasticsearch Setup:
*****************

Download Elasticsearch:
----------------------------
We use curl to download Elasticsearch (version 2.4.0) onto the sandbox.

$ cd ~
$ curl -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.4.0/elasticsearch-2.4.0.tar.gz

Install Elasticsearch:
--------------------------
Next, we extract Elasticsearch into the /opt directory, which is where we'll run it.

$ sudo mv elasticsearch-2.4.0.tar.gz /opt/
$ cd /opt
$ sudo tar xvfz elasticsearch-2.4.0.tar.gz

Configure Elasticsearch:
------------------------------
We need to make a couple of changes to the Elasticsearch configuration file /opt/elasticsearch-2.4.0/config/elasticsearch.yml.

$ cd /opt/elasticsearch-2.4.0/config
$ vi elasticsearch.yml

We need to set the cluster.name setting to "elasticsearch".

cluster.name: elastics...
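If you prefer not to edit the file in vi, the cluster.name change can be scripted. This is a minimal sketch, not part of the original guide. The function name `set_cluster_name` is hypothetical, and it assumes the stock elasticsearch.yml contains either a commented-out `# cluster.name: ...` line or an active `cluster.name: ...` line; if neither exists, it appends the setting. It handles only this one setting, not the other configuration changes the guide goes on to make.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original guide): set cluster.name in
# elasticsearch.yml without opening an editor. Assumes the file contains a
# commented-out or active "cluster.name:" line; otherwise one is appended.
set_cluster_name() {
  conf="$1"   # path to elasticsearch.yml
  name="$2"   # desired cluster name; must not contain "/" or "&" (sed specials)
  if grep -Eq '^[#[:space:]]*cluster\.name:' "$conf"; then
    # Rewrite every matching line (commented or not) as an active setting
    sed -E "s/^[#[:space:]]*cluster\.name:.*/cluster.name: ${name}/" "$conf" > "$conf.tmp" &&
      cat "$conf.tmp" > "$conf" &&   # overwrite in place, keeping ownership/permissions
      rm -f "$conf.tmp"
  else
    # No cluster.name line present: append one at the end of the file
    printf 'cluster.name: %s\n' "$name" >> "$conf"
  fi
}
```

On the sandbox this would be called as `set_cluster_name /opt/elasticsearch-2.4.0/config/elasticsearch.yml elasticsearch`, from a shell whose user can write to that file. Writing to a temporary file and copying it back avoids the differences in `sed -i` behavior between GNU and BSD sed.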