Installing Neo4j on Kubernetes with Helm

Abhishek Purohit

Introduction

Graph databases like Neo4j excel at managing complex relationships in data. This guide combines Neo4j’s power with Kubernetes’ scalability and Helm’s ease of deployment to create a robust data management solution.

Understanding the Tools

  1. Neo4j: A high-performance graph database that stores data as nodes and relationships, perfect for applications like social networks, recommendation systems, and knowledge graphs.
  2. Kubernetes: An open-source system for managing containerized applications at scale, offering efficient resource allocation and high availability.
  3. Helm: The package manager for Kubernetes, streamlining the deployment of applications with pre-configured “charts” containing all necessary components.

Preparing for Deployment

  • Prerequisites:
  • A functional Kubernetes cluster (e.g., Minikube for local development, cloud-based clusters for production)
  • Helm installed and configured to interact with your cluster
  • Access to the kubectl command-line tool for Kubernetes management
  • Adding Neo4j Helm Repository:
helm repo add neo4j https://neo4j-contrib.github.io/neo4j-helm
helm repo update

Installing Neo4j with Helm

  1. Helm Installation:
# auth.enabled=false is for initial setup only; enable authentication afterwards.
# Adjust storageClassName to match your cluster's storage class.
# The two NodePorts expose HTTP (30000) and Bolt (30001) outside the cluster.
helm install neo4j neo4j/neo4j \
  --set auth.enabled=false \
  --set volumes.data.persistentVolumeClaim.storageClassName=standard \
  --set service.type=NodePort \
  --set service.nodePortHttp=30000 \
  --set service.nodePortBolt=30001
  • This command installs the Neo4j Helm chart with basic settings.
  • NodePorts expose Neo4j’s HTTP and Bolt interfaces on specified ports.
  • Adjust storageClassName to match your cluster's storage configuration.
  • Configuration with values.yaml:
  • Fine-tune Neo4j by modifying the values.yaml file before installation.
  • Options include resource allocation, authentication settings, backup configuration, and more.
  • Example (basic):
auth:
  enabled: true # Enable authentication
volumes:
  data:
    persistentVolumeClaim:
      storageClassName: standard
      size: 10Gi # Adjust storage size as needed
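
The same settings can be supplied either in values.yaml or as --set flags, because each --set key is the dotted path into the YAML tree. The Python sketch below illustrates that mapping for the example values above. The helper names (flatten_values, to_set_flags) are invented for this illustration, and the sketch deliberately ignores lists and the escaping Helm needs for commas or dots inside values.

```python
def flatten_values(values, prefix=""):
    """Yield (dotted_path, value) pairs for a nested values mapping."""
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested sections such as volumes.data
            yield from flatten_values(value, path)
        else:
            yield path, value


def to_set_flags(values):
    """Turn a nested values mapping into a flat list of --set arguments."""
    flags = []
    for path, value in flatten_values(values):
        # YAML booleans are written in lowercase on the command line
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        flags += ["--set", f"{path}={rendered}"]
    return flags


# The example values.yaml from above, written as a Python dict
values = {
    "auth": {"enabled": True},
    "volumes": {
        "data": {
            "persistentVolumeClaim": {
                "storageClassName": "standard",
                "size": "10Gi",
            }
        }
    },
}

flags = to_set_flags(values)
for i in range(0, len(flags), 2):
    print(flags[i], flags[i + 1])
# --set auth.enabled=true
# --set volumes.data.persistentVolumeClaim.storageClassName=standard
# --set volumes.data.persistentVolumeClaim.size=10Gi
```

For anything beyond a handful of settings, a values.yaml file kept in version control is usually easier to review than a long chain of --set flags.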

High Availability & Additional Considerations

  • High Availability (HA):
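  • Run at least three core servers for a highly available Neo4j cluster, for example with --set replicaCount=3 if your chart exposes that key (the values.yaml later in this guide uses core.numberOfServers: 3 instead). Clustering requires Neo4j Enterprise Edition.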
  • Consider cloud-native load balancers for distributing traffic.
  • Cloud Architecture Principles:
  • Design for scalability: Kubernetes schedules and restarts Neo4j pods, but you decide how many servers to run and how much CPU and memory each one gets.
  • Implement fault tolerance: Use HA clusters and persistent storage for data durability.
  • Security: Enable Neo4j authentication and follow security best practices for your cluster.

Verifying the Installation

  • Get Neo4j Pod Status:
kubectl get pods -l "app=neo4j"

Access Neo4j:

  • HTTP: Use the NodePort exposed for the HTTP interface on any node's address (e.g., http://<node-ip>:30000). On Minikube, minikube ip prints that address.
  • Bolt: Use the NodePort for Bolt connections (e.g., bolt://<node-ip>:30001).

Fine-Tuning Your Neo4j Deployment: A Deep Dive into Helm Chart Configuration

The values.yaml file acts as the control center for tailoring your Neo4j installation on Kubernetes. It allows you to fine-tune everything from licensing and security to resource allocation and external access. Let's explore its key sections and understand their impact on your graph database environment.

acceptLicenseAgreement: "yes"
neo4jPassword: "mySecretPassword"

core:
  numberOfServers: 3
  persistentVolume:
    size: "20Gi"
  resources:
    requests:
      cpu: "1"
      memory: "2Gi"
    limits:
      cpu: "2"
      memory: "4Gi"

readReplica:
  numberOfServers: 2
  persistentVolume:
    size: "20Gi"

service:
  type: NodePort
  ports:
    http:
      port: 7474
      nodePort: 30074
    bolt:
      port: 7687
      nodePort: 30087

1. License and Password: Foundation of Your Neo4j Instance

  • acceptLicenseAgreement: "yes": This is a mandatory step, affirming your agreement with Neo4j's licensing terms.
  • neo4jPassword: "mySecretPassword": Sets the initial password for the Neo4j administrative user. Crucial: Change this immediately upon deployment to a strong, unique password!

2. Core Configuration: The Heart of Your Graph Database

  • numberOfServers: 3: Defines the number of core Neo4j instances. Setting this to three (or more) enables clustering, providing high availability and fault tolerance.
  • persistentVolume:
  • size: "20Gi": Allocates storage for the database. Adjust this based on your anticipated data volume and growth.
  • resources:
  • requests: The minimum resources each Neo4j pod requests from Kubernetes.
  • cpu: "1": At least one CPU core.
  • memory: "2Gi": 2 GiB (gibibytes) of memory.
  • limits: The maximum resources a pod can consume.
  • cpu: "2": Up to two CPU cores.
  • memory: "4Gi": Up to 4 GiB of memory; a container that exceeds its memory limit is OOM-killed.
  • Why this matters: Proper resource allocation ensures optimal performance and prevents resource starvation, especially during peak loads.
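
To get a feel for what the core section asks of the cluster, the rough Python sketch below totals the requests across all core servers and checks that requests do not exceed limits. It is a simplification under stated assumptions: the parsers handle only whole-number quantities with binary memory suffixes (Ki, Mi, Gi, Ti) and CPU values in cores or millicores, and the function names are made up for this example.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity):
    """Convert a quantity such as '2Gi' to bytes (integers only)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def parse_cpu(quantity):
    """Convert a CPU quantity such as '1' or '500m' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores
    return float(quantity)


def requests_within_limits(resources):
    """Return True when every request is no larger than its limit."""
    req, lim = resources["requests"], resources["limits"]
    return (parse_cpu(req["cpu"]) <= parse_cpu(lim["cpu"])
            and parse_memory(req["memory"]) <= parse_memory(lim["memory"]))


def total_requests(section):
    """Return (cpu_cores, memory_bytes) requested by all servers together."""
    req = section["resources"]["requests"]
    count = section["numberOfServers"]
    return count * parse_cpu(req["cpu"]), count * parse_memory(req["memory"])


# The core section from the values.yaml above
core = {
    "numberOfServers": 3,
    "resources": {
        "requests": {"cpu": "1", "memory": "2Gi"},
        "limits": {"cpu": "2", "memory": "4Gi"},
    },
}

cpu, memory = total_requests(core)
print(cpu, memory / 2**30)  # 3.0 6.0
print(requests_within_limits(core["resources"]))  # True
```

With these values, the three core servers together need at least 3 CPU cores and 6 GiB of schedulable memory, before counting read replicas.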

3. Read Replica: Scaling Read Performance

  • numberOfServers: 2: Specifies the number of read replicas. Replicas handle read queries, offloading the core instances and improving overall query performance.
  • persistentVolume:
  • size: "20Gi": Same as the core configuration, ensuring sufficient storage for replica data.

4. Service Configuration: Opening the Doors

  • type: NodePort: Exposes Neo4j's services (HTTP and Bolt) on a fixed port of every cluster node, so that applications outside the Kubernetes cluster can connect. A LoadBalancer service is the usual alternative on cloud providers.
  • ports:
  • http:
  • port: 7474: The internal port on which Neo4j's web interface listens.
  • nodePort: 30074: The external port mapped to the internal HTTP port.
  • bolt:
  • port: 7687: Neo4j's binary Bolt protocol port for efficient client-server communication.
  • nodePort: 30087: The external port mapped to the internal Bolt port.
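
One detail worth checking before you install: by default Kubernetes only accepts nodePort values between 30000 and 32767 (cluster administrators can change this with the API server's --service-node-port-range flag). The small Python sketch below validates the ports section against that default range and flags duplicates. The function name check_node_ports is hypothetical, and the check assumes the default range is in effect.

```python
# Kubernetes default NodePort range: 30000-32767 inclusive
DEFAULT_NODE_PORT_RANGE = range(30000, 32768)


def check_node_ports(ports):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    seen = {}  # nodePort -> name of the first entry that used it
    for name, spec in ports.items():
        node_port = spec["nodePort"]
        if node_port not in DEFAULT_NODE_PORT_RANGE:
            problems.append(f"{name}: nodePort {node_port} is outside 30000-32767")
        if node_port in seen:
            problems.append(
                f"{name}: nodePort {node_port} is already used by {seen[node_port]}"
            )
        seen.setdefault(node_port, name)
    return problems


# The ports section from the values.yaml above
ports = {
    "http": {"port": 7474, "nodePort": 30074},
    "bolt": {"port": 7687, "nodePort": 30087},
}

print(check_node_ports(ports))  # []
```

Reusing the container ports (7474, 7687) as nodePort values is a common slip, and this check would catch it.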

Launching Your Graph Database: Deploying Neo4j with Helm on Kubernetes

1. The Deployment Command: Unveiling the Magic

helm install neo4j-cluster neo4j/neo4j --values values.yaml --namespace neo4j

This seemingly simple command orchestrates a powerful sequence of actions:

  1. helm install: Instructs Helm, the Kubernetes package manager, to install a new application.
  2. neo4j-cluster: The release name you choose for this particular Neo4j deployment. Helm uses it to track the release (for helm upgrade, helm rollback, and helm uninstall) and typically to name the resources it creates. It is not a namespace; the namespace is set separately with --namespace.
  3. neo4j/neo4j: Tells Helm to fetch the chart named neo4j from the repository registered under the alias neo4j (the one added earlier with helm repo add). This chart is a pre-packaged template containing all the Kubernetes manifests needed to deploy Neo4j.
  4. --values values.yaml: Directs Helm to use your meticulously crafted values.yaml file for configuration. This is where you've customized settings for your specific Neo4j instance (as discussed in the previous section).
  5. --namespace neo4j: Deploys into the neo4j namespace, keeping your Neo4j resources organized and separate. Helm creates the namespace only if you also pass --create-namespace; otherwise create it first with kubectl create namespace neo4j.

2. Behind the Scenes: Kubernetes in Action

When you execute this command, Helm and Kubernetes collaborate to:

  1. Create Pods: Pods are the smallest deployable units in Kubernetes, each containing a single Neo4j instance. Helm uses your values.yaml to determine the number of pods (core instances and replicas) needed for your desired cluster configuration.
  2. Set Up Services: Services provide a stable network endpoint to access your Neo4j pods. In this case, NodePort services expose your Neo4j instances on specified ports, making them accessible from outside the cluster.
  3. Manage Persistent Volumes: Helm creates Persistent Volume Claims (PVCs) based on your storage configuration. PVCs dynamically provision persistent storage from your Kubernetes environment, ensuring your Neo4j data remains safe even if pods are restarted or replaced.
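  4. Configure Networking: Helm applies whatever networking and security settings the chart and your values.yaml define. The values.yaml shown above sets only the service type and ports, so traffic restrictions such as Network Policies have to be added separately.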

3. Verification and Next Steps

  • Check Pod Status: Use kubectl get pods -n neo4j to monitor the status of your Neo4j pods. Ensure they are in a "Running" state and have completed initialization.
  • Access Neo4j: Connect to your Neo4j instance using the NodePorts you configured in values.yaml. The web interface is typically accessible via HTTP, and the Bolt protocol is used for programmatic access.
  • Fine-Tuning: Continuously monitor your Neo4j deployment, and adjust your values.yaml as needed to optimize performance, resource utilization, and security.

Mastering Neo4j on Kubernetes: High Availability, Scalability, Security, and Maintenance

High Availability (HA): Ensuring Resilience

High availability means your graph database remains accessible and operational despite hardware failures or unforeseen disruptions. Here’s how you achieve it with Neo4j and Kubernetes:

  • Core Cluster: Deploying multiple Neo4j core instances (e.g., three or more) forms a cluster. This redundancy ensures that if one instance fails, others can seamlessly take over, preventing service interruptions.
  • Data Replication: Neo4j replicates every write to the other core members, and a transaction is committed once a majority of cores has accepted it. This keeps your data durable and available when a single member is lost.
  • Kubernetes Orchestration: Kubernetes monitors the health of your Neo4j pods and automatically restarts or replaces failed instances, further enhancing your system’s resilience.
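
Why three cores and not two? Core members agree on writes by majority, so the cluster stays writable only while more than half of the cores are up. The short Python sketch below works through that arithmetic. It assumes plain majority voting, and the function names are invented for illustration.

```python
def tolerated_failures(core_servers):
    """How many cores can fail while a majority is still available."""
    if core_servers < 1:
        raise ValueError("a cluster needs at least one core server")
    return (core_servers - 1) // 2


def cores_needed(failures):
    """Smallest cluster size that survives the given number of failures."""
    if failures < 0:
        raise ValueError("failures cannot be negative")
    return 2 * failures + 1


for n in (1, 2, 3, 4, 5):
    print(n, "core(s) tolerate", tolerated_failures(n), "failure(s)")
# 1 core(s) tolerate 0 failure(s)
# 2 core(s) tolerate 0 failure(s)
# 3 core(s) tolerate 1 failure(s)
# 4 core(s) tolerate 1 failure(s)
# 5 core(s) tolerate 2 failure(s)
```

Note that a fourth core adds no extra fault tolerance over three, which is why odd cluster sizes are the usual choice.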

Scalability: Adapting to Growth

Your graph database needs to evolve as your data and workload demands change. Kubernetes provides a powerful framework for scaling Neo4j horizontally and vertically:

Horizontal Scaling: Add or remove Neo4j read replicas on-the-fly. This allows you to dynamically adjust query processing capacity to match your workload.

Vertical Scaling: Increase the resources (CPU, memory) allocated to individual Neo4j pods. This can boost the performance of individual instances if your workload becomes more computationally intensive.

Security Considerations: Safeguarding Your Data

Protecting your valuable graph data is paramount.

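Kubernetes Secrets: Store sensitive information like Neo4j passwords in Kubernetes Secrets rather than in a plain values.yaml file. Secret values are only base64-encoded by default, so enable encryption at rest for the cluster's etcd store and restrict access with RBAC so that credentials are readable only by authorized pods and users.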
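
As a rough illustration, the Python sketch below builds a Secret manifest for the Neo4j password and shows that the stored value is base64-encoded, which anyone with read access can reverse. The Secret name neo4j-auth and the key NEO4J_PASSWORD are placeholders chosen for this example; how a chart consumes an existing Secret depends on the chart, so check its documentation.

```python
import base64
import json


def build_secret_manifest(name, namespace, password):
    """Return a Kubernetes Secret manifest as a plain dict."""
    encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Values under "data" must be base64-encoded strings
        "data": {"NEO4J_PASSWORD": encoded},  # hypothetical key name
    }


manifest = build_secret_manifest("neo4j-auth", "neo4j", "mySecretPassword")

# kubectl accepts JSON manifests as well as YAML
print(json.dumps(manifest, indent=2))

# Decoding needs no key at all: base64 is an encoding, not encryption
decoded = base64.b64decode(manifest["data"]["NEO4J_PASSWORD"]).decode("utf-8")
print(decoded)  # mySecretPassword
```

Saving the printed JSON to a file and running kubectl apply -f on it would create the Secret. Avoid committing such a file to version control.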

Network Policies: Kubernetes Network Policies act as firewalls for your pods. Configure them to restrict traffic to your Neo4j instances, allowing access only from trusted sources (e.g., your application pods).
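
A minimal sketch of such a policy, generated from Python, might look like the following. It allows client pods to reach the HTTP (7474) and Bolt (7687) ports and lets the Neo4j pods talk to each other on any port, which a cluster needs for its internal traffic. The labels are assumptions: app=neo4j is taken from the kubectl selector used earlier in this guide, and role=neo4j-client is a made-up label for your application pods. Network Policies are only enforced when the cluster's network plugin supports them.

```python
import json


def build_network_policy(namespace, neo4j_labels, client_labels):
    """Return a NetworkPolicy manifest restricting ingress to Neo4j pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "neo4j-allow-clients", "namespace": namespace},
        "spec": {
            # The policy applies to the Neo4j pods
            "podSelector": {"matchLabels": neo4j_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Application pods may use HTTP and Bolt only
                    "from": [{"podSelector": {"matchLabels": client_labels}}],
                    "ports": [
                        {"protocol": "TCP", "port": 7474},
                        {"protocol": "TCP", "port": 7687},
                    ],
                },
                {
                    # Neo4j pods may reach each other on all ports
                    "from": [{"podSelector": {"matchLabels": neo4j_labels}}],
                },
            ],
        },
    }


policy = build_network_policy(
    namespace="neo4j",
    neo4j_labels={"app": "neo4j"},             # assumed label
    client_labels={"role": "neo4j-client"},    # hypothetical label
)
print(json.dumps(policy, indent=2))
```

Because the selectors carry no namespaceSelector, they match pods in the neo4j namespace only. Traffic arriving from outside the cluster through a NodePort would not match either rule, so treat this as a starting point for in-cluster clients.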

Neo4j Authentication: Enable Neo4j’s built-in authentication mechanisms to control user access and permissions within the database itself.

Monitoring and Maintenance: Keeping a Pulse on Your System

A well-maintained system is a healthy system.

Prometheus and Grafana: These popular open-source tools provide powerful metrics collection and visualization for your Neo4j deployment.

Regular Backups: Implement a comprehensive backup strategy to ensure you can recover your data in case of catastrophic failure. Kubernetes CronJobs can automate scheduled backups for convenience and reliability.
