Installation

Welcome to the TDP Installation Guide. This page walks you through setting up a TDP cluster. The deployment relies on the official tdp-getting-started repository, which provides scripts and configurations to launch a TDP environment on a cluster of 7 VMs.

This cluster is intended for testing purposes only. Refer to the Documentation for production-ready clusters.

This guide provides instructions for Ubuntu systems. It has also been tested on NixOS and macOS. Only x86_64 systems have been tested so far. Other Linux distributions, as well as Windows, should also work.

Requirements

Hardware and Software

Before you start, ensure your system meets the following requirements:

Hardware:

  • CPU: 8 cores
  • Storage: Minimum 3 GB (NVMe SSDs are better for performance)
  • Memory: 32 GB

If required, update the machine resources assigned to the VMs in the ./inventory/group_vars/all.yml file of your cloned repository.
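
The CPU and memory figures above can be compared against the host with a short script. This is a minimal sketch, not part of tdp-getting-started: it assumes a Linux host (it relies on nproc and /proc/meminfo), the helper name meets_minimum is hypothetical, and storage is not checked.

```shell
#!/usr/bin/env bash
# Sketch only: compares this host with the CPU and memory figures listed above.
# Assumes Linux (nproc, /proc/meminfo). meets_minimum is a hypothetical helper.

REQUIRED_CORES=8
REQUIRED_MEM_GB=32

# Succeeds when the measured integer ($1) is at least the required one ($2).
meets_minimum() {
  [ "$1" -ge "$2" ] 2>/dev/null
}

cores=$(nproc 2>/dev/null)
# MemTotal is reported in kB. Round up to whole GiB, because the kernel
# reserves part of the physical memory and reports slightly less.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_gb=$(( (mem_kb + 1048575) / 1048576 ))

if meets_minimum "${cores:-0}" "$REQUIRED_CORES"; then
  echo "CPU: OK (${cores} cores)"
else
  echo "CPU: ${cores:-unknown} cores found, ${REQUIRED_CORES} required"
fi

if meets_minimum "$mem_gb" "$REQUIRED_MEM_GB"; then
  echo "Memory: OK (${mem_gb} GB)"
else
  echo "Memory: ${mem_gb} GB found, ${REQUIRED_MEM_GB} GB required"
fi
```

If either check falls short, lowering the resources assigned to the VMs in ./inventory/group_vars/all.yml is the adjustment described above.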

Software:

  • Git, a version control system. Install it by running this command:

    # Update the package list and install git
    sudo apt update && sudo apt install git
    
  • Python3 (version 3.6 or newer), a versatile programming language. Most systems come with Python pre-installed. To check whether it is installed:

    # Display the version of Python
    python3 -V
    

    If Python3 is missing, install it using the following command:

    # Update the package list and install python3
    sudo apt update && sudo apt install python3
    
  • venv (Python virtual environment), for managing Python dependencies like Ansible and JMESPath. Install venv:

    # Update the package list and install venv
    sudo apt update && sudo apt install python3-venv
    
  • VirtualBox (version 6.1.26 or newer), virtualization software used to run the VMs. Install it:

    # Update the package list and install VirtualBox
    sudo apt update && sudo apt install virtualbox
    
  • Vagrant (version 2.2.19 or newer), a tool for managing development virtual machines. Install it:

    # Install the HashiCorp trusted key (discard tee's binary output to the terminal)
    wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null
    # Install the HashiCorp source
    echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
    # Update the package list and install Vagrant
    sudo apt update && sudo apt install vagrant
    
  • Ansible (version 2.9 or newer) to provision VMs on VirtualBox. Install it with these commands:

    # Include the official project’s PPA (personal package archive)
    sudo apt-add-repository ppa:ansible/ansible
    # Update the package list and install Ansible
    sudo apt update && sudo apt install ansible
    
  • Zip and Unzip, popular tools for managing compressed files. Install them:

    # Update the package list and install zip and unzip
    sudo apt update && sudo apt install zip unzip
    
  • jq (JSON Processor), required for some scripts during deployment. Install jq:

    # Update the package list and install jq
    sudo apt update && sudo apt install jq
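
Once the packages above are installed, a quick way to confirm that nothing was missed is to look each command up on the PATH. The following is a minimal sketch under stated assumptions: missing_tools is a hypothetical helper, VBoxManage is used as the command that indicates a VirtualBox installation, and neither tool versions nor the venv module are checked.

```shell
#!/usr/bin/env bash
# Sketch only: reports which of the required command-line tools are not on PATH.
# missing_tools is a hypothetical helper, not part of tdp-getting-started.

# Prints, one per line, every argument that is not an available command.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools git python3 VBoxManage vagrant ansible zip unzip jq)

if [ -z "$missing" ]; then
  echo "All required tools were found on PATH."
else
  echo "Missing tools:"
  echo "$missing"
fi
```

Version requirements still need to be checked by hand, for example with python3 -V as shown above.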
    

Deployment Steps

Follow these steps to deploy your TDP cluster:

Step 1: Clone the tdp-getting-started Project

Run the following commands to clone the project:

git clone https://github.com/TOSIT-IO/tdp-getting-started.git
cd tdp-getting-started

Step 2: Set Up the Environment

Run this command to set up essential components, including collections, jar releases, and Vagrant:

./scripts/setup.sh -e extras -e prerequisites -e vagrant

Step 3: Activate the Virtual Environment

Activate the Python virtual environment by running:

source ./venv/bin/activate && source .env

Step 4: Launch Virtual Machines

Use this command to start your virtual machines:

vagrant up
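
Before moving on, it can help to confirm that every VM actually reached the running state. The sketch below assumes the comma-separated output of vagrant status --machine-readable (timestamp, target, type, data); not_running is a hypothetical helper, and the script is meant to be run from the cloned repository.

```shell
#!/usr/bin/env bash
# Sketch only: lists Vagrant machines that are not in the "running" state.
# not_running is a hypothetical helper. It reads machine-readable status
# lines (timestamp,target,type,data) on stdin and prints the target name
# of every "state" line whose data is not "running".
not_running() {
  awk -F, '$3 == "state" && $4 != "running" {print $2}'
}

if command -v vagrant >/dev/null 2>&1; then
  pending=$(vagrant status --machine-readable | not_running)
  if [ -z "$pending" ]; then
    echo "No machine reported a state other than running."
  else
    echo "Machines not running:"
    echo "$pending"
  fi
else
  echo "vagrant was not found on PATH."
fi
```

Running vagrant up again is usually enough to start any machine listed as not running.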

Step 5: Configure TDP Prerequisites

Run this playbook to configure services like Chrony, CA, LDAP, KDC, and PostgreSQL:

ansible-playbook ansible_collections/tosit/tdp_prerequisites/playbooks/all.yml

Step 6: Deployment

You can deploy your TDP cluster using either the TDP lib CLI or Ansible Playbooks.

Refer to the official tutorial for other deployment methods. Their software prerequisites may differ, and so may Step 2, which typically involves adapting the version tag and switching from -r stable (the default) to -r latest.

To deploy with the TDP lib CLI, run:

tdp deploy

To deploy with Ansible playbooks, run:

ansible-playbook ansible_collections/tosit/tdp/playbooks/meta/all.yml

For extra services such as Livy, ZooKeeper, or Kafka, run the corresponding playbooks.

Step 7: Configuration

After deployment, configure HDFS user home directories:

ansible-playbook ansible_collections/tosit/tdp/playbooks/utils/hdfs_user_homes.yml

Configure Ranger policies:

ansible-playbook ansible_collections/tosit/tdp/playbooks/utils/ranger_policies.yml

Deploy Knox Gateway:

ansible-playbook ansible_collections/tosit/tdp/playbooks/meta/knox.yml
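
The three playbooks above can also be chained so that a failure stops the sequence instead of letting later playbooks run against a half-configured cluster. This is a minimal sketch: run_in_order is a hypothetical helper, and the script assumes the virtual environment from Step 3 is active and that it is run from the repository root.

```shell
#!/usr/bin/env bash
# Sketch only: runs the post-deployment playbooks in the order listed above
# and stops at the first failure. run_in_order is a hypothetical helper.

# $1 is the command to run; each remaining argument is passed to it in turn.
run_in_order() {
  local runner=$1
  shift
  local item
  for item in "$@"; do
    echo "==> $runner $item"
    "$runner" "$item" || { echo "Failed on: $item" >&2; return 1; }
  done
}

if command -v ansible-playbook >/dev/null 2>&1; then
  run_in_order ansible-playbook \
    ansible_collections/tosit/tdp/playbooks/utils/hdfs_user_homes.yml \
    ansible_collections/tosit/tdp/playbooks/utils/ranger_policies.yml \
    ansible_collections/tosit/tdp/playbooks/meta/knox.yml
else
  echo "ansible-playbook was not found; activate the virtual environment first."
fi
```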

Next Steps

Connecting to the Cluster

Access nodes via SSH, for example:

vagrant ssh edge-01

Accessing the User Interface (UI)

To access the UI, configure your host for SPNEGO with your web browser. We recommend configuring your /etc/hosts file for easier access to the UIs. For detailed instructions, visit our host configuration page.

Learning to Use TDP

Now that you have a cluster up and running, explore our documentation to learn how to manage and administer your cluster effectively. You can also follow tutorials on specific components like HDFS, YARN, Spark, Hive and HBase.