Installation
Welcome to the TDP Installation Guide. This page will walk you through setting up a TDP cluster in a straightforward manner.
The deployment relies on the official tdp-getting-started repository, which provides scripts and configurations to launch a TDP environment on top of a cluster of 7 VMs.
This cluster is intended for testing purposes only. Refer to the Documentation for production-ready clusters.
This guide provides instructions for Ubuntu systems. It has also been tested on NixOS and macOS. Only x86_64 systems have been tested so far. Other Linux distributions, as well as Windows, should work.
Requirements
Hardware and Software
Before you start, ensure your system meets the following requirements:
Hardware:
- CPU: 8 cores
- Storage: Minimum 3 GB (NVMe SSDs are better for performance)
- Memory: 32 GB
If required, update the machine resources assigned to the VMs in the ./inventory/group_vars/all.yml file of your cloned repository.
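The hardware figures above can be checked from a terminal before going further. The following is a minimal sketch, assuming a Linux host with GNU coreutils (nproc, df) and /proc/meminfo; the function names are illustrative and are not part of tdp-getting-started.

```shell
#!/usr/bin/env bash
# Sketch: compare host resources against the minimums listed above.
# Function names are illustrative, not part of tdp-getting-started.

# Succeeds when the first integer is greater than or equal to the second.
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Prints a one-line verdict: report NAME ACTUAL REQUIRED
report() {
  if meets_minimum "$2" "$3"; then
    echo "OK: $1 = $2 (minimum $3)"
  else
    echo "LOW: $1 = $2 (minimum $3)"
  fi
}

# Gathers the host values (Linux only) and reports on each requirement.
check_host() {
  cores=$(nproc)
  # MemTotal is in kB. Round up to whole GB, because the kernel reports
  # slightly less than the installed amount.
  mem_gb=$(awk '/^MemTotal:/ {print int(($2 + 1048575) / 1048576)}' /proc/meminfo)
  # Free space on the filesystem holding the current directory.
  # Assumption: VirtualBox may store VM disks on a different filesystem.
  disk_gb=$(df -BG --output=avail . | tail -n 1 | tr -dc '0-9')
  report "CPU cores" "$cores" 8
  report "Memory (GB)" "$mem_gb" 32
  report "Free disk (GB)" "$disk_gb" 3
}
```

Calling check_host prints one line per requirement. A LOW line does not necessarily block the deployment, since the VM resources can be lowered in the inventory file mentioned above.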
Software:
- Git, a version control system. Install it by running this command:
# Update the package list and install git
sudo apt update && sudo apt install git
- Python3 (version 3.6 or newer), a versatile programming language. Most systems come with Python pre-installed. To check which version is installed:
# Display the version of Python
python3 -V
If Python3 is missing, install it using the following command:
# Update the package list and install python3
sudo apt update && sudo apt install python3
- venv (Python virtual environment), for managing Python dependencies like Ansible and JMESPath. Install venv:
# Update the package list and install venv
sudo apt update && sudo apt install python3-venv
- VirtualBox (version 6.1.26 or newer), a powerful virtualization software. Install it:
# Update the package list and install VirtualBox
sudo apt update && sudo apt install virtualbox
- Vagrant (version 2.2.19 or newer), a tool for managing virtual machines in development environments. Install it:
# Install the HashiCorp trusted key
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg
# Install the HashiCorp source
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
# Update the package list and install Vagrant
sudo apt update && sudo apt install vagrant
- Ansible (version 2.9 or newer) to provision VMs on VirtualBox. Install it with these commands:
# Include the official project's PPA (personal package archive)
sudo apt-add-repository ppa:ansible/ansible
# Update the package list and install Ansible
sudo apt update && sudo apt install ansible
- Zip and Unzip, popular tools for managing compressed files. Install them:
# Update the package list and install zip and unzip
sudo apt update && sudo apt install zip unzip
- jq (JSON Processor), required for some scripts during deployment. Install jq:
# Update the package list and install jq
sudo apt update && sudo apt install jq
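The software requirements listed above can be verified in one pass. The sketch below is illustrative only: the helper names are assumptions, version comparison relies on sort -V from GNU coreutils, and the version strings are taken from whatever each tool prints.

```shell
#!/usr/bin/env bash
# Sketch: check that required commands exist and meet a minimum version.
# Helper names are illustrative, not part of tdp-getting-started.

# Prints each command name that is not found on PATH, one per line.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Prints the first dotted version number found in the given string.
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Succeeds when version $1 is greater than or equal to version $2.
# Uses version sort: the smaller of the two must be the required one.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, missing_commands git python3 VBoxManage vagrant ansible-playbook zip unzip jq prints nothing when every tool is installed, and version_ge "$(extract_version "$(vagrant --version)")" 2.2.19 succeeds when Vagrant is recent enough. The same pattern applies to python3 -V (minimum 3.6) and VBoxManage --version (minimum 6.1.26).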
Deployment Steps
Follow these steps to deploy your TDP cluster:
Step 1: Clone the tdp-getting-started Project
Run the following commands to clone the project:
git clone https://github.com/TOSIT-IO/tdp-getting-started.git
cd tdp-getting-started
Step 2: Set Up the Environment
Run this command to set up essential components, including collections, jar releases, and Vagrant:
./scripts/setup.sh -e extras -e prerequisites -e vagrant
Step 3: Activate the Virtual Environment
Activate the Python virtual environment by running:
source ./venv/bin/activate && source .env
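Before running Ansible, it can help to confirm that the shell is really using the virtual environment. This is a minimal sketch, assuming the standard behavior of a Python venv activate script, which exports VIRTUAL_ENV; the function names are illustrative, and the contents of .env are not checked.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a Python virtual environment is active.
# Function names are illustrative, not part of tdp-getting-started.

# Succeeds when VIRTUAL_ENV is set to a non-empty value.
venv_active() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

# Succeeds when the given executable path lives inside the active venv.
inside_venv() {
  venv_active || return 1
  case "$1" in
    "$VIRTUAL_ENV"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, inside_venv "$(command -v ansible-playbook)" succeeds only when the ansible-playbook found on PATH comes from the activated environment rather than from a system-wide installation.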
Step 4: Launch Virtual Machines
Use this command to start your virtual machines:
vagrant up
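Since the cluster consists of 7 VMs, it is worth checking that all of them are running before moving on. The sketch below parses the output of vagrant status --machine-readable, whose lines are comma-separated as timestamp,target,type,data; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: summarize `vagrant status --machine-readable` output read on stdin.
# Function names are illustrative, not part of tdp-getting-started.

# Prints the number of machines whose state is "running".
count_running() {
  awk -F, '$3 == "state" && $4 == "running" {n++} END {print n + 0}'
}

# Prints the name of every machine that is in any other state.
not_running() {
  awk -F, '$3 == "state" && $4 != "running" {print $2}'
}
```

Running vagrant status --machine-readable | count_running from the repository root should print 7 once the cluster is up. If it does not, piping the same output into not_running lists the machines to investigate.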
Step 5: Configure TDP Prerequisites
Run this playbook to configure services like Chrony, CA, LDAP, KDC, and PostgreSQL:
ansible-playbook ansible_collections/tosit/tdp_prerequisites/playbooks/all.yml
Step 6: Deployment
You can deploy your TDP cluster using either the TDP lib CLI or Ansible Playbooks.
Refer to the official tutorial for other deployment methods. Software prerequisites may differ, as may Step 2, which typically involves adapting the version tag and switching from -r stable (the default value) to -r latest.
To deploy with the TDP lib CLI, run:
tdp deploy
To deploy with Ansible playbooks, use:
ansible-playbook ansible_collections/tosit/tdp/playbooks/meta/all.yml
For extra services, run specific playbooks for Livy, ZooKeeper, Kafka, etc.
Step 7: Configuration
After deployment, configure HDFS user home directories:
ansible-playbook ansible_collections/tosit/tdp/playbooks/utils/hdfs_user_homes.yml
Configure Ranger policies:
ansible-playbook ansible_collections/tosit/tdp/playbooks/utils/ranger_policies.yml
Deploy Knox Gateway:
ansible-playbook ansible_collections/tosit/tdp/playbooks/meta/knox.yml
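The three playbooks above can also be run in one go, stopping at the first failure. This is a minimal sketch rather than a tool shipped with TDP: run_playbooks and the PLAYBOOK_CMD variable are assumptions introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run several playbooks in order, stopping at the first failure.
# run_playbooks and PLAYBOOK_CMD are illustrative, not part of TDP.

# Command used to run a playbook; set it to `echo` for a dry run.
PLAYBOOK_CMD="${PLAYBOOK_CMD:-ansible-playbook}"

run_playbooks() {
  for playbook in "$@"; do
    echo "==> $playbook"
    "$PLAYBOOK_CMD" "$playbook" || {
      echo "FAILED: $playbook" >&2
      return 1
    }
  done
}

# Example, from the root of the cloned repository:
# run_playbooks \
#   ansible_collections/tosit/tdp/playbooks/utils/hdfs_user_homes.yml \
#   ansible_collections/tosit/tdp/playbooks/utils/ranger_policies.yml \
#   ansible_collections/tosit/tdp/playbooks/meta/knox.yml
```

Stopping early keeps a failed playbook's error at the end of the terminal output instead of burying it under the output of later playbooks.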
Next Steps
Connecting to the Cluster
Access nodes via ssh, for example:
vagrant ssh edge-01
Accessing the User Interface (UI)
To access the UI, configure your host for SPNEGO with your web browser. We recommend configuring your /etc/hosts file for easier access to the UIs. For detailed instructions, visit our host configuration page.
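Once /etc/hosts has been edited, a quick check can confirm that a given name is actually mapped. The sketch below is an illustration only: hosts_has is a hypothetical helper, and the hostnames to test are the ones given on the host configuration page, which are not reproduced here.

```shell
#!/usr/bin/env bash
# Sketch: test whether a hosts file maps a given hostname.
# hosts_has is a hypothetical helper, not part of tdp-getting-started.

# Usage: hosts_has HOSTS_FILE HOSTNAME
# Succeeds when a non-comment entry lists HOSTNAME after the IP address.
hosts_has() {
  awk -v name="$2" '
    { sub(/#.*/, "") }                                  # drop comments
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }
  ' "$1"
}
```

For example, hosts_has /etc/hosts some-node.example succeeds only when that name appears in an active entry; some-node.example is a placeholder and not a real TDP hostname.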
Learning to Use TDP
Now that you have a cluster up and running, explore our documentation to learn how to manage and administer your cluster effectively. You can also follow tutorials on specific components like HDFS, YARN, Spark, Hive and HBase.