Ports and protocols used by TDP

System

  • SSH
    • Port: 22
    • Protocol: SSH
    • External access:
      • All nodes for administrators and deployment scripts
      • Edge nodes for user access to a working pre-configured environment

HDFS

  • NameNode
    • HTTPS service The namenode secure HTTP server address and port. It provides access to the HDFS web UI.
      • Port: 9871
      • Protocol: HTTPS
      • Property: dfs.namenode.https-address
      • External access: yes
      • Source
    • RPC service Main RPC port used by clients to communicate with HDFS using a binary protocol. The port is embedded in the URI, e.g. hdfs://nn1.domain.com:8020/.
  • ZKFC
    • RPC access RPC port used by Zookeeper Failover Controller.
  • DataNode
    • Secure data transfer The DataNode server address and port for data transfer. The value depends on whether SASL is used to authenticate the data transfer protocol instead of running the DataNode as root (see the Hadoop documentation on securing the DataNode).
    • HTTPS service The datanode secure HTTP server address and port. It is used to access the status, logs, etc, and file data operations when using WebHDFS or HttpFS. The NameNode UI redirects the user to the DataNode server when browsing files.
    • RPC service The DataNode RPC server address and port used for metadata information.
  • JournalNode
    • RPC server The JournalNode RPC server address and port.
    • HTTPS server The address and port the JournalNode HTTPS server listens on. If the port is 0 then the server will start on a free port.
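
As noted for the NameNode RPC service, the port is embedded in the HDFS URI. A minimal sketch of how a script might extract the host and port from such a URI, using only the Python standard library, follows. The helper name namenode_endpoint is hypothetical and not part of TDP or Hadoop; the URI is the example given above.

```python
from typing import Tuple
from urllib.parse import urlsplit


def namenode_endpoint(uri: str) -> Tuple[str, int]:
    """Return (host, port) from an hdfs:// URI (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an HDFS URI: {uri!r}")
    if parts.hostname is None or parts.port is None:
        # The URI does not carry an explicit host:port pair.
        raise ValueError(f"no host:port embedded in {uri!r}")
    return parts.hostname, parts.port


host, port = namenode_endpoint("hdfs://nn1.domain.com:8020/")
print(host, port)  # nn1.domain.com 8020
```

A firewall rule for the NameNode RPC service would then target the extracted port on the extracted host.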

YARN

  • ResourceManager
    • Resource tracker This is used by the NodeManager to register, send heartbeats (nodeHeartbeat), and unregister with the ResourceManager.
      • Port: 8031
      • Protocol: IPC
      • Property: yarn.resourcemanager.resource-tracker.address
      • External access: no
      • Sources:
    • RPC server The address of the applications manager interface in the RM. It is used to submit jobs. In a non-HA YARN configuration, yarn.resourcemanager.address uses port 8050. In an HA configuration, yarn.resourcemanager.address is redundant; yarn.resourcemanager.address.{id} is resolved instead and uses port 8032.
    • HTTPS server The HTTPS address of the RM web UI application. It is used to monitor applications.
      • Port: 8090
      • Protocol: HTTPS
      • Property: yarn.resourcemanager.webapp.https.address
      • External access: yes
      • Sources:
    • Admin RPC server It is used by administrators and developers.
    • Scheduler It is used by administrators and developers.
      • Port: 8030
      • Protocol: RPC
      • Property: yarn.resourcemanager.scheduler.address
      • External access: no
      • Sources
  • NodeManager
    • Container Manager The address of the container manager in the NodeManager. Access is typically granted to admins, and Dev/Support teams.
      • Port: 0 (default for dynamic port allocation) or 45454 (static port by convention)
      • Protocol: RPC
      • Property: yarn.nodemanager.address
      • External access: yes
      • Sources
    • Localizer Address where the localizer IPC is. It is responsible for downloading and copying remote resources on the local filesystem.
    • HTTPS server Web UI server of the NodeManager for administrators and developers.
      • Port: 8044
      • Protocol: HTTPS
      • Property: yarn.nodemanager.webapp.https.address
      • External access: yes
      • Sources
    • MapReduce ApplicationMaster Ephemeral HTTPS ports are opened by each ApplicationMaster. The tdp-collection default port range is unrestricted but can be set in the tdp_var_defaults/hadoop/hadoop.yml inventory file under the yarn.app.mapreduce.am.job.client.port-range property. Ports within this range can be accessible from outside the cluster if permitted by the network firewall. Note that this only restricts the port range used for MapReduce jobs. For Spark, refer to the Spark driver port documentation, such as the spark.driver.port property.
      • Port: Random
      • Protocol: HTTP
      • Property: yarn.app.mapreduce.am.job.client.port-range
      • External access: no
  • App Timeline Server
    • RPC server The address on which the timeline server starts its RPC server. The timeline server handles the storage and retrieval of applications' current and historic information in a generic fashion.
    • HTTPS server The web UI of the timeline server.
      • Port: 8190
      • Protocol: HTTPS
      • Property: yarn.timeline-service.webapp.https.address
      • External access: yes
      • Sources
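
The ResourceManager RPC server entry above distinguishes two cases: a non-HA setup uses yarn.resourcemanager.address on port 8050, while an HA setup resolves yarn.resourcemanager.address.{id} on port 8032. The sketch below encodes that rule as described on this page. The function name and the "rm1" identifier are illustrative assumptions, not values taken from tdp-collection.

```python
from typing import Optional, Tuple

NON_HA_PORT = 8050  # yarn.resourcemanager.address, non-HA (from this page)
HA_PORT = 8032      # yarn.resourcemanager.address.{id}, HA (from this page)


def rm_address_property(rm_id: Optional[str] = None) -> Tuple[str, int]:
    """Return (property name, port) for the RM RPC server.

    rm_id is None for a non-HA cluster; otherwise it is the
    ResourceManager identifier (hypothetical example: "rm1").
    """
    if rm_id is None:
        return "yarn.resourcemanager.address", NON_HA_PORT
    return f"yarn.resourcemanager.address.{rm_id}", HA_PORT


print(rm_address_property())       # ('yarn.resourcemanager.address', 8050)
print(rm_address_property("rm1"))  # ('yarn.resourcemanager.address.rm1', 8032)
```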

MapReduce Job History Server

  • Job History RPC server Server where client applications submit MapReduce jobs, {FQDN}:{PORT}.
  • Job History WebUI The MapReduce JobHistory Server Web UI, {FQDN}:{PORT}. It is used by administrators and developers.
  • Shuffle Handler
    • Default port that the ShuffleHandler will run on. ShuffleHandler is a service run at the NodeManager to facilitate transfers of intermediate Map outputs to requesting Reducers.
    • Port: 13562
    • Protocol: RPC
    • Property: mapreduce.shuffle.port
    • External access: no
    • Sources
  • RPC admin server The address of the History server admin interface.

ZooKeeper

  • Client connections Server dedicated to client connections.
  • Leader server Peers use this port to connect to other peers, for example, to agree upon the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader.
  • Leader election connections Server connections used during the leader election phase.

Hive

  • Hive Metastore
    Store metadata information to expose data storage into a relational model.
  • Hive Server 2
    The JDBC/ODBC interface to the Hive Metastore.
    • Port: 10000 (in TCP) or 10001 (in HTTP)
    • Protocol: RPC
    • Property: hive.server2.thrift.port
    • External access: yes
    • Source
  • Web User Interface (UI)
    Port of the web UI, which provides configuration, logging, metrics, and active session information.
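
The two Hive Server 2 ports listed above correspond to its two transport modes. As a rough sketch, a client could build its JDBC URL as follows. The host name is a placeholder, the helper is hypothetical, and the httpPath value cliservice is assumed to be left at Hive's default. Security parameters (for example Kerberos principal or SSL settings) are deliberately omitted and would be needed on a secured cluster.

```python
def hiveserver2_jdbc_url(host: str, mode: str = "binary",
                         database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL for the given transport mode.

    Hypothetical helper: ports 10000 (TCP) and 10001 (HTTP) come from
    this page; 'cliservice' is assumed to be the configured httpPath.
    """
    if mode == "binary":
        return f"jdbc:hive2://{host}:10000/{database}"
    if mode == "http":
        return (f"jdbc:hive2://{host}:10001/{database}"
                ";transportMode=http;httpPath=cliservice")
    raise ValueError(f"unknown transport mode: {mode!r}")


# "hs2.example.com" is a placeholder host name.
print(hiveserver2_jdbc_url("hs2.example.com"))
print(hiveserver2_jdbc_url("hs2.example.com", mode="http"))
```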

Ranger

  • Policy Manager Port for Ranger secured admin web UI.

Oozie

  • Web UI
    Port for the secured Oozie web UI.
  • Admin The admin port on which the Oozie server runs. It may be opened externally if job submissions are accepted from outside the cluster.

Spark

  • Driver port If binding to spark.driver.port fails, the port is incremented by 1 and retried up to spark.port.maxRetries times. spark.blockManager.port must be larger than spark.driver.port + spark.port.maxRetries. These parameters are identical for Spark2 and Spark3.
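
The retry rule above implies that the driver may end up on any port from spark.driver.port to spark.driver.port + spark.port.maxRetries, so a firewall has to allow that whole range. The sketch below computes the range and checks the block manager constraint. The function names and the numeric values (40000, 16, 40020) are illustrative assumptions, not tdp-collection defaults.

```python
def driver_port_candidates(driver_port: int, max_retries: int) -> range:
    """Ports the driver may bind to: the configured port, then +1 per retry."""
    return range(driver_port, driver_port + max_retries + 1)


def block_manager_port_is_valid(driver_port: int, max_retries: int,
                                block_manager_port: int) -> bool:
    """Check spark.blockManager.port > spark.driver.port + spark.port.maxRetries."""
    return block_manager_port > driver_port + max_retries


# Illustrative values only.
ports = driver_port_candidates(driver_port=40000, max_retries=16)
print(ports[0], ports[-1])                              # 40000 40016
print(block_manager_port_is_valid(40000, 16, 40020))    # True
print(block_manager_port_is_valid(40000, 16, 40010))    # False
```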

Knox

  • Gateway The port of Knox main gateway to internal cluster services.
    • Port: 8443
    • Protocol: HTTPS
    • Property: gateway.port
    • External access: yes
    • Source
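
To verify that the firewall matches the "External access: yes" entries on this page, a simple TCP reachability check can be run from outside the cluster. The sketch below is one possible approach using the Python standard library. The port numbers are those listed on this page; the dictionary, the function names, and any host name passed in are assumptions for illustration. A successful TCP connection only shows that the port is open, not that the service behind it is healthy.

```python
import socket
from typing import Dict

# Ports marked "External access: yes" on this page.
EXTERNAL_PORTS: Dict[str, int] = {
    "SSH": 22,
    "HDFS NameNode HTTPS": 9871,
    "YARN ResourceManager HTTPS": 8090,
    "YARN NodeManager HTTPS": 8044,
    "YARN Timeline Server HTTPS": 8190,
    "Hive Server 2": 10000,
    "Knox Gateway": 8443,
}


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_host(host: str, ports: Dict[str, int]) -> Dict[str, bool]:
    """Check every named port on a single host (hypothetical helper)."""
    return {name: port_is_reachable(host, port) for name, port in ports.items()}
```

For example, check_host("knox.example.com", {"Knox Gateway": 8443}) would report whether the gateway port is reachable on that placeholder host.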

Additional resources