The role of TDP is to provide a trusted and reliable platform to store and process a large amount of data. The feature selection, the choice of components, and the recommended usages are driven by the use-cases where the Hadoop ecosystem excels. The platform provides a robust core that is meant to be extended with complementary open-source and proprietary solutions for additional value.
Common usages of TDP
Secure and reliable storage expandable to multiple petabytes.
Scalable platform to handle large volumes of ingested data, in all possible forms, coming in batch and streaming
Data engineering with distributed processing engines including Spark and MapReduce, data cleaning, and data publication.
Management for structured, unstructured, and semi-structured data with enterprise-level governance.
Data mining and data science
Ad-hoc exploration, web UI for data access and visualization.
Storage of medium to cold data while preserving processing possibilities.