Course Outline
1: HDFS (17%)
- Explain the roles of the HDFS daemons
- Describe standard operational procedures for an Apache Hadoop cluster, covering both data storage and processing aspects.
- Identify current computing system trends that necessitate the use of Apache Hadoop.
- Outline the primary objectives behind HDFS design.
- Evaluate scenarios to determine the appropriate use of HDFS Federation.
- Recognize the components and daemons involved in an HDFS HA-Quorum cluster.
- Analyze the role of HDFS security, specifically regarding Kerberos.
- Select the most suitable data serialization method for specific scenarios.
- Describe the processes involved in file reading and writing.
- Identify commands for manipulating files within the Hadoop File System Shell.
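As a taste of the File System Shell objective above, a few commonly used commands look like this (the paths are illustrative, and a running HDFS cluster with the `hdfs` binary on the PATH is assumed):

```shell
hdfs dfs -mkdir -p /user/alice/data        # create a directory tree in HDFS
hdfs dfs -put access.log /user/alice/data  # copy a local file into HDFS
hdfs dfs -ls /user/alice/data              # list directory contents
hdfs dfs -setrep -w 2 /user/alice/data/access.log  # change a file's replication factor
hdfs dfs -rm -r /user/alice/data           # remove a directory recursively
```

The same subcommands are also reachable via `hadoop fs`; `hdfs dfs` is the HDFS-specific entry point.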
2: YARN and MapReduce version 2 (MRv2) (17%)
- Comprehend the impact of upgrading a cluster from Hadoop 1 to Hadoop 2 on cluster configurations.
- Understand the deployment of MapReduce v2 (MRv2 / YARN), including all associated YARN daemons.
- Grasp the fundamental design strategy of MapReduce v2 (MRv2).
- Explain how YARN manages resource allocations.
- Trace the workflow of a MapReduce job executing on YARN.
- Identify the necessary file modifications required to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN.
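The MRv1-to-MRv2 migration objective above largely comes down to a handful of property changes. A minimal sketch (the hostname is a placeholder; a real migration touches more files, including scheduler and memory settings):

```xml
<!-- mapred-site.xml: submit MapReduce jobs to YARN instead of the MRv1 JobTracker -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: point nodes at the ResourceManager and enable the shuffle service -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm-host.example.com</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```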
3: Hadoop Cluster Planning (16%)
- Understand key considerations for selecting hardware and operating systems to host an Apache Hadoop cluster.
- Analyze options when selecting an operating system.
- Gain insight into kernel tuning and disk swapping mechanisms.
- Given a specific scenario and workload pattern, identify the appropriate hardware configuration.
- Given a scenario, determine the required ecosystem components to meet Service Level Agreements (SLAs).
- Perform cluster sizing: based on a scenario and execution frequency, specify workload requirements including CPU, memory, storage, and disk I/O.
- Address disk sizing and configuration, covering JBOD versus RAID, SANs, virtualization, and specific disk sizing needs within a cluster.
- Evaluate Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design elements for given scenarios.
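The cluster-sizing objective above is mostly arithmetic. A minimal sketch: the replication factor of 3 is HDFS's default, while the 25% headroom for intermediate data and the per-node disk layout are illustrative assumptions, not fixed rules:

```python
import math

def raw_storage_tb(daily_ingest_tb, retention_days, replication=3, overhead=0.25):
    """Rough raw-disk estimate: logical data x replication factor,
    plus headroom for intermediate MapReduce output, logs, and temp space."""
    logical_tb = daily_ingest_tb * retention_days
    return logical_tb * replication * (1 + overhead)

def nodes_needed(raw_tb, disks_per_node=12, disk_tb=4):
    """Worker-node count for a JBOD layout (12 x 4 TB per node assumed)."""
    per_node_tb = disks_per_node * disk_tb
    return math.ceil(raw_tb / per_node_tb)

raw = raw_storage_tb(daily_ingest_tb=1, retention_days=365)  # -> 1368.75 TB raw
print(raw, nodes_needed(raw))
```

CPU, memory, and disk I/O are sized the same way: start from the workload's per-job demands and execution frequency, then divide by per-node capacity.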
4: Hadoop Cluster Installation and Administration (25%)
- Given a scenario, assess how the cluster handles disk and machine failures.
- Analyze logging configurations and their file formats.
- Understand the fundamentals of Hadoop metrics and cluster health monitoring.
- Identify the functions and purposes of available cluster monitoring tools.
- Install all ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig.
- Identify the functions and purposes of available tools for managing the Apache Hadoop file system.
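For the failure-handling and file-system-management objectives above, a few of the standard health checks look like this (run as the HDFS superuser on a live cluster):

```shell
hdfs dfsadmin -report          # per-DataNode capacity, usage, and liveness
hdfs fsck / -files -blocks     # block-level integrity report for the namespace
hdfs dfsadmin -safemode get    # check whether the NameNode is in safe mode
```

`dfsadmin -report` is typically the first stop after a suspected disk or machine failure, since dead or decommissioning DataNodes are listed explicitly.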
5: Resource Management (10%)
- Understand the overarching design goals of each Hadoop scheduler.
- Given a scenario, determine how the FIFO Scheduler allocates cluster resources.
- Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN.
- Given a scenario, determine how the Capacity Scheduler allocates cluster resources.
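As a concrete instance of the Fair Scheduler objective above, queue shares are declared in an allocation file. A minimal sketch (queue names and values are illustrative): two queues split the cluster 2:1 by weight, with a resource floor guaranteed to the `etl` queue:

```xml
<!-- fair-scheduler.xml -->
<allocations>
  <queue name="etl">
    <weight>2.0</weight>
    <minResources>10000 mb,10 vcores</minResources>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
  </queue>
</allocations>
```

The Capacity Scheduler expresses the analogous policy in capacity-scheduler.xml as percentage capacities per queue rather than weights.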
6: Monitoring and Logging (15%)
- Understand the functions and features of Hadoop’s metric collection capabilities.
- Analyze the NameNode and JobTracker Web UIs.
- Learn how to monitor cluster daemons.
- Identify and monitor CPU usage on master nodes.
- Describe methods for monitoring swap and memory allocation across all nodes.
- Identify procedures for viewing and managing Hadoop log files.
- Interpret log file content.
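For the log-viewing objective above, the two most common entry points look like this (the log directory varies by distribution; /var/log/hadoop-hdfs is the CDH default, and the application ID shown is a placeholder):

```shell
# Follow the NameNode daemon log on the master node
tail -f /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log

# Fetch aggregated container logs for a finished YARN application
yarn logs -applicationId application_1400000000000_0001
```

Daemon logs cover the cluster services themselves; `yarn logs` covers per-job container output once log aggregation is enabled.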
Requirements
- Foundational skills in Linux system administration
- Basic programming proficiency
35 Hours
Testimonials (3)
I genuinely enjoyed the many hands-on sessions.
Jacek Pieczatka
Course - Administrator Training for Apache Hadoop
I genuinely appreciated the trainer's deep expertise.
Grzegorz Gorski
Course - Administrator Training for Apache Hadoop
I mostly liked the trainer giving real-life examples.