Course Outline

1: HDFS (17%)

  • Explain the roles of HDFS daemons
  • Describe standard operational procedures for an Apache Hadoop cluster, covering both data storage and processing aspects.
  • Identify current computing system trends that necessitate the use of Apache Hadoop.
  • Outline the primary objectives behind HDFS design.
  • Evaluate scenarios to determine the appropriate use of HDFS Federation.
  • Recognize the components and daemons involved in an HDFS HA-Quorum cluster.
  • Analyze the role of HDFS security, specifically regarding Kerberos.
  • Select the most suitable data serialization method for specific scenarios.
  • Describe the processes involved in file reading and writing (see the client sketch after this list).
  • Identify commands for manipulating files within the Hadoop File System Shell.
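
The read/write items above map directly onto Hadoop's client API. Below is a minimal Java sketch of writing and then reading a file through org.apache.hadoop.fs.FileSystem; the fs.defaultFS value and paths are illustrative, and in practice the address comes from core-site.xml:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // illustrative
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/tmp/hello.txt");

            // Write: the client asks the NameNode where to place blocks,
            // then streams data through a pipeline of DataNodes.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeUTF("hello, hdfs");
            }

            // Read: the client fetches block locations from the NameNode,
            // then reads block data directly from DataNodes.
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
            }
            fs.close();
        }
    }

The same operations from the File System Shell would be hadoop fs -put and hadoop fs -cat.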

2: YARN and MapReduce version 2 (MRv2) (17%)

  • Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects its configuration.
  • Understand the deployment of MapReduce v2 (MRv2 / YARN), including all associated YARN daemons.
  • Grasp the fundamental design strategy of MapReduce v2 (MRv2).
  • Explain how YARN manages resource allocations.
  • Trace the workflow of a MapReduce job executing on YARN.
  • Identify the file modifications required to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN (key properties are sketched below).
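
For the migration item above, the decisive changes are switching job submission to YARN and enabling the shuffle auxiliary service on the NodeManagers. A minimal sketch of the relevant properties, set programmatically here for illustration (in a real cluster they belong in mapred-site.xml and yarn-site.xml, and the hostname is an assumption):

    import org.apache.hadoop.conf.Configuration;

    public class Mrv2Properties {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // mapred-site.xml: replaces the MRv1 JobTracker-based framework
            conf.set("mapreduce.framework.name", "yarn");
            // yarn-site.xml: where the ResourceManager runs (illustrative host)
            conf.set("yarn.resourcemanager.hostname", "rm.example.com");
            // yarn-site.xml: NodeManagers must host the MapReduce shuffle service
            conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");
            System.out.println(conf.get("mapreduce.framework.name"));
        }
    }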

3: Hadoop Cluster Planning (16%)

  • Understand key considerations for selecting hardware and operating systems to host an Apache Hadoop cluster.
  • Analyze options when selecting an operating system.
  • Gain insight into kernel tuning and disk swapping mechanisms.
  • Given a specific scenario and workload pattern, identify the appropriate hardware configuration.
  • Given a scenario, determine the required ecosystem components to meet Service Level Agreements (SLAs).
  • Perform cluster sizing: based on a scenario and execution frequency, specify workload requirements including CPU, memory, storage, and disk I/O (a worked example follows this list).
  • Address disk sizing and configuration, covering JBOD versus RAID, SANs, virtualization, and specific disk sizing needs within a cluster.
  • Evaluate network topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design elements for given scenarios.
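
As a worked example of the sizing item above, a common first pass multiplies retained data by the HDFS replication factor and reserves headroom for intermediate/shuffle data. All figures below are assumptions for illustration, not recommendations:

    public class ClusterSizing {
        public static void main(String[] args) {
            double dailyIngestTB = 2.0;      // assumed ingest per day
            int retentionDays = 365;         // assumed retention window
            int replicationFactor = 3;       // HDFS default
            double tempSpaceFraction = 0.25; // headroom for shuffle/temp data
            double nodeCapacityTB = 24.0;    // e.g. 12 x 2 TB JBOD per worker

            double logicalTB = dailyIngestTB * retentionDays;
            double rawTB = logicalTB * replicationFactor / (1.0 - tempSpaceFraction);
            System.out.printf("Logical data:  %.0f TB%n", logicalTB);
            System.out.printf("Raw capacity:  %.0f TB%n", rawTB);
            System.out.printf("Worker nodes: ~%.0f%n", Math.ceil(rawTB / nodeCapacityTB));
        }
    }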

4: Hadoop Cluster Installation and Administration (25%)

  • Given a scenario, assess how the cluster handles disk and machine failures.
  • Analyze logging configurations and their file formats.
  • Understand the fundamentals of Hadoop metrics and cluster health monitoring.
  • Identify the functions and purposes of available cluster monitoring tools.
  • Install all ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig.
  • Identify the functions and purposes of available tools for managing the Apache Hadoop file system (a short programmatic example follows).
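
One programmatic counterpart to the file system management tools above reports aggregate capacity much like hdfs dfsadmin -report. A minimal sketch (the fs.defaultFS value is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;

    public class ClusterCapacity {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // illustrative
            FileSystem fs = FileSystem.get(conf);

            // Aggregate DataNode capacity as reported by the NameNode
            FsStatus status = fs.getStatus();
            long tb = 1024L * 1024 * 1024 * 1024;
            System.out.printf("Capacity:  %d TB%n", status.getCapacity() / tb);
            System.out.printf("Used:      %d TB%n", status.getUsed() / tb);
            System.out.printf("Remaining: %d TB%n", status.getRemaining() / tb);
            fs.close();
        }
    }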

5: Resource Management (10%)

  • Understand the overarching design goals of each Hadoop scheduler.
  • Given a scenario, determine how the FIFO Scheduler allocates cluster resources.
  • Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN (a simplified sketch follows this list).
  • Given a scenario, determine how the Capacity Scheduler allocates cluster resources.
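
To illustrate the Fair Scheduler item above: with no minimum shares or caps configured, each active queue receives capacity in proportion to its weight. A simplified sketch of that proportional split (queue names and weights are assumptions; the real scheduler also honors minimum shares, placement rules, and preemption):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class FairShareSketch {
        public static void main(String[] args) {
            int clusterMemoryGB = 1200;        // assumed total YARN memory
            Map<String, Double> weights = new LinkedHashMap<>();
            weights.put("production", 3.0);    // illustrative queues
            weights.put("analytics", 2.0);
            weights.put("adhoc", 1.0);

            double totalWeight = weights.values().stream()
                    .mapToDouble(Double::doubleValue).sum();

            // Instantaneous fair share: capacity split by relative weight
            for (Map.Entry<String, Double> q : weights.entrySet()) {
                double share = clusterMemoryGB * q.getValue() / totalWeight;
                System.out.printf("%-10s %.0f GB%n", q.getKey(), share);
            }
        }
    }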

6: Monitoring and Logging (15%)

  • Understand the functions and features of Hadoop’s metric collection capabilities (a JMX example follows this list).
  • Analyze the NameNode and JobTracker Web UIs.
  • Learn how to monitor cluster daemons.
  • Identify and monitor CPU usage on master nodes.
  • Describe methods for monitoring swap and memory allocation across all nodes.
  • Identify procedures for viewing and managing Hadoop log files.
  • Interpret log file content.
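
Hadoop daemons expose the metrics referenced in this section as JSON over each daemon's built-in /jmx servlet. A minimal sketch that queries the NameNode's FSNamesystem metrics (the host is an assumption; 50070 is the CDH 5-era default NameNode web port):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class JmxDump {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://namenode.example.com:50070/jmx"
                    + "?qry=Hadoop:service=NameNode,name=FSNamesystem");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // JSON: capacity, block counts, files, ...
                }
            }
            conn.disconnect();
        }
    }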

Requirements

  • Foundational skills in Linux system administration
  • Basic programming proficiency

Duration

35 hours
