Course Outline

Big Data Overview:

  • Defining Big Data
  • The drivers behind the growing popularity of Big Data
  • Real-world Big Data Case Studies
  • Key Characteristics of Big Data
  • Available solutions for managing Big Data

Hadoop and Its Components:

  • Introduction to Hadoop and its core components.
  • Hadoop architecture and its data processing and data handling capabilities.
  • A brief history of Hadoop, including companies that have adopted it and the reasons for doing so.
  • Detailed explanation of the Hadoop framework and its components.
  • Understanding HDFS and the mechanics of reading and writing to the Hadoop Distributed File System.
  • Instructions for setting up a Hadoop cluster in various modes: Stand-alone, Pseudo-distributed, and Multi-Node.

(This section covers setting up a Hadoop cluster using VirtualBox, KVM, or VMware; essential network configuration; launching the Hadoop daemons; and testing the cluster.)

  • Introduction to the MapReduce framework and its operational principles.
  • Executing MapReduce jobs on a Hadoop cluster.
  • Exploring Replication, Mirroring, and Rack Awareness within Hadoop clusters.
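
To give a flavor of the MapReduce operational principles listed above, here is a minimal word-count sketch in plain Python. It is not Hadoop code and not taken from the course material: it runs in a single process, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, chosen only to mirror the map, shuffle, and reduce stages that a real cluster distributes across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the list of counts for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

On a real cluster the map and reduce functions run in parallel on different nodes and the shuffle moves data over the network; the sketch only shows how the stages fit together.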

Planning a Hadoop Cluster:

  • Strategies for planning your Hadoop cluster.
  • Aligning hardware and software requirements for cluster planning.
  • Analyzing workloads and planning the cluster to prevent failures and ensure optimal performance.

Introduction to MapR and Its Advantages:

  • An overview of MapR and its architecture.
  • Understanding and utilizing MapR Control System, MapR Volumes, snapshots, and Mirrors.
  • Cluster planning specific to MapR environments.
  • Comparing MapR with other distributions and Apache Hadoop.
  • MapR installation and cluster deployment processes.

Cluster Setup and Administration:

  • Managing services, nodes, snapshots, mirrored volumes, and remote clusters.
  • Understanding and managing nodes.
  • Understanding Hadoop components and installing them alongside MapR services.
  • Accessing data on the cluster, including via NFS, and managing services and nodes.
  • Data management using volumes; user and group management.
  • Assigning roles to nodes; node commissioning and decommissioning.
  • Cluster administration, performance monitoring, and configuring and analyzing metrics.
  • Administering MapR security.
  • Understanding and working with M7-native storage for MapR tables.
  • Cluster configuration and tuning for optimal performance.

Cluster Upgrades and Integration with Other Setups:

  • Upgrading MapR software versions and understanding upgrade types.
  • Configuring the MapR cluster to access an HDFS cluster.
  • Deploying a MapR cluster on Amazon Elastic MapReduce.

All topics above include demonstrations and practice sessions to provide learners with hands-on experience with the technology.

Requirements

  • Foundational knowledge of Linux File Systems
  • Basic understanding of Java
  • Familiarity with Apache Hadoop (recommended)

Duration: 28 hours
