Big Data Hadoop Certification Training
Pinnacledu’s Big Data Hadoop Training Course is curated by Hadoop industry experts and provides in-depth knowledge of Big Data and Hadoop Ecosystem tools such as HDFS, YARN, MapReduce, Hive, Pig, HBase, Spark, Oozie, Flume and Sqoop. Throughout this online instructor-led Hadoop Training, you will work on real-life industry use cases in the Retail, Social Media, Aviation, Tourism and Finance domains using Cloud Lab.
- Lectures 110
- Quizzes 0
- Duration 50 hours
- Skill level All levels
- Language English
- Students 73293
- Assessments Yes
Understanding Big Data and Hadoop
Learning Objectives: In this module, you will understand what Big Data is, the limitations of traditional solutions for Big Data problems, how Hadoop solves those problems, the Hadoop Ecosystem, Hadoop Architecture, HDFS, the anatomy of file reads and writes, and how MapReduce works.
Hadoop Architecture and HDFS
Learning Objectives: In this module, you will learn Hadoop Cluster Architecture, the important configuration files of a Hadoop Cluster, Data Loading Techniques using Sqoop & Flume, and how to set up Single-Node and Multi-Node Hadoop Clusters.
Hadoop MapReduce Framework
Learning Objectives: In this module, you will gain a comprehensive understanding of the Hadoop MapReduce framework and how MapReduce works on data stored in HDFS. You will also learn advanced MapReduce concepts such as Input Splits, Combiner & Partitioner.
- Traditional way vs MapReduce way
- Why MapReduce
- YARN Components
- YARN Architecture
- YARN MapReduce Application Execution Flow
- YARN Workflow
- Anatomy of MapReduce Program
- Input Splits, Relation between Input Splits and HDFS Blocks
- MapReduce: Combiner & Partitioner
- Demo of Health Care Dataset
- Demo of Weather Dataset
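To make the Combiner and Partitioner topics above concrete, here is a minimal sketch in plain Python that imitates the stages of a word-count job. It does not use the Hadoop API; the function names (`map_phase`, `combine`, `partition`, `reduce_phase`, `run_job`), the toy hash and the sample input are all assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in the input line."""
    return [(word.lower(), 1) for word in line.split()]

def combine(pairs):
    """Combiner: pre-aggregate one map task's output before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def partition(key, num_reducers):
    """Partitioner: choose a reducer for a key (deterministic toy hash)."""
    return sum(key.encode("utf-8")) % num_reducers

def reduce_phase(key, values):
    """Reducer: sum all partial counts that arrived for one key."""
    return key, sum(values)

def run_job(splits, num_reducers=2):
    # One shuffle bucket per reducer, mapping key -> list of partial counts.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # one simulated map task per input split
        mapped = []
        for line in split:
            mapped.extend(map_phase(line))
        for key, value in combine(mapped):  # combiner runs on the map side
            buckets[partition(key, num_reducers)][key].append(value)
    result = {}
    for bucket in buckets:  # one simulated reduce task per partition
        for key in sorted(bucket):
            word, total = reduce_phase(key, bucket[key])
            result[word] = total
    return result

# Hypothetical input: two splits, each a list of text lines.
splits = [["big data big"], ["data hadoop", "big hadoop"]]
counts = run_job(splits)
```

In this sketch the combiner only reduces how many pairs cross the simulated shuffle; the final counts are the same with or without it, which is the property a real combiner has to preserve.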
Advanced Hadoop MapReduce
Learning Objectives: In this module, you will learn advanced MapReduce concepts such as Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format, Sequence Input Format and XML parsing.
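As one illustration of the Reduce Join topic, the following plain-Python sketch shows the general idea of a reduce-side join: each mapper tags its records with their source, the shuffle groups them by join key, and the reducer pairs records from both sides. It is not Hadoop code; the datasets, the tags `"C"` and `"O"`, and all function names are hypothetical.

```python
from collections import defaultdict

def map_customers(record):
    """Mapper for the customers dataset: key by customer id, tag as 'C'."""
    customer_id, name = record
    return customer_id, ("C", name)

def map_orders(record):
    """Mapper for the orders dataset: key by customer id, tag as 'O'."""
    order_id, customer_id, amount = record
    return customer_id, ("O", (order_id, amount))

def reduce_join(key, tagged_values):
    """Reducer: pair every customer record with every order for one key."""
    names = [value for tag, value in tagged_values if tag == "C"]
    orders = [value for tag, value in tagged_values if tag == "O"]
    return [(key, name, order_id, amount)
            for name in names
            for order_id, amount in orders]

def run_join(customers, orders):
    # Simulated shuffle: group tagged values from both mappers by join key.
    shuffle = defaultdict(list)
    for record in customers:
        key, value = map_customers(record)
        shuffle[key].append(value)
    for record in orders:
        key, value = map_orders(record)
        shuffle[key].append(value)
    joined = []
    for key in sorted(shuffle):
        joined.extend(reduce_join(key, shuffle[key]))
    return joined

# Hypothetical sample data.
customers = [(1, "Asha"), (2, "Ben")]
orders = [(100, 1, 25.0), (101, 1, 40.0), (102, 3, 9.5)]
joined = run_join(customers, orders)
```

Because the reducer only emits rows when both sides are present for a key, this sketch behaves like an inner join: the customer with no orders and the order with no matching customer are both dropped.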
Apache Pig
Learning Objectives: In this module, you will learn Apache Pig, the types of use cases where Pig can be used, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDF, Pig Streaming & Testing Pig Scripts. You will also work on a healthcare dataset.
Apache Hive
Learning Objectives: This module will help you understand Hive concepts, Hive Data Types, loading and querying data in Hive, running Hive scripts and Hive UDF.
- Introduction to Apache Hive
- Hive vs Pig
- Hive Architecture and Components
- Hive Metastore
- Limitations of Hive
- Comparison with Traditional Database
- Hive Data Types and Data Models
- Hive Partition
- Hive Bucketing
- Hive Tables (Managed Tables and External Tables)
- Importing Data
- Querying Data & Managing Outputs
- Hive Script & Hive UDF
- Retail use case in Hive
- Hive Demo on Healthcare Dataset
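The difference between Hive Partition and Hive Bucketing in the list above can be sketched in plain Python. The example assumes a hypothetical `sales` table partitioned by `country` and bucketed on an integer `customer_id` into 4 buckets; it imitates the commonly described layout (one `column=value` directory per partition, rows spread over a fixed number of buckets by hashing the bucketing column) and is not Hive's actual implementation.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count chosen for this illustration

def bucket_for(customer_id, num_buckets=NUM_BUCKETS):
    """Toy bucketing rule: non-negative integer id modulo the bucket count."""
    return customer_id % num_buckets

def layout(rows, table="sales"):
    """Group row amounts by (partition directory, bucket number)."""
    files = defaultdict(list)
    for row in rows:
        directory = f"{table}/country={row['country']}"
        bucket = bucket_for(row["customer_id"])
        files[(directory, bucket)].append(row["amount"])
    return dict(files)

# Hypothetical sample rows.
rows = [
    {"customer_id": 1, "country": "US", "amount": 20.0},
    {"customer_id": 5, "country": "US", "amount": 35.5},
    {"customer_id": 2, "country": "IN", "amount": 12.0},
]
placement = layout(rows)
```

The point of the sketch is that the partition is decided by the column's value, so a query filtering on `country` can skip whole directories, while the bucket is decided by a hash, so the number of buckets stays fixed no matter how many distinct ids exist.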
Advanced Apache Hive and HBase
Learning Objectives: In this module, you will understand advanced Apache Hive concepts such as UDF, Dynamic Partitioning, Hive indexes and views, and optimizations in Hive. You will also acquire in-depth knowledge of Apache HBase, HBase Architecture, HBase running modes and its components.
- Hive QL: Joining Tables, Dynamic Partitioning
- Custom MapReduce Scripts
- Hive Indexes and views
- Hive Query Optimizers
- Hive Thrift Server
- Hive UDF
- Apache HBase: Introduction to NoSQL Databases and HBase
- HBase vs RDBMS
- HBase Components
- HBase Architecture
- HBase Run Modes
- HBase Configuration
- HBase Cluster Deployment
Advanced Apache HBase
Learning Objectives: This module will cover advanced Apache HBase concepts. We will see demos on HBase Bulk Loading & HBase Filters. You will also learn what ZooKeeper is, how it helps in monitoring a cluster & why HBase uses ZooKeeper.
Processing Distributed Data with Apache Spark
Learning Objectives: In this module, you will learn what Apache Spark is, along with SparkContext & the Spark Ecosystem. You will learn how to work with Resilient Distributed Datasets (RDD) in Apache Spark. You will run an application on a Spark Cluster & compare the performance of MapReduce and Spark.
Oozie and Hadoop Project
Learning Objectives: In this module, you will understand how multiple Hadoop ecosystem components work together to solve Big Data problems. This module will also cover Flume & Sqoop demos, the Apache Oozie Workflow Scheduler for Hadoop Jobs, and Hadoop-Talend integration.
Certification Project 1
Analysis of an Online Book Store
Certification Project 2