MOC On-Demand: 20775-Perform Data Engineering on Microsoft HDInsight Course Outline

(5 days)

Please note: This course aligns to Microsoft exam 70-775. Exam 70-775 retired on June 30, 2019.

*** Note: This is an On-Demand Self-Study Class: 5 days of content, 90 days of unlimited access, $995 ***
You can take this class at any time; there are no set dates. It covers the same content as the 5-day instructor-led class of the same name. The cost for this MOC On-Demand class is $995. (Applicable state and local taxes may be added to On-Demand purchases, depending on your location.) Microsoft Enterprise customers paying with Software Assurance Training Vouchers, see the SATV payment note below.

MOC On-Demand Learner Profiles
MOC On-Demand is a self-study training solution designed for two types of learners. First, MOC On-Demand is a great fit for experienced IT professionals who don't need a traditional 5-day class to upgrade their existing skills. They can pick and choose topics to make the most effective use of their time. Second, MOC On-Demand is perfect for highly motivated individuals who are new to a technology and need to space their learning over a period of weeks or months. These learners can take their time and repeat sections as needed until they master the new concepts.

About MOC On-Demand
Our MOC On-Demand classes are self-study courses with 30 to 40 hours of content. They include hours of video, hands-on labs using the actual software, and knowledge checks, and they were created by Microsoft to mirror the content of the traditional live, instructor-led version of this course. These features are all part of the standard MOC On-Demand training. But don't settle for the standard MOC On-Demand class! Check out the "ONLC Extras" that you get when purchasing this course from us.

ONLC Extras
ONLC Training Centers bundles in valuable extras with our MOC On-Demand Courses. These items are not available from other training companies.
Courseware After the Course. Get the digital courseware that is used in the live, instructor-led version of this class. While the MOC On-Demand access goes away after 90 days, you will have access to the "extra" digital courseware for an unlimited period of time.
24/7 Online Support. You will be able to chat online with a subject matter expert while you are taking your MOC On-Demand class. And, with your permission, the expert can even take over your computer to provide assistance with your labs.

Optional Add-Ons
These add-ons are available exclusively from ONLC Training Centers and are offered to you at an additional cost.
Certification Pak, $150. Interested in obtaining certification? Get a Transcender practice exam and a Microsoft exam voucher at this reduced price.
ILT Listener, $250. Want to listen in and follow along with a live Instructor-Led Training (ILT) class? We offer this option for individuals on a limited budget who have time during the day to hear a live class in progress. ILT Listeners have access to their online support chat expert during the class, but they do not have direct access to the live instructor.
ILT Participant, price varies. You've purchased MOC On-Demand, gone through the training, and decided that you still want a live class. Just pay the difference between the MOC On-Demand course and the Instructor-Led Training (ILT) class, and you can have a seat in our live class. Get both self-study and live, instructor-led training for the retail price of the instructor-led class alone!

Paying with Software Assurance Training Vouchers (SATV)
For Microsoft Enterprise customers paying with Software Assurance Training Vouchers, the cost of this class is 5 vouchers. This includes access to the self-study materials, the student workbook, 24/7 access to an online expert, and a corresponding exam voucher, if applicable, upon request.

Do You Still Prefer a Live, Instructor-led Class?
Already know MOC On-Demand is not right for you? We also offer this same course content in a live, instructor-led format. For more details, click on the link below:
20775 Instructor-led


The main purpose of the course is to give students the ability to plan and implement big data workflows on HDInsight.

Audience profile
The primary audience for this course is data engineers, data architects, data scientists, and data developers who plan to implement big data engineering workflows on HDInsight.

In addition to their professional experience, students who attend this course should have:
• Programming experience using R, and familiarity with common R packages.
• Knowledge of common statistical methods and data analysis best practices.
• Basic knowledge of the Microsoft Windows operating system and its core functionality.
• Working knowledge of relational databases.

At course completion
After completing this course, students will be able to:
• Deploy HDInsight Clusters.
• Authorize Users to Access Resources.
• Load Data into HDInsight.
• Troubleshoot HDInsight.
• Implement Batch Solutions.
• Design Batch ETL Solutions for Big Data with Spark.
• Analyze Data with Spark SQL.
• Analyze Data with Hive and Phoenix.
• Describe Stream Analytics.
• Implement Spark Streaming Using the DStream API.
• Develop Big Data Real-Time Processing Solutions with Apache Storm.
• Build Solutions that use Kafka and HBase.

Course Outline

Module 1: Getting Started with HDInsight
This module introduces Hadoop, the MapReduce paradigm, and HDInsight.
• What is Big Data?
• Introduction to Hadoop
• Working with MapReduce Functions
• Introducing HDInsight
Lab: Working with HDInsight
• Provision an HDInsight cluster and run MapReduce jobs

Module 2: Deploying HDInsight Clusters
This module provides an overview of the Microsoft Azure HDInsight cluster types, in addition to the creation and maintenance of the HDInsight clusters. The module also demonstrates how to customize clusters by using script actions through the Azure Portal, Azure PowerShell, and the Azure command-line interface (CLI). This module includes labs that provide the steps to deploy and manage the clusters.
• Identifying HDInsight cluster types
• Managing HDInsight clusters by using the Azure portal
• Managing HDInsight clusters by using Azure PowerShell
Lab: Managing HDInsight clusters with the Azure portal
• Create an HDInsight cluster that uses Data Lake Store storage
• Customize HDInsight by using script actions
• Delete an HDInsight cluster

Module 3: Authorizing Users to Access Resources
This module provides an overview of non-domain-joined and domain-joined Microsoft HDInsight clusters, in addition to the creation and configuration of domain-joined HDInsight clusters. The module also demonstrates how to manage domain-joined clusters by using the Ambari management UI and the Ranger Admin UI. This module includes labs that provide the steps to create and manage domain-joined clusters.
• Non-domain-joined clusters
• Configuring domain-joined HDInsight clusters
• Manage domain-joined HDInsight clusters
Lab: Authorizing Users to Access Resources
• Prepare the Lab Environment
• Manage a non-domain-joined cluster

Module 4: Loading data into HDInsight
This module provides an introduction to loading data into Microsoft Azure Blob storage and Microsoft Azure Data Lake Store. By the end of this module, you will know how to use multiple tools to transfer data to an HDInsight cluster. You will also learn how to load and transform data to decrease your query run time.
• Storing data for HDInsight processing
• Using data loading tools
• Maximizing value from stored data
Lab: Loading Data into your Azure account
• Load data for use with HDInsight

Module 5: Troubleshooting HDInsight
In this module, you will learn how to interpret the logs associated with the various services of a Microsoft Azure HDInsight cluster to troubleshoot any issues you might have with these services. You will also learn about Operations Management Suite (OMS) and its capabilities.
• Analyze HDInsight logs
• YARN logs
• Heap dumps
• Operations Management Suite
Lab: Troubleshooting HDInsight
• Analyze HDInsight logs
• Analyze YARN logs
• Monitor resources with Operations Management Suite

Module 6: Implementing Batch Solutions
In this module, you will look at implementing batch solutions in Microsoft Azure HDInsight by using Hive and Pig. You will also discuss the approaches for data pipeline operationalization that are available for big data workloads on an HDInsight stack.
• Apache Hive storage
• HDInsight data queries using Hive and Pig
• Operationalize HDInsight
Lab: Implement Batch Solutions
• Deploy HDInsight cluster and data storage
• Use data transfers with HDInsight clusters
• Query HDInsight cluster data

Module 7: Design Batch ETL solutions for big data with Spark
This module provides an overview of Apache Spark, describing its main characteristics and key features. Before you start, it’s helpful to understand the basic architecture of Apache Spark and the different components that are available. The module also explains how to design batch Extract, Transform, Load (ETL) solutions for big data with Spark on HDInsight. The final lesson includes some guidelines to improve Spark performance.
• What is Spark?
• ETL with Spark
• Spark performance
Lab: Design Batch ETL solutions for big data with Spark
• Create an HDInsight Cluster with access to Data Lake Store
• Use HDInsight Spark cluster to analyze data in Data Lake Store
• Analyzing website logs using a custom library with Apache Spark cluster on HDInsight
• Managing resources for Apache Spark cluster on Azure HDInsight

Module 8: Analyze Data with Spark SQL
This module describes how to analyze data by using Spark SQL. In it, you will learn to explain the differences between RDDs, Datasets, and DataFrames; identify the use cases for iterative and interactive queries; and describe best practices for caching, partitioning, and persistence. You will also look at how to use Apache Zeppelin and Jupyter notebooks, carry out exploratory data analysis, and then submit Spark jobs remotely to a Spark cluster.
• Implementing iterative and interactive queries
• Perform exploratory data analysis
Lab: Performing exploratory data analysis by using iterative and interactive queries
• Build a machine learning application
• Use Zeppelin for interactive data analysis
• View and manage Spark sessions by using Livy

Module 9: Analyze Data with Hive and Phoenix
In this module, you will learn about running interactive queries using Interactive Hive (also known as Hive LLAP, or Live Long and Process) and Apache Phoenix. You will also learn about the various aspects of running interactive queries using Apache Phoenix, with HBase as the underlying data store.
• Implement interactive queries for big data with interactive Hive
• Perform exploratory data analysis by using Hive
• Perform interactive processing by using Apache Phoenix
Lab: Analyze data with Hive and Phoenix
• Implement interactive queries for big data with interactive Hive
• Perform exploratory data analysis by using Hive
• Perform interactive processing by using Apache Phoenix

Module 10: Stream Analytics
The Microsoft Azure Stream Analytics service has built-in features and capabilities that make it an easy-to-use, flexible stream processing service in the cloud. There are a number of advantages to using Stream Analytics for your streaming solutions, which this module discusses in more detail. You will also compare features of Stream Analytics with other services available within the Microsoft Azure HDInsight stack, such as Apache Storm. You will learn how to deploy a Stream Analytics job, connect it to Microsoft Azure Event Hubs to ingest real-time data, and execute a Stream Analytics query to gain low-latency insights. Finally, you will learn how to monitor Stream Analytics jobs when they are deployed and used in production.
• Stream Analytics
• Process streaming data with Stream Analytics
• Managing Stream Analytics jobs
Lab: Implement Stream Analytics
• Process streaming data with Stream Analytics
• Managing Stream Analytics jobs

Module 11: Implementing Streaming Solutions with Kafka and HBase
In this module, you will learn how to use Kafka to build streaming solutions. You will also see how to use Kafka to persist data to HDFS by using Apache HBase, and then query this data.
• Building and Deploying a Kafka Cluster
• Publishing, Consuming, and Processing Data Using the Kafka Cluster
• Using HBase to Store and Query Data
Lab: Implementing Streaming Solutions with Kafka and HBase
• Create a virtual network and gateway
• Create a Storm cluster for Kafka
• Create a Kafka producer
• Create a streaming processor client topology
• Create a Power BI dashboard and streaming dataset
• Create an HBase cluster
• Create a streaming processor to write to HBase

Module 12: Develop big data real-time processing solutions with Apache Storm
This module explains how to develop big data real-time processing solutions with Apache Storm.
• Persist long-term data
• Stream data with Storm
• Create Storm topologies
• Configure Apache Storm
Lab: Developing big data real-time processing solutions with Apache Storm
• Stream data with Storm
• Create Storm topologies

Module 13: Create Spark Streaming Applications
This module describes Spark Streaming, explains how to use discretized streams (DStreams), and shows how to apply these concepts to develop Spark Streaming applications.
• Working with Spark Streaming
• Creating Spark Structured Streaming Applications
• Persistence and Visualization
Lab: Building a Spark Streaming Application
• Installing Required Software
• Building the Azure Infrastructure
• Building a Spark Streaming Pipeline


Attend hands-on, instructor-led 20775-Perform Data Engineering on Microsoft HDInsight training classes at more than 300 ONLC locations. Not near one of our locations? Attend these same live classes from your home or office PC via our Remote Classroom Instruction (RCI) technology.

For additional training options, check out our list of Courses and select the one that's right for you.
