Data Engineering Using AWS Analytics Services

Build Data Engineering Pipelines using AWS Analytics Services such as Glue, EMR, Athena, Kinesis, QuickSight, and more.

What you’ll learn

  • Data Engineering leveraging AWS Analytics features
  • Managing Tables using Glue Catalog
  • Engineering Batch Data Pipelines using Glue Jobs
  • Orchestrating Batch Data Pipelines using Glue Workflows
  • Running Queries using Athena – a serverless query engine service
  • Using AWS Elastic MapReduce (EMR) Clusters for building Data Pipelines
  • Using AWS Elastic MapReduce (EMR) Clusters for reports and dashboards
  • Data Ingestion using Lambda Functions
  • Scheduling using EventBridge
  • Engineering Streaming Pipelines using Kinesis
  • Streaming Web Server logs using Kinesis Firehose (see the sketch after this list)
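
For a taste of the streaming side, here is a minimal sketch of pushing a single web-server log line into a Kinesis Data Firehose delivery stream with boto3. The stream name, region, and log fields are placeholders, not resources defined in the course.

```python
# Minimal sketch: send one web-server log record to a Kinesis Data Firehose
# delivery stream using boto3. Stream name and region are hypothetical.
import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

log_record = {
    "host": "203.0.113.10",
    "path": "/index.html",
    "status": 200,
}

response = firehose.put_record(
    DeliveryStreamName="web-server-logs",  # placeholder stream name
    Record={"Data": (json.dumps(log_record) + "\n").encode("utf-8")},
)
print(response["RecordId"])
```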

Requirements

  • Programming experience using Python
  • Data Engineering experience using Spark
  • Ability to write and interpret SQL Queries
  • This course is ideal for experienced data engineers who want to add AWS Analytics Services as key skills to their profile

Description

Data Engineering is all about building Data Pipelines that move data from multiple sources into a Data Lake or Data Warehouse, and then from the Data Lake or Data Warehouse to downstream systems. As part of this course, I will walk you through building Data Engineering Pipelines using the AWS Analytics Stack, which includes services such as Glue, Elastic MapReduce (EMR), Lambda Functions, Athena, QuickSight, and many more.

Here are the high-level steps which you will follow as part of the course.

  • Setup Development Environment
  • Getting Started with AWS
  • Development Life Cycle of PySpark
  • Overview of Glue Components
  • Setup Spark History Server for Glue Jobs
  • Deep Dive into Glue Catalog
  • Exploring Glue Job APIs
  • Glue Job Bookmarks
  • Data Ingestion using Lambda Functions
  • Streaming Pipeline using Kinesis
  • Consuming Data from S3 using boto3 (see the sketch after this list)
  • Populating GitHub Data to DynamoDB
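
As a reference for the boto3 step above, here is a minimal sketch of listing and reading objects from an S3 bucket. The bucket name and prefix are placeholders, not the buckets used in the course.

```python
# Minimal sketch: consume objects from S3 with boto3.
# Bucket name and prefix below are hypothetical.
import boto3

s3 = boto3.client("s3")

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-data-lake-bucket", Prefix="landing/ghactivity/"):
    for obj in page.get("Contents", []):
        # Fetch each object and print its key and size in bytes.
        body = s3.get_object(Bucket="my-data-lake-bucket", Key=obj["Key"])["Body"]
        print(obj["Key"], len(body.read()))
```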

Getting Started with AWS

  • Introduction – AWS Getting Started
  • Create S3 Bucket
  • Create IAM Group and User
  • Overview of Roles
  • Create and Attach Custom Policy
  • Configure and Validate AWS CLI (see the sketch after this list)
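
As a rough illustration of the bucket-creation and CLI-validation steps above, the snippet below checks from Python that the credentials set up via `aws configure` actually work, then creates an S3 bucket. The bucket name is a placeholder and must be globally unique.

```python
# Minimal sketch: validate configured AWS credentials, then create an S3 bucket.
import boto3

# Equivalent to `aws sts get-caller-identity` on the CLI.
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])

# In us-east-1 no CreateBucketConfiguration is needed; other regions require
# a LocationConstraint. Bucket name below is hypothetical.
s3 = boto3.client("s3", region_name="us-east-1")
s3.create_bucket(Bucket="my-unique-demo-bucket-12345")
```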

Development Lifecycle for PySpark

  • Setup Virtual Environment and Install PySpark
  • Getting Started with PyCharm
  • Passing Run Time Arguments
  • Accessing OS Environment Variables
  • Getting Started with Spark
  • Create Function for Spark Session (see the sketch after this list)
  • Setup Sample Data
  • Read data from files
  • Process data using Spark APIs
  • Write data to files
  • Validating Writing Data to Files
  • Productionizing the Code
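
To make the flow above concrete, here is a minimal sketch of a PySpark script that builds a SparkSession, reads CSV files, aggregates, and writes Parquet. The run-time arguments, paths, and column names are illustrative assumptions, not the course's exact code.

```python
# Minimal sketch of the read-process-write flow: run-time arguments for
# environment, source, and target paths; column names are hypothetical.
import sys
from pyspark.sql import SparkSession
from pyspark.sql.functions import count


def get_spark_session(env: str, app_name: str) -> SparkSession:
    """Create a SparkSession; 'dev' runs locally, otherwise use cluster defaults."""
    builder = SparkSession.builder.appName(app_name)
    if env == "dev":
        builder = builder.master("local[*]")
    return builder.getOrCreate()


if __name__ == "__main__":
    env, src_dir, tgt_dir = sys.argv[1], sys.argv[2], sys.argv[3]
    spark = get_spark_session(env, "orders-demo")

    # Read CSV files, count orders by status, and write the result as Parquet.
    orders = spark.read.csv(src_dir, header=True, inferSchema=True)
    order_counts = orders.groupBy("order_status").agg(count("*").alias("order_count"))
    order_counts.write.mode("overwrite").parquet(tgt_dir)
```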

Overview of Glue Components

  • Introduction – Overview of Glue Components
  • Create Crawler and Catalog Table
  • Analyze Data using Athena
  • Creating S3 Bucket and Role
  • Create and Run the Glue Job (see the sketch after this list)
  • Validate using Glue Catalog Table and Athena
  • Create and Run Glue Trigger
  • Create Glue Workflow
  • Run Glue Workflow and Validate
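
To give a feel for what a Glue Job script looks like, here is a minimal sketch following the boilerplate that Glue generates for Spark jobs. The catalog database, table, and S3 path are placeholders for whatever the crawler step creates, not names defined by the course.

```python
# Minimal sketch of a Glue Job script: read a Glue Catalog table, write Parquet
# to S3 so Athena can query it. Database, table, and bucket names are hypothetical.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read from a Glue Catalog table populated by a crawler.
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="retail_db", table_name="orders"
)

# Write the data back out as Parquet for Athena to query.
glueContext.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-demo-bucket/retail_db/orders_parquet/"},
    format="parquet",
)

job.commit()
```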

Who this course is for:

  • Beginner or Intermediate Data Engineers who want to learn AWS Analytics Services for Data Engineering
  • Intermediate Application Engineers who want to explore Data Engineering using AWS Analytics Services
  • Data and Analytics Engineers who want to learn Data Engineering using AWS Analytics Services
  • Testers who want to learn how to test Data Engineering applications built using AWS Analytics Services

Created by Durga Viswanatha Raju Gadiraju
Last updated 8/2021
English
English [Auto]

Size: 3.99 GB
