
Virtual Class
  • This class has passed.
Jan 1

Data Engineering on Google Cloud Platform

January 1 @ 8:00 AM - 5:00 PM AEST


Data Engineering on Google Cloud

This instructor-led live training gives you hands-on experience designing and building data processing systems on Google Cloud. Through lectures, demos, and hands-on labs, you learn how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. The course covers structured, unstructured, and streaming data.

What You Will Learn

What's Included?

Instructor Live Training

An instructor will answer your questions

Official Google Cloud Content

Course content reflects the latest Google Cloud class

Hands-On Labs

Real-world hands-on labs provided by Qwiklabs and supported by the instructor

Certificate of Completion

Receive an official certificate on completion of 80% of the labs

Who's this course for?

Level

Language

Duration

Prerequisites

Products

Course Content

Topics

  • Explore the role of a data engineer
  • Analyze data engineering challenges
  • Introduction to BigQuery
  • Data lakes and data warehouses
  • Transactional databases versus data warehouses
  • Partner effectively with other data teams
  • Manage data access and governance
  • Build production-ready pipelines
  • Review Google Cloud customer case study

Objectives

  • Understand the role of a data engineer
  • Discuss benefits of doing data engineering in the cloud
  • Discuss challenges of data engineering practice and how building data pipelines in
    the cloud helps to address these
  • Review and understand the purpose of a data lake versus a data warehouse, and when to use which

Activities

  • Lab: Using BigQuery to do Analysis

Topics

  • Introduction to data lakes
  • Data storage and ETL options on Google Cloud
  • Building a data lake using Cloud Storage
  • Securing Cloud Storage
  • Storing all sorts of data types
  • Cloud SQL as a relational data lake

Objectives

  • Understand why Cloud Storage is a great option for building a data lake on Google
    Cloud
  • Learn how to use Cloud SQL for a relational data lake

Activities

  • Lab: Loading Taxi Data into Cloud SQL

Topics

  • The modern data warehouse
  • Introduction to BigQuery
  • Getting started with BigQuery
  • Loading data
  • Exploring schemas
  • Schema design
  • Nested and repeated fields
  • Optimizing with partitioning and clustering

Objectives

  • Discuss requirements of a modern warehouse
  • Understand why BigQuery is the scalable data warehousing solution on Google Cloud
  • Understand core concepts of BigQuery and review options of loading data into BigQuery

Activities

  • Lab: Loading Data into BigQuery
  • Lab: Working with JSON and Array Data in BigQuery

Topics

  • EL, ELT, ETL
  • Quality considerations
  • How to carry out operations in BigQuery
  • Shortcomings
  • ETL to solve data quality issues

Objectives

  • Review different methods of loading data into your data lakes and warehouses: EL,
    ELT, and ETL
  • Discuss data quality considerations and when to use ETL instead of EL and ELT

Topics

  • The Hadoop ecosystem
  • Run Hadoop on Dataproc
  • Cloud Storage instead of HDFS
  • Optimize Dataproc

Objectives

  • Review the parts of the Hadoop ecosystem
  • Learn how to lift and shift your existing Hadoop workloads to the cloud using
    Dataproc
  • Understand considerations around using Cloud Storage instead of HDFS for storage
  • Learn how to optimize Dataproc jobs

Activities

  • Lab: Running Apache Spark jobs on Dataproc

Topics

  • Introduction to Dataflow
  • Why customers value Dataflow
  • Dataflow pipelines
  • Aggregating with GroupByKey and Combine
  • Side inputs and windows
  • Dataflow templates
  • Dataflow SQL

Objectives

  • Understand how to decide between Dataflow and Dataproc for processing data
    pipelines
  • Understand the features that customers value in Dataflow
  • Discuss core concepts in Dataflow
  • Review the use of Dataflow templates and SQL

Activities

  • Lab: A Simple Dataflow Pipeline (Python/Java)
  • Lab: MapReduce in Dataflow (Python/Java)
  • Lab: Side inputs (Python/Java)

Topics

  • Building batch data pipelines visually with Cloud Data Fusion
  • Components
  • UI overview
  • Building a pipeline
  • Exploring data using Wrangler
  • Orchestrating work between Google Cloud services with Cloud Composer
  • Apache Airflow environment
  • DAGs and operators
  • Workflow scheduling
  • Monitoring and logging

Objectives

  • Discuss how to manage your data pipelines with Data Fusion and Cloud Composer
  • Understand Data Fusion’s visual design capabilities
  • Learn how Cloud Composer can help to orchestrate the work across multiple Google
    Cloud services

Activities

  • Lab: Building and Executing a Pipeline Graph in Data Fusion
  • Optional Lab: An introduction to Cloud Composer

Topics

  • Process streaming data

Objectives

  • Explain streaming data processing
  • Describe the challenges with streaming data
  • Identify the Google Cloud products and tools that can help address streaming data
    challenges

Topics

  • Introduction to Pub/Sub
  • Pub/Sub push versus pull
  • Publishing with Pub/Sub code

Objectives

  • Describe the Pub/Sub service
  • Understand how Pub/Sub works
  • Gain hands-on Pub/Sub experience with a lab that simulates real-time streaming
    sensor data

Activities

  • Lab: Publish Streaming Data into Pub/Sub

Topics

  • Streaming data challenges
  • Dataflow windowing

Objectives

  • Understand the Dataflow service
  • Build a stream processing pipeline for live traffic data
  • Demonstrate how to handle late data using watermarks, triggers, and accumulation

Activities

  • Lab: Streaming Data Pipelines

Topics

  • Streaming into BigQuery and visualizing results
  • High-throughput streaming with Cloud Bigtable
  • Optimizing Cloud Bigtable performance

Objectives

  • Learn how to perform ad hoc analysis on streaming data using BigQuery and
    dashboards
  • Understand how Cloud Bigtable is a low-latency solution
  • Describe how to architect for Bigtable and how to ingest data into Bigtable
  • Highlight performance considerations for the relevant services

Activities

  • Lab: Streaming Analytics and Dashboards
  • Lab: Streaming Data Pipelines into Bigtable

Topics

  • Analytic window functions
  • Use WITH clauses
  • GIS functions
  • Performance considerations

Objectives

  • Review some of BigQuery’s advanced analysis capabilities
  • Discuss ways to improve query performance

Activities

  • Lab: Optimizing your BigQuery Queries for Performance
  • Optional Lab: Partitioned Tables in BigQuery

Topics

  • What is AI?
  • From ad-hoc data analysis to data-driven decisions
  • Options for ML models on Google Cloud

Objectives

  • Understand the proposition that ML adds value to your data
  • Understand the relationship between ML, AI, and Deep Learning
  • Identify ML options on Google Cloud

Topics

  • Unstructured data is hard
  • ML APIs for enriching data

Objectives

  • Discuss challenges when working with unstructured data
  • Learn the applications of ready-to-use ML APIs on unstructured data

Activities

  • Lab: Using the Natural Language API to Classify Unstructured Text

Topics

  • What’s a notebook?
  • BigQuery magic and ties to Pandas

Objectives

  • Introduce Notebooks as a tool for prototyping ML solutions
  • Learn to execute BigQuery commands from Notebooks

Activities

  • Lab: BigQuery in JupyterLab on AI Platform

Topics

  • Ways to do ML on Google Cloud
  • Kubeflow
  • AI Hub

Objectives

  • Describe options available for building custom ML models
  • Understand the use of tools like Kubeflow

Activities

  • Lab: Running ML Pipelines on Kubeflow

Topics

  • BigQuery ML for quick model building
  • Supported models

Objectives

  • Learn how to create ML models by using SQL syntax in BigQuery
  • Demonstrate building different kinds of ML models using BigQuery ML

Activities

  • Lab option 1: Predict Bike Trip Duration with a Regression Model in BigQuery ML
  • Lab option 2: Movie Recommendations in BigQuery ML

Topics

  • Why AutoML?
  • AutoML Vision
  • AutoML NLP
  • AutoML Tables

Objectives

  • Explore various AutoML products used in machine learning
  • Learn to use AutoML to create powerful models without coding


Have Questions?

No worries. Send us a quick message and we’ll be happy to answer any questions you have.

Ref: T-GCPDE-I-02

Location

Online

Instructor

Axalon Academy
Email: training@axalon.io

Other

Competencies: Intermediate
Learning Path: Database Engineer
Event Type: Live Virtual Training Day