Participants will explore how Google Cloud supports big data and machine learning workflows. The course introduces core tools such as BigQuery, Cloud Storage, and AutoML, and participants practise building data pipelines and deploying models in the cloud.
Learning Outcomes:
- Understand the data-to-AI lifecycle in Google Cloud
- Identify GCP services for data processing and analysis
- Create data pipelines using BigQuery and Cloud Dataflow
- Describe machine learning tools and model deployment
Key Topics:
- GCP architecture for data analytics and ML
- BigQuery, Cloud Storage, and Pub/Sub
- Cloud Dataflow, Dataproc, and AI Platform
- Practical use cases for ML in GCP environments
Module 1: Introducing Google Cloud Platform
- Google Cloud Platform Fundamentals Overview.
- Google Cloud Platform Big Data Products.
Module 2: Compute and Storage Fundamentals
- CPUs on demand (Compute Engine).
- A global filesystem (Cloud Storage).
- Cloud Shell.
- Lab: Set up an Ingest-Transform-Publish data processing pipeline.
Module 3: Data Analytics on the Cloud
- Stepping-stones to the cloud.
- Cloud SQL: your SQL database on the cloud.
- Lab: Importing data into Cloud SQL and running queries.
- Spark on Dataproc.
- Lab: Machine Learning Recommendations with Spark on Dataproc.
Module 4: Scaling Data Analysis
- Fast random access.
- Datalab.
- BigQuery.
- Lab: Build a machine learning dataset.
Module 5: Machine Learning
- Machine Learning with TensorFlow.
- Lab: Carry out ML with TensorFlow.
- Pre-built models for common needs.
- Lab: Employ ML APIs.
Module 6: Data Processing Architectures
- Message-oriented architectures with Pub/Sub.
- Creating pipelines with Dataflow.
- Reference architecture for real-time and batch data processing.
Module 7: Summary
- Why GCP?
- Where to go from here.
- Additional Resources.