Automating Data Pipelines
About the course
Automating your data pipeline improves both its speed and its quality. This two-day module teaches you how to design a data pipeline that you can apply to any work involving data processing. We cover everything from the basics of creating automatable data pipelines, to orchestrating scripts and monitoring an automated data pipeline. You will learn how to manage data pipelines, including the why, when, and how of making code idempotent and improving scripts, combined with implementing basic logging and validation checks. The course follows a five-step structured approach, combining theory with case studies, so participants gain both a theoretical understanding and the hands-on experience and confidence to apply their new knowledge within the business. After completing this course, participants will be able to design and develop an automated data pipeline with the appropriate quality control measures to monitor its performance over time.
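To give a flavour of what "idempotent code" means in practice (the course covers the why, when, and how in depth), here is a minimal, hypothetical sketch of an idempotent load step in Python. The function name, path layout, and data shape are illustrative assumptions, not course material: the key idea is that the output location is derived deterministically from the run date, so re-running the step overwrites the same file instead of creating duplicates.

```python
import json
from pathlib import Path

def load_partition(records, out_dir, run_date):
    """Write one day's records to a deterministic path.

    Because the target path depends only on run_date (not on a
    timestamp or a counter), running this step twice for the same
    date leaves the pipeline in exactly the same state as running
    it once -- the definition of idempotency.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"records_{run_date}.json"
    target.write_text(json.dumps(records))
    return target
```

A non-idempotent version of this step might append to a file or embed the wall-clock time in the filename; either choice would make retries after a failure unsafe.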
Why this is for you
Do you still run scripts manually or perform checks by hand in a periodic data process? Do you feel you could improve the speed of your data pipeline by redesigning it? Do you run into bugs in your data pipeline? This course will teach you an improved way of thinking about designing, developing, and monitoring data pipelines, and give you hands-on experience with widely used tools to achieve this. Reduce the time you spend on manual work that could be automated, and learn how to manage the result well.
This course is designed for AI Engineers, Data Engineers, and Data Scientists who have experience with data manipulation and programming and are looking to automate the flow of their data from source to models and applications. Before signing up for this course, we require you to have completed both the Data Models and Manipulation (4204) and Programming Meta-Skills (4205) badges. Expert-level programming skills in SQL and Python are also a prerequisite, as both languages are used at an advanced level in the cases during this course.
What you’ll learn
- Designing an E2E automated data pipeline for an E2E AI solution
- Defining and orchestrating components to create an automated data pipeline
- Ways of changing and improving scripts to make them automatable
- Managing quality of automated data pipelines
- Implementing quality control measures in your data pipelines
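As a taste of the orchestration topic above, the core idea can be sketched in a few lines of plain Python. This is a deliberately simplified, hypothetical illustration (the course works with real orchestration tooling): each named component consumes the previous component's output, so the pipeline's structure and execution order are defined in one place.

```python
def run_pipeline(steps, payload):
    """Run named steps in order, feeding each step's output to the next."""
    for name, step in steps:
        payload = step(payload)
    return payload

# Toy components: extract produces data, transform reshapes it, load aggregates it.
steps = [
    ("extract", lambda _: [1, 2, 3]),
    ("transform", lambda xs: [x * 2 for x in xs]),
    ("load", lambda xs: sum(xs)),
]
# run_pipeline(steps, None) → 12
```

Production orchestrators add scheduling, retries, and dependency graphs on top of this basic pattern, but the contract between components stays the same.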
Theory and practical use
- Design data pipelines – Based on design and data-flow requirements for an E2E AI solution
- Orchestrate scripts – Automate data pipelines and define requirements from each component
- Write automatable code – Change and improve scripts to make them automatable
- Manage quality of automated data pipelines – Define components to monitor data pipeline quality and plan how to act on irregularities
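To illustrate the last item, a quality control measure can be as simple as a validation check that logs irregularities instead of failing silently. The function below is a minimal, hypothetical sketch (field names and log messages are assumptions, not course material): rows missing required fields are dropped and logged, giving the pipeline something to monitor and act on.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def validate_batch(rows, required_fields=("id", "amount")):
    """Return rows that pass basic completeness checks; log the rest."""
    valid = []
    for row in rows:
        missing = [f for f in required_fields if row.get(f) is None]
        if missing:
            # Logged irregularities feed dashboards or alerts downstream.
            log.warning("dropping row %s: missing %s", row, missing)
            continue
        valid.append(row)
    log.info("validated %d/%d rows", len(valid), len(rows))
    return valid
```

Monitoring then becomes a matter of watching the drop rate over time: a sudden spike in warnings is an early signal that an upstream source has changed.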
All trainings in the GAIn portfolio combine high-quality standardized training material with theory sessions from experts and hands-on experience where you directly apply the material to real-life cases. Each training is developed by leading practitioners, which means it is full of industry examples, practical challenges, and know-how that fuel the interactive discussions during training. We believe this multi-level approach creates the ideal learning environment for participants to thrive.