Automating Data Pipelines

This two-day course provides a way of thinking about, and best practices for, automating data pipelines from scratch. It starts with design, then moves on to implementing orchestration and making scripts automatable. It concludes with the components you need to monitor and manage an automated data pipeline, along with options for implementing them.

Duration: 2 days

Star Level: Expert

Certification: Yes

Course Description

Automating your data pipeline helps ensure its speed and quality. This two-day module teaches you how to design data pipelines for any work that involves processing data. We cover everything from the basics of creating automatable data pipelines to orchestrating scripts and monitoring an automated data pipeline. You will learn how to manage data pipelines, including the why, when, and how of making code idempotent and improving scripts. All of this is combined with implementing basic logging and validation checks, so you walk away with a complete understanding and the confidence to apply your new knowledge in your business. The course follows a five-step structured approach that combines theory with case studies, so participants gain both theoretical understanding and practical experience. After completing this course, participants will be able to design and develop an automated data pipeline with the appropriate quality control measures to monitor its performance over time.
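The following minimal Python sketch shows what a parametrized, idempotent script with basic logging and a validation check can look like. It assumes a job that loads one day of sales data into a SQL table. The `sales` table, its columns, and the functions `validate` and `load_partition` are hypothetical and are not taken from the course material. SQLite from the Python standard library stands in for the real database.

```python
import logging
import sqlite3

# Log with timestamps so each scheduled run leaves a trace.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("daily_sales_load")  # hypothetical job name


def validate(rows):
    """Basic validation check: reject empty batches and negative amounts."""
    if not rows:
        raise ValueError("validation failed: batch is empty")
    negative = [order_id for order_id, amount in rows if amount < 0]
    if negative:
        raise ValueError(f"validation failed: negative amount for orders {negative}")


def load_partition(conn, run_date, rows):
    """Idempotently load one day's rows into the hypothetical `sales` table.

    The run_date partition is deleted and re-inserted inside a single
    transaction, so re-running the job for the same date (for example after
    a failure) leaves the table in the same state instead of duplicating rows.
    """
    validate(rows)  # fail before touching the table if the batch looks wrong
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO sales (run_date, order_id, amount) VALUES (?, ?, ?)",
            [(run_date, order_id, amount) for order_id, amount in rows],
        )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM sales WHERE run_date = ?", (run_date,)
    ).fetchone()
    log.info("loaded %d row(s) for %s", count, run_date)
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (run_date TEXT, order_id INTEGER, amount REAL)")
    batch = [(1, 10.0), (2, 25.5)]
    load_partition(conn, "2020-01-01", batch)
    load_partition(conn, "2020-01-01", batch)  # re-run: still 2 rows, not 4
```

The run date is a parameter rather than a hard-coded value, so an orchestrator such as Airflow could supply it for each scheduled run. Delete-then-insert is only one common way to achieve idempotency and is not necessarily the approach used in the course.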

Why is this for you?

Do you still manually run scripts or perform checks in a periodic data process? Do you feel you could improve the speed of your data pipeline by redesigning it? Do you run into bugs with your data pipeline? This course teaches you a better way of thinking about designing, developing, and monitoring data pipelines, and gives you hands-on experience with widely used tools to achieve this. Spend less time doing manually what could be automated, and learn how to manage your automated processes well.

Who should attend?

This course is designed for AI Engineers, Data Engineers, and Data Scientists who have experience manipulating data and programming, and who want to automate the flow of their data from source to models and applications. Before signing up for this course, you must have completed both the Data Models and Manipulation (4204) and Programming Meta-Skills (4205) badges. Expert-level programming in SQL and Python is also a prerequisite, because both languages are used at an advanced level in the course cases.

What will you learn?

  1. Designing an end-to-end (E2E) automated data pipeline for an E2E AI solution
  2. Defining and orchestrating components to create an automated data pipeline
  3. Ways of changing and improving scripts to make them automatable
  4. Managing the quality of automated data pipelines
  5. Implementing quality control measures in your data pipelines

Learning Goals

  • Design data pipelines – Based on the design and data flow requirements of an E2E AI solution
  • Orchestrate scripts – Automate data pipelines and define the requirements for each component
  • Write automatable code – Change and improve scripts to make them automatable
  • Manage the quality of automated data pipelines – Define components that monitor data pipeline quality and plan how to act on irregularities

Theory and practical use

All training courses in the GAIn portfolio combine high-quality standardized training material with theory sessions from experts and hands-on experience, in which you directly apply the material to real-life cases. Each course is developed by leading practitioners in the field, so it is full of industry examples, practical challenges, and know-how that fuel the interactive discussions during the training. We believe this multi-level approach creates the ideal learning environment for participants to thrive.

Skills

    • Data
    • Technology
    • Data pipelines
    • Automation
    • Data flows
    • Orchestration
    • Airflow
    • SQL
    • Python
    • ETL
    • Monitoring
    • Quality control
    • Alerting
    • Logging
    • Parametrization

Interested in taking the course?

Open Course Schedule

MIacademy offers part of its portfolio in an open course format at our location in the center of Amsterdam. Use the form below to register your interest in participating. Our team will contact you to finalize the booking and answer any questions you may have.

All of our courses are delivered by our expert trainers.

If no dates are listed, the course has not yet been scheduled for 2020. In that case, you can use the form to register your interest. If there is enough demand, MIacademy can schedule additional courses and will notify you.

In-company Training Programs

Are you interested in training a larger group of people, looking for specific training, and/or interested in creating a company-wide program? We will be happy to assist!

Whether you have a very specific training need (for example, training your Data Engineers on advanced technical topics or your Data Scientists on model implementation), need a large transformational program, or something in between, we can help. Over the past 13 years, we have built up extensive experience not only in implementing multi-year, multi-population, multi-country programs but also in providing high-quality, highly specific modules for particular target groups, both in in-house set-ups and in cross-company programs. Not sure what type of program would fit your organization best? We’d be happy to discuss the best approach together.