Imbalanced Data & Anomaly Detection

Imbalanced data and anomaly detection are not easy problems to solve, especially when your project begins with imperfect data. This two-day training will teach you two effective and proven techniques to overcome these data issues and improve the success of all your data-based projects.

Duration: 2 days

Star Level: Expert

Certification: Yes

 


Course Description

We often face the problem of looking for a needle in a haystack of data: a tedious and challenging task that requires specific tools. Enter Imbalanced Data and Anomaly Detection, a training badge that equips you with two such tools. One is aimed at working with imbalanced datasets in supervised learning, for instance by creating many more needles so they are easier to find. The other helps you identify what a needle is in the first place, using unsupervised anomaly detection methods.

The first day lays the foundation of anomaly detection by exploring the context and the different types of anomalies, discussing those unique to your business, and testing your skills in detecting anomalies using data plotting. We will then look at the first challenge, imbalanced datasets: why they are considered an issue and how to maximize model results from them.

The second day delves into anomaly detection. It begins with unsupervised learning, looking at its applications and evaluating its effectiveness, then moves on to time-series anomaly detection using Z-score, decomposition, and forecasting.

Of course, this training would not be complete without ensuring that your newly acquired knowledge can be translated into practice. The final hurdle is therefore an analytical task in which you apply what you have learned to detect anomalies in a real-life case.
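The idea of "creating many more needles" from the description above can be illustrated with random oversampling of the minority class. This is only a minimal sketch of one common approach, not necessarily the technique taught in the course; the function name `random_oversample` and the toy data are assumptions made for illustration.

```python
import random


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(labels) if lab == minority_label]
    majority = [i for i, lab in enumerate(labels) if lab != minority_label]
    n_extra = len(majority) - len(minority)
    if n_extra <= 0 or not minority:
        # Nothing to balance, or no needles to copy.
        return list(rows), list(labels)
    # Sample minority indices with replacement to fill the gap.
    extra = rng.choices(minority, k=n_extra)
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data: 95 "haystack" rows (label 0) and 5 "needles" (label 1).
rows = [[i, i * 2] for i in range(100)]
labels = [0] * 95 + [1] * 5
rows_bal, labels_bal = random_oversample(rows, labels, minority_label=1)
```

In practice, oversampling is usually applied to the training split only, so that duplicated needles do not leak into the evaluation data.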

 

Why is this for you?

 

You might be confident in building models and interpreting the results. However, without knowledge of the evaluation metrics suited to imbalanced data, you will most likely obtain naïve results that render your efforts useless. Equipping yourself with these tools for handling imbalanced data and detecting anomalies will give you confidence in your models and in applying them in business.

 

 

Who should attend?

 

This training course is designed with both Data Scientists and Data Engineers in mind. At Expert level, this training requires previous knowledge and the completion of our Classification Using Tree Models course (3202).

 

 

What will you learn?

 

This training will develop your skills in detecting anomalies and handling imbalanced data, drawing on a variety of tools and methods. Specific content covered includes:

  1. Point, contextual, and collective anomalies
  2. Univariate and multivariate anomalies
  3. Structured and unstructured anomaly detection
  4. Graphical tools such as jitter plots, violin plots, Z-scores and more
  5. Cost-sensitive learning
  6. Static and dynamic anomaly detection
  7. Z-score, modified Z-score, decomposition and forecasting for anomaly detection
  8. Two unsupervised algorithms: Isolation Forest, One-Class SVM
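As an illustration of item 7, the sketch below compares the Z-score with the modified Z-score on a small series. It is a minimal example under stated assumptions: the function names and the toy readings are hypothetical, and the constant 0.6745 and the cutoffs 3 and 3.5 are common conventions rather than values taken from the course.

```python
import statistics


def z_scores(values):
    """Classic Z-score: distance from the mean in population standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def modified_z_scores(values):
    """Modified Z-score: distance from the median, scaled by the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0] * len(values)
    # 0.6745 makes the score comparable to a Z-score for normally distributed data.
    return [0.6745 * (v - med) / mad for v in values]


# Toy readings with one obvious outlier at the end (assumed data).
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

plain_flags = [i for i, z in enumerate(z_scores(readings)) if abs(z) > 3]
robust_flags = [i for i, z in enumerate(modified_z_scores(readings)) if abs(z) > 3.5]
```

On this series the outlier inflates the standard deviation enough that its plain Z-score stays just below 3, so `plain_flags` is empty, while the median-based score flags index 9. This masking effect is one reason the modified Z-score is often preferred on small or contaminated samples.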

 

Learning Goals

 

    • Introduction to anomaly detection: Able to explain the context of anomalies and distinguish between different types.
    • Graphical anomaly detection: Able to detect anomalies using data plotting.
    • Handling imbalanced datasets: Able to maximize model results from imbalanced datasets.
    • Unsupervised anomaly detection: Able to judge when unsupervised learning can be applied and to evaluate an unsupervised model.
    • Time-series anomaly detection: Able to detect anomalies in time-series data.
    • Unsupervised algorithms: Able to apply two unsupervised algorithms to detect anomalies.
    • Detecting fraud using all anomaly detection tools: Able to combine all your knowledge to detect anomalies in a real-life case.

 

 

Theory and practical use

 

Each training in the GAIn portfolio combines high-quality, standardized training material with theory sessions from experts and hands-on experience in which you apply the material directly to real-life cases. Each training is developed by top-of-the-field practitioners, so it is full of industry examples, practical challenges, and know-how that fuel the interactive discussions during training. We believe this multi-level approach creates the ideal learning environment for participants to thrive.

 

 

Skills

 

    • Machine Learning
    • Data
    • Anomaly Detection
    • Imbalanced Data

Interested in taking the course?

Open Course Schedule

 

MIacademy offers part of its portfolio in an Open Course Schedule format at our location in the center of Amsterdam. You can register your interest in participating via the form below. Our team will contact you to finalize the booking and answer any questions you may have.

All of our courses are delivered by our expert trainers.

 

If no dates are mentioned, the course has not yet been scheduled for 2020. In that case, you can use the form to register your interest. If there is enough demand, MIacademy can schedule additional courses and will notify you.

Dates & Availability

 

In-company Training Programs

Are you interested in training a larger group of people, looking for specific training, or interested in creating a company-wide program? We will be happy to assist!

Whether you have a very specific training need (for example, training your Data Engineers on advanced technical topics or your Data Scientists on model implementation), the need for a large transformational program, or something in between, we can help. Over the past 13 years, we have built up extensive experience not only in implementing multi-year, multi-population, multi-country programs but also in providing high-quality, very specific modules for specific target groups, both in in-house set-ups and in cross-company programs. Not sure what type of program would fit your organization best? We’d be happy to discuss the best approach together.