The use of online data has revolutionized business, allowing for easier data collection, more representative data and, therefore, more accurate insights. This two-day Processing Online Data course provides a thorough investigation into all areas of the topic, building on existing knowledge as well as introducing new concepts. The first day begins with an outline of the types of online data and the best practices and challenges in designing architecture and data pipeline processes. We then move to legal web scraping and crawling, covering the terminology, techniques and tools, which you will apply to a practical case by scraping data from both server-side rendered (SSR) and client-side rendered (CSR) websites. You will then apply existing API skills to extract and import data from external APIs. The second day dives into accessing data from your digital channels and assessing its usability in the context of end-2-end AI solutions. Finally, the training closes by applying what you have learned to integrate fast- and slow-moving data into one optimal model, challenging you to design and build a flexible and reliable data model and pipeline that generates added value.
Why is this for you?
There is a wide variety of data sources at your disposal: offline data, external online data, and data on customer behavior from digital channels. Knowing how to process data and design the architecture for these data sources is no mean feat. After this course, you will be able to start using the most common types of online data skillfully. You will also be able to hold a well-informed discussion on unlocking online data and using it in your daily automated processes to make the data available for models in end-2-end AI solutions.
Who should attend?
This course is designed specifically for Data Scientists and Data Engineers. Many of the skills covered in this course build on the preexisting knowledge outlined in the web scraping pre-work accompanying this badge, along with the Data Models and Manipulation (4204) badge. Participants must have experience interacting with APIs and expert programming skills in SQL and Python to keep up with this course.
What will you learn?
This training will dive into the concept of online data through the stages of extraction, processing, scraping and crawling, assessment, and integration. Specifically, this will include:
- The types of online data, ownership, and accessibility.
- The legality of scraping and crawling.
- A recap on APIs.
- How to set requirements for data collection.
- How to perform data quality checks.
- How to track customer behavior longitudinally.
- Determine the optimal architecture for processing online data – Explain the methods and challenges of extracting different types of online data and decide on the optimal architecture for your case.
- Apply web scraping and crawling – Scrape and crawl data from both SSR and CSR websites and explain the legality of scraping and crawling.
- Use external APIs for collecting data – Apply existing skills to gather external (website) data through APIs.
- Assess data from your digital channels – Identify the different types of data sources, set requirements, and assess the quality and usability in the context of end-2-end AI solutions.
- Integrate fast- and slow-moving data – Combine both in a single flexible and reliable data model to generate added value from this integration.
Theory and practical use
Each training in the GAIn portfolio combines high-quality standardized training material with theory sessions from experts and hands-on experience where you directly apply the material to real-life cases. Each training is developed by top-of-the-field practitioners, which means it is full of industry examples, practical challenges and know-how that fuel the interactive discussions during training. We believe this multi-level approach creates the ideal learning environment for participants to thrive.
Open Course Schedule
MIacademy offers part of its portfolio in an Open Course Schedule format at our location in the center of Amsterdam. Via the form below, you can register your interest in participating. Our team will contact you to finalize the booking and answer any questions you may have.
All of our courses are delivered by our expert trainers.
If no dates are mentioned, the specific course is not yet scheduled in 2020. In that case, you can use the form to register your interest. If there is enough demand, MIacademy can schedule additional courses and will notify you.
Are you interested in training a larger group of people, looking for specific training, or interested in creating a company-wide program? We will be happy to assist!
Whether you have a very specific training need (for example: training your Data Engineers on advanced technical topics, or your Data Scientists on model implementation), the need for a large transformational program, or something in between, we can help. Over the past 13 years, we have built up extensive experience not only in the implementation of multi-year, multi-population, multi-country programs but also in providing high-quality, very specific modules for specific target groups, both in in-house set-ups and in cross-company programs. Not sure what type of program would fit your organization best? We’d be happy to discuss the best approach together.