
ODSC Boston Meetup: Building Data Pipelines for ML

  • CIC Cambridge, Cloud City, 14th floor, 101 Main Street, Cambridge, MA 02142, United States

Speaker: Suraj Bang, Solutions Architect at Qubole

Building Data Pipelines for ML

6:00pm - 6:30pm - ODSC Intro, Pizza & Refreshments
6:30pm - 7:20pm - Talk
7:20pm - 7:30pm - Q&A
7:30pm - 8:00pm - Networking

Suraj Bang is a Solutions Architect at Qubole, where he brings over 13 years of experience in data analytics and engineering to help customers on their big data journey. He has subject matter expertise in building big data applications with Apache Spark and other open source technologies such as Airflow and Apache Zeppelin. Prior to Qubole, Suraj worked as a Data Engineering Lead, building big data applications for financial, retail, and insurance organizations. He enjoys being outdoors, loves biking, and when indoors enjoys building Alexa apps.

Companies now need to apply machine learning (ML) techniques to their data in order to remain relevant. Among the new challenges faced by data scientists is the need to build pipelines that give access to large data sets, so that trained models can scale to run with production data.

Aside from dealing with larger data volumes, these pipelines need to be flexible in order to accommodate the variety of data and the high processing velocity required by new ML applications. Apache Airflow and Spark address these challenges by providing highly scalable technologies for autoscaling big data engines.

In this presentation we will cover:
- Some of the typical challenges faced by data scientists when building pipelines for machine learning.
- Typical uses of the various big data engines to address these challenges.
- A real-world example using Apache Spark and Airflow to operationalize a recommendation engine.
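
To give a rough feel for the last item, here is a minimal sketch of a recommendation pipeline expressed as a graph of dependent tasks, which is the general idea behind a workflow scheduler. It is not taken from the talk and does not use Airflow or Spark: it is a plain-Python stand-in built on the standard library's `graphlib`, and the task names, the sample events, and the "rank items by mean rating" model are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract():
    # Hypothetical interaction events: (user, item, rating).
    return [("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4), ("u2", "c", 5)]


def build_features(events):
    # Mean rating per item.
    totals = {}
    for _, item, rating in events:
        total, count = totals.get(item, (0, 0))
        totals[item] = (total + rating, count + 1)
    return {item: total / count for item, (total, count) in totals.items()}


def train(features):
    # Toy "model": items ranked by mean rating, ties broken by name.
    return sorted(features, key=lambda item: (-features[item], item))


def recommend(model, events, user, k=2):
    # Top-k ranked items the user has not interacted with yet.
    seen = {item for u, item, _ in events if u == user}
    return [item for item in model if item not in seen][:k]


def run_pipeline(user):
    # Each task maps to (callable taking earlier results, upstream task names).
    tasks = {
        "extract": (lambda r: extract(), set()),
        "features": (lambda r: build_features(r["extract"]), {"extract"}),
        "train": (lambda r: train(r["features"]), {"features"}),
        "recommend": (
            lambda r: recommend(r["train"], r["extract"], user),
            {"train", "extract"},
        ),
    }
    graph = {name: deps for name, (_, deps) in tasks.items()}
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:  # every task runs only after its upstream tasks
        results[name] = tasks[name][0](results)
    return order, results["recommend"]


if __name__ == "__main__":
    order, recs = run_pipeline("u1")
    print(order)
    print(recs)
```

In a production setting, a scheduler would own the task graph and a distributed engine would do the heavy feature and training work; the sketch only shows how declaring dependencies fixes a safe execution order.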

ODSC Links:
• Twitter: @odsc
• East Conference: Apr 30 - May 3