Seats are limited to 50. Please provide your full first and last name when registering for the event here on Meetup. The attendee list freezes 4 hours before the event starts; in case of any urgent changes, please leave us a note at kontakt@pydata-trojmiasto.pl. Don't forget to bring your ID for the security check.
Agenda
18:00 – 18:10 Welcoming words by Graphcore and PyData
18:10 – 18:40 Low-Precision Data Formats for High-Performance AI
Alex Titterton, ML Engineer, Graphcore
18:40 – 19:20 Sparser Llamas Run Faster: Speed Up LLM Inference with SparQ Attention
Luke Hudlass-Galley, Research Scientist, Graphcore
Alexandre Payot, ML Engineer, Graphcore
19:20 – 19:30 Q&A session
19:30 – 21:00 Networking & pizza
Talk #1
Low-Precision Data Formats for High-Performance AI
Abstract: “In recent years, AI models, LLMs in particular, have scaled up enormously in both capability and hardware requirements. The required computational power, storage capacity and memory bandwidth all come at a cost, driving increased research into low-precision data formats for both storage and compute. In this talk we discuss recent advances in low-precision training and inference, quantisation methods and new microscaling (MX) data formats designed to offer efficient AI compute with minimal loss in accuracy and without requiring changes to model training workflows.”
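To give a flavour of the quantisation methods the abstract mentions, here is a minimal, hedged sketch of symmetric per-tensor int8 quantisation (not Graphcore's implementation, and much simpler than MX formats, which share a scale per small block of elements rather than per tensor):

```python
import numpy as np

def quantise_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantisation: x ≈ scale * q."""
    scale = np.max(np.abs(x)) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantise_int8(weights)
recovered = dequantise(q, scale)
# rounding error per element is at most scale / 2
```

Storing `q` takes a quarter of the memory of the float32 original at the cost of a bounded rounding error; the research the talk covers is about pushing this trade-off further (fewer bits, finer-grained scales) without degrading model accuracy.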
Presenters:
Alex Titterton, ML Engineer, Graphcore
Presenter intro: “Alex is a Machine Learning Engineer at Graphcore who, over the last 5 years, has worked with customers across a wide range of applications, from computer vision to large language models, and has led Graphcore’s Academic Programme, collaborating with AI researchers around the world. Before joining Graphcore in 2019, Alex completed his PhD in Particle Physics at the University of Bristol and the University of Southampton, working on the Compact Muon Solenoid experiment at CERN in search of Supersymmetry.”
Talk #2
Sparser Llamas Run Faster: Speed Up LLM Inference with SparQ Attention
Abstract: “As we try to push the capability limits of large language models, we want to feed longer and longer sequences into them. However, these long sequences cause throughput to drop, bottlenecking the performance we can achieve with these models. In this talk, we will uncover what causes this slowdown, look at standard optimisations to resolve it, and present our solution, SparQ Attention, as a way to overcome it.”
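To illustrate the kind of approach the talk will present, here is a hedged single-query sketch of SparQ-style sparse attention: approximate the attention scores cheaply using only the largest-magnitude query components, then fetch just the top-k keys and values for exact attention. This is a simplified illustration under our own assumptions, not the full published algorithm (which, among other things, compensates for the attention mass of the skipped positions):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sparq_attention(q, K, V, r=8, k=16):
    """Sparse attention sketch for a single query vector.

    q: (d,) query; K, V: (n, d) cached keys and values.
    Only k of the n key/value rows are read at full width,
    reducing memory traffic when n is large.
    """
    d = q.shape[-1]
    idx_r = np.argsort(-np.abs(q))[:r]     # r dominant query dimensions
    approx = K[:, idx_r] @ q[idx_r]        # cheap score estimate, r cols of K
    top = np.argsort(-approx)[:k]          # positions worth fetching
    scores = K[top] @ q / np.sqrt(d)       # exact scores on the subset
    return softmax(scores) @ V[top]

rng = np.random.default_rng(0)
n, d = 1024, 64
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
out = sparq_attention(q, K, V)             # shape (d,)
```

Setting `r=d` and `k=n` recovers dense attention exactly, which is a handy sanity check; the interesting regime is `k << n`, where only a small fraction of the key/value cache is transferred from memory per token.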
Presenters:
Luke Hudlass-Galley, Research Scientist, Graphcore
Presenter intro: “Luke is a Research Scientist at Graphcore who has worked on a range of fundamental machine learning topics over the last six years, including computer vision, distributed processing, multimodal embedding alignment, and LLM inference. Luke currently leads Graphcore’s reasoning effort, helping to uncover how to get language models to solve difficult problems through step-by-step thought. Prior to Graphcore, Luke received his Master’s in Engineering Mathematics from the University of Bristol.”
Alexandre Payot, ML Engineer, Graphcore
Presenter intro: “Alex is an ML Engineer on Graphcore’s Applied AI team. Over the last three years, he has contributed to a wide range of projects, from kernel implementation to delivering Graphcore’s cloud models. His latest focus has been on LLM inference for an internal code assistant.
Before joining Graphcore, Alex developed software for designing and optimizing steered carbon fiber composites, worked on data analytics and computer vision for microscopes, and even filled up the supercomputer at the University of Bristol while pursuing a PhD in aerodynamic optimization.”