UCLA Institute for Digital Research and Education

data science

High Performance Machine Learning Using Scikit-Learn

Machine learning is playing an increasingly important role in science and technology. In this advanced session, we focus on leveraging scikit-learn for high-performance machine learning. We will explore how to…

Learning Scikit-Learn

Machine learning has become a key element in science and technology today. Mastering libraries like scikit-learn, a vital tool for machine learning, is essential. In this workshop, you’ll receive a fundamental introduction…

High Performance Python for Data Analytics pt.2

While Python has been the most popular programming language since 2019, data scientists often critique its slow speed and limited capabilities in handling big data scenarios. In this workshop series,…

High Performance Python for Data Analytics pt.1

While Python has been the most popular programming language since 2019, data scientists often critique its slow speed and limited capabilities in handling big data scenarios. In this workshop series,…

High-Performance Data Science in Python (1) Interpreter War

This workshop series will present an extensive discussion of how to improve the performance of Python in data science by looking under the hood of the language and its libraries, and by using technologies that make Python a practical solution for high-performance big data analytics. In the first session, we will focus on how to boost the speed of Python code at the interpreter level by explaining the underlying concepts (e.g., GIL, JIT) and introducing packages such as PyPy, Numba, Pythran, and Cython. Although no specific prerequisite is required to attend the talk, programming experience in Python will help you fully understand the lecture content.

High-Performance Data Science in Python (2) DataFrame Game

This workshop series will present an extensive discussion of how to improve the performance of Python in data science by looking under the hood of the language and its libraries, and by using technologies that make Python a practical solution for high-performance big data analytics. In the second session, we will focus on how to load and process very large datasets in Python on a single machine, and compare the dataframe implementations from Pandas, Modin, Pandarallel, Dask, and Vaex. Although no specific prerequisite is required to attend the talk, programming experience with Python’s NumPy and Pandas packages will help you fully understand the lecture content.

Introduction to Data Science with Python Part 2

“Data science” has become a ubiquitous, all-encompassing term for any field that uses data analytics in one form or another. Despite having such a broad mandate,…