Although Python has been the most popular programming language since 2019, data scientists commonly complain about its slow speed and its limited ability to handle big data. This workshop series discusses in depth how to improve Python’s performance in data science by looking under the hood of the language and its libraries, and by using technologies that make Python a practical solution for high-performance big data analytics. The second session focuses on how to load and process very large datasets in Python on a single machine, and compares the dataframe implementations of Pandas, Modin, Pandarallel, Dask, and Vaex, among others. There are no specific prerequisites for attending the talk, but programming experience with Python’s NumPy and Pandas packages will help in fully understanding the lecture content.
Questions about this workshop can be emailed to Qiyang Hu at email@example.com.
Presented by the Office of Advanced Research Computing.