HPC and Big Data Analytics using Comet
UCLA-IDRE and SDSC are organizing the workshop “HPC and Big Data Analytics using Comet” to introduce the Comet supercomputer and its usage to researchers at UCLA.
Comet, a petascale supercomputer at the San Diego Supercomputer Center (SDSC), is one of the key resources within the NSF’s XSEDE (Extreme Science and Engineering Discovery Environment) program. It provides “free” compute time to researchers across the USA via the XSEDE portal. Participants will get hands-on experience on Comet during the workshop’s sessions.
The RSVP link and the agenda for the workshop are as follows:
RSVP: https://idre.ucla.edu/calendar-event/hpc-big-data-analytics-using-comet#rsvp
Agenda:
(Link to slides: https://github.com/sdsc-scicomp/2017-04-04-comet-workshop-ucla)
9:00 AM – 9:10 AM: Introduction & Welcome
9:10 AM – 10:00 AM: Comet – SDSC’s 2 PetaFLOPS HPC Resource
- Architecture, queue/partition info, software stack
- Examples for compute, shared, gpu, and gpu-shared partitions
- Hands-on exercises on Comet to prepare for the later sessions, which will also use Comet
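Comet jobs are submitted through the SLURM scheduler, and the partition names above (compute, shared, gpu, gpu-shared) are selected in the batch script. A minimal sketch for the compute partition follows; the job name, module names, and executable are placeholders, and any XSEDE allocation/account flags your project requires are omitted.

```shell
#!/bin/bash
#SBATCH --job-name=hello-comet       # illustrative job name
#SBATCH --partition=compute          # one of: compute, shared, gpu, gpu-shared
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24         # Comet compute nodes have 24 cores
#SBATCH --time=00:10:00              # walltime limit (HH:MM:SS)
#SBATCH --output=hello-comet.%j.out  # %j expands to the SLURM job ID

# Load a compiler/MPI environment; exact module names depend on the
# software stack covered in the session.
module load gnu openmpi_ib

srun ./my_mpi_program                # hypothetical executable
```

Submit with `sbatch script.sh` and check status with `squeue -u $USER`.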
10:00 AM – 10:30 AM: Science Gateways
10:30 AM – 10:40 AM: Short break
10:40 AM – 12:00 PM: Introduction to Hadoop on Comet
- Overview of running Hadoop within scheduler frameworks (using myHadoop)
- Demonstration and hands-on: spinning up a Hadoop cluster and using it interactively
- New technologies and approaches such as RDMA-Hadoop, with hands-on RDMA-Hadoop exercises
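The classic Hadoop example is a streaming word count, where Hadoop runs a mapper and a reducer as separate processes connected by a sort. The sketch below shows both phases as plain Python functions so the data flow can be run without a cluster; it is an illustration of the model, not the workshop's myHadoop setup (whose spin-up commands are not shown here).

```python
# Sketch of the Hadoop-style map/reduce word count, runnable locally.
# In real Hadoop Streaming, mapper and reducer are separate scripts that
# read stdin and write tab-separated key/value lines; Hadoop sorts by key
# between the two phases (mimicked here by sorted()).
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reduce phase: pairs arrive grouped by key; sum counts per word."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

sample = ["hadoop on comet", "spark on comet"]  # stand-in input lines
counts = dict(reducer(mapper(sample)))
print(counts)  # {'comet': 2, 'hadoop': 1, 'on': 2, 'spark': 1}
```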
12:00 PM – 1:00 PM: Lunch (provided by IDRE)
1:00 PM – 2:00 PM: Data Analytics and Data Mining
- R and parallel execution of R
- Data mining/machine learning
2:00 PM – 3:00 PM: Python for Scientific Computing
- How to run a Jupyter notebook on Comet
- Using IPython Parallel for distributed computation
- Easy multithreading and distributed computing with dask
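The common pattern behind both IPython Parallel and dask is fanning independent tasks out to a pool of workers and gathering the results. Since neither library can be assumed outside the workshop environment, here is a standard-library sketch of that parallel-map pattern using `concurrent.futures`; the `simulate` function is a hypothetical stand-in for a real per-task computation.

```python
# Parallel-map sketch: the same shape dask.delayed / IPython Parallel's
# map express, done with only the standard library.
from concurrent.futures import ThreadPoolExecutor

def simulate(x):
    """Hypothetical stand-in for an expensive per-task computation."""
    return x * x

inputs = range(8)
# Fan tasks out across a pool of worker threads, then gather results in order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(simulate, inputs))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

For CPU-bound pure-Python work, `ProcessPoolExecutor` is the drop-in alternative that sidesteps the GIL; dask adds lazy task graphs and scaling to multiple nodes on top of this idea.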
3:00 PM – 3:10 PM: Short break
3:10 PM – 4:30 PM: Spark for Scientific Computing
- Overview of the capabilities of Spark and how they can be leveraged to solve problems in Scientific Computing
- Hands-on introduction to Spark, from batch and interactive usage on Comet to running a sample map/reduce example in Python
- Two key libraries in the Spark ecosystem: Spark SQL, a general-purpose query engine that can interface with SQL databases or JSON files, and Spark MLlib, a scalable machine learning library
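In PySpark, the map/reduce example above is typically written as a `flatMap` over lines, a `map` to `(word, 1)` pairs, and a `reduceByKey` to sum counts. Since PySpark itself is not available outside the workshop environment, the sketch below reproduces that exact pipeline shape with standard-library pieces; the sample lines are invented.

```python
# Pure-Python sketch of the PySpark word-count pipeline
# (roughly: lines.flatMap(split).map(lambda w: (w, 1)).reduceByKey(add)).
from collections import Counter
from itertools import chain

lines = ["spark on comet", "hadoop on comet", "spark sql"]  # sample input

# "flatMap": split every line into words, flattening into one stream
words = chain.from_iterable(line.split() for line in lines)

# "map" + "reduceByKey": emit (word, 1) pairs and sum counts per key;
# Counter performs both steps in a single pass here.
counts = Counter(words)

print(counts["comet"])  # 2
print(counts["spark"])  # 2
```

The Spark version distributes the same computation across the cluster's executors, with `reduceByKey` performing the shuffle that Hadoop's sort phase handles in the earlier session.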
4:30 PM: Wrap up