HPC and Big Data Analytics using Comet – SDSC’s latest Computing Resource through XSEDE
Comet, a new petascale supercomputer at the San Diego Supercomputer Center (SDSC), is one of the latest key resources within the NSF’s XSEDE (Extreme Science and Engineering Discovery Environment) program, which comprises the most advanced collection of integrated digital resources and services in the world. Researchers may apply for “free” computer time on Comet via XSEDE.
UCLA-IDRE and SDSC are organizing this workshop to introduce the Comet system and its use for HPC and Big Data Analytics to UCLA researchers. Participants will get hands-on experience on the Comet system during the workshop’s sessions.
RSVP at http://cfapps.ats.ucla.edu/cfapps/events/rsvp/RSVPNow.cfm?EveID=3479&SecID=3467
The detailed agenda is as follows:
Agenda:
9:00 AM – 9:10 AM: Introduction & Welcome
9:10 AM – 10:25 AM: Comet – SDSC’s New HPC Resource
- Architecture, queue/partition info, software stack
- Examples for compute, shared, gpu, and gpu-shared partitions
- Hands-on exercises on Comet to prepare for the later sessions, which will also use Comet
10:25 AM – 10:35 AM: Short break
10:35 AM – 12:00 PM: Introduction to Hadoop on Comet and Gordon
- Overview of running Hadoop within scheduler frameworks (using myHadoop)
- Demonstration and hands-on exercise: spinning up a Hadoop cluster and using it interactively
- New technologies and approaches such as RDMA-Hadoop, with hands-on exercises using RDMA-Hadoop
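For readers new to the Hadoop model, the word-count exercise commonly used with Hadoop Streaming (a natural fit for myHadoop-provisioned clusters) can be sketched in plain Python. In a real streaming job the mapper and reducer run as separate scripts exchanging tab-separated lines, with Hadoop sorting the mapper output by key in between; the functions below imitate that protocol in-process and are illustrative only, not part of myHadoop or any Comet tooling:

```python
def streaming_mapper(lines):
    # Emit one "word<TAB>1" record per word, as a streaming mapper would.
    return [f"{word}\t1" for line in lines for word in line.split()]

def streaming_reducer(records):
    # Records arrive sorted by key; sum consecutive counts per word.
    counts, current, total = {}, None, 0
    for record in sorted(records):  # sorted() stands in for Hadoop's shuffle/sort
        word, count = record.split("\t")
        if word != current and current is not None:
            counts[current] = total
            total = 0
        current = word
        total += int(count)
    if current is not None:
        counts[current] = total
    return counts

result = streaming_reducer(streaming_mapper(["big data", "big clusters"]))
```

In an actual Hadoop Streaming run, the two functions would instead read from and write to standard input/output, one record per line.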
12:00 PM – 1:00 PM: Lunch
1:00 PM – 2:00 PM: Data Analytics and Data Mining
- R and parallel execution of R
- Data mining/machine learning
2:00 PM – 3:00 PM: Python for Scientific Computing
- How to run a Jupyter notebook on Comet
- Using IPython Parallel for distributed computation
- Distributed NumPy-like arrays with DistArray
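The distributed-computation pattern taught in this session, mapping a function over a pool of IPython engines (e.g. with a call like `view.map_sync`), can be previewed locally with the standard library alone. This sketch uses threads as a local stand-in for remote engines; `simulate` and `parallel_map` are illustrative names, not part of IPython Parallel:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(x):
    # Stand-in for a per-engine computation (e.g. one chunk of a larger job).
    return x * x

def parallel_map(func, items, workers=4):
    # Scatter items across workers, apply func, gather results in order,
    # mirroring the scatter/gather pattern of IPython Parallel's map.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))

squares = parallel_map(simulate, range(8))
```

On Comet, the same map call would instead go through an IPython Parallel client connected to engines launched under the batch scheduler, so each task runs on a separate core or node rather than a local thread.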
3:00 PM – 3:10 PM: Short break
3:10 PM – 4:30 PM: Spark for Scientific Computing
- Overview of the capabilities of Spark and how they can be leveraged to solve problems in Scientific Computing
- Hands-on introduction to Spark, from batch and interactive usage on Comet to running a sample map/reduce example in Python
- Two key libraries in the Spark ecosystem: Spark SQL, a general-purpose query engine that can interface to SQL databases or JSON files, and Spark MLlib, a scalable machine learning library
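As a preview of the hands-on Spark session, the flatMap → map → reduceByKey chain behind the classic map/reduce word count can be imitated in plain Python without a Spark installation. The helpers below are illustrative stand-ins for Spark's RDD operations, not actual PySpark API, and the sample data is invented:

```python
from collections import defaultdict
from functools import reduce

def flat_map(func, data):
    # Like RDD.flatMap: apply func to each element and flatten the results.
    return [item for element in data for item in func(element)]

def reduce_by_key(func, pairs):
    # Like RDD.reduceByKey: combine all values that share a key with func.
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return {key: reduce(func, values) for key, values in buckets.items()}

lines = ["spark on comet", "spark mllib"]
words = flat_map(str.split, lines)                  # flatMap step
pairs = [(word, 1) for word in words]               # map step
counts = reduce_by_key(lambda a, b: a + b, pairs)   # reduceByKey step
```

In real PySpark the same three steps run as lazy transformations on a distributed dataset, so the work is partitioned across the cluster rather than executed on local lists.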