Materials

Slides

Description

This course focuses on how to develop scalable applications with the Spark programming model. Key concepts in the Spark programming model will be introduced and reviewed. The course also gives a basic introduction to the Scala programming language, which will be used for in-class demonstrations and practice. By the end of the class, attendees will understand basic Spark applications, how to run a Spark application, the key features for building scalable applications, and how to get started developing with the Spark programming model. Example code and exercises will be provided for attendees to explore and run on a Spark cluster during the class. Attendees are expected to have prior knowledge of Hadoop and Spark cluster concepts and a working knowledge of how to use computing resources at TACC. Knowledge of Scala is not required. Experience with any programming language is preferred but not essential.

Instructors

Weijia Xu, Ph.D.

Dr. Weijia Xu is the group lead for the Data Mining & Statistics group. Prior to joining TACC, he obtained a master's degree in Biological Sciences and a doctoral degree in Computer Science from The University of Texas at Austin. Dr. Xu's main research interest is in the field of large-scale information management and analysis. The goal of his research is to enable data-driven discoveries by developing new methods and applications that facilitate the data-to-knowledge transfer process. Dr. Xu has extensive experience working with domain scientists on database and analytical methods development. He has over thirty peer-reviewed conference and journal publications in similarity-based data retrieval, data analysis, and information visualization with data from various scientific domains.

Zhao Zhang, Ph.D.

Dr. Zhao Zhang is a computer scientist in the Data Intensive Computing group at TACC. His research interest is in building computer systems that enable and facilitate scientific research in parallel and distributed computing environments. Dr. Zhang’s current work focuses on machine learning and deep learning systems. He supports open source deep learning frameworks on TACC supercomputers and clusters, and he actively researches deep learning framework scalability, performance prediction, reproducibility, and usability. Dr. Zhang has extensive experience collaborating with domain scientists in astronomy, bioinformatics, and earth science. Dr. Zhang joined TACC in 2016. Before that, he was a joint postdoctoral researcher in the AMPLab and the Berkeley Institute for Data Science at the University of California, Berkeley. He received his Ph.D. from the Department of Computer Science at the University of Chicago in 2014.


Recording Date:
April 27, 2017

Last modified: Wednesday, August 23, 2017, 4:30 PM