Materials

Slides

Hadoop Exercises

Zeppelin Demo JSON File

Zeppelin Demo Results

Description

This course introduces Hadoop and Spark clusters to newcomers. We will cover the basic concepts of the MapReduce programming model, the major components of a Hadoop cluster, and how to get started with Hadoop on your own computer and with computing resources at TACC. We will also introduce Spark programming models, show how Spark can be used in conjunction with a Hadoop cluster, and discuss different ways to use Hadoop and Spark for your analysis. During the course, participants can explore a Hadoop cluster and perform exercises with prepared examples. Since this is an introductory course, participants do not need any particular programming background to attend. Working knowledge of the Linux operating system is required. Participants who are new to Hadoop, Spark, or TACC resources are strongly encouraged to participate in this course.

Instructors

Weijia Xu, Ph.D.

Dr. Weijia Xu is the group lead for the Data Mining & Statistics group. Prior to joining TACC, he obtained a master's degree in Biological Sciences and a doctoral degree in Computer Science from The University of Texas at Austin. Dr. Xu's main research interest is in the field of large-scale information management and analysis. The goal of his research is to enable data-driven discoveries by developing new methods and applications that facilitate the data-to-knowledge transfer process. Dr. Xu has extensive experience working with domain scientists on database and analytical methods development. He has over thirty peer-reviewed conference and journal publications in similarity-based data retrieval, data analysis, and information visualization with data from various scientific domains.

Ruizhu Huang, Ph.D.

Dr. Ruizhu Huang joined the Data Mining & Statistics group at TACC in 2014. He earned his Ph.D. in the Interdisciplinary Program of Urban Design and Planning from the University of Washington at Seattle. His Ph.D. research focused on quantifying the relationships among the built environment, travel behavior, and health outcomes using GPS, accelerometer, and travel diary data. He obtained an M.S. degree in Statistics from the University of Washington at Seattle and a Master of Urban Planning degree from the University of Maryland at College Park. He has extensive experience in spatial statistics, geospatial data visualization, and large-scale data analysis.


Recording Date:
April 20, 2017

Last modified: Wednesday, August 23, 2017, 4:40 PM