Introduction

Your Instructors

Most of us are members (or alumni) of the functional genomics lab of Vishwanath Iyer, UT Austin.

Anna Battenhouse, Associate Research Scientist, Iyer Lab, abattenhouse@utexas.edu
- BA English literature, 1978
- Commercial software development 1982 – 2005
- Joined Iyer Lab 2007 (“retirement career”)
- BS Biochemistry, 2013
Amelia Weber Hall, Graduate Student, Iyer Lab, ameliahall@utexas.edu
- 5th year Microbiology graduate student
- Laboratory Technician at UT 2007-2010
- BS Molecular Genetics, 2007
Nathan Abell, Research Assistant, Xhemalce Lab, abell.nathan@gmail.com
- Undergraduate researcher in Iyer Lab 2011-2013
- BS Molecular Biology, UT, 2013
- Research Assistant
Dakota Derryberry, Graduate Student, Wilke Lab, dakotaz@utexas.edu
- ???

About the Iyer Lab

http://iyerlab.org/

Main focus is functional genomics
- large-scale transcriptional reprogramming in response to diverse stimuli
- ENCODE consortium collaborator
- work in human and yeast
Research methods include
- microarrays (Dr. Iyer was co-inventor)

- high-throughput sequencing (since 2007)
  - especially ChIP-seq
  - also RNA-seq, RIP-seq, MNase-seq ...
  - we now have > 1,500 NGS datasets

Communication

Post-its

Green post-it – I'm good at the moment.

Pink post-it – I need a bit of help.

Conventions

Text that you find in courier font refers to a program or file name on a computer.

If you see a block of text like this:

Example code block

ls -h

it means, "type the command ls -h into a terminal window, hit return, and see what happens".

We intend this course to offer as much self-learning as possible. Consequently, you'll find many sections like this (click on the triangle to expand them):

Hint

Hint sections will give you some guidance on what to do next, but will not spell it out.

and some sections like this:

Solution

Solution sections contain the commands, so you can copy and paste them if you have to. They should work exactly as written.

Goals and challenges

Course goals

Hands-on, tutorial style – learn by doing
Cover the NGS tool basics – the first few things you'll do after receiving raw sequences
Get you comfortable with Linux and TACC – your best "frenemies"
Make you self-sufficient in 4 days, so you can become experts over time
Show some "best practices" for working with NGS data

Challenges

Large and growing datasets

NGS methods produce staggering amounts of data!

Typical dataset these days

yeast: 5 – 20 million reads
human: 20 – 100 million reads
paired end, length 75 – 100 bases

The initial fastq files are big (100s of MB to GB) – and they're just the start.
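To get a feel for these sizes, here is a rough back-of-the-envelope sketch. It assumes a plain 4-line fastq record (header, sequence, "+", qualities) and a header line of about 50 characters; the header length and the function name estimate_fastq_bytes are our assumptions for illustration, and real headers vary. It also assumes each "read" in the ranges above is counted once per fragment, so paired-end data doubles the record count.

```shell
# Rough UNCOMPRESSED fastq size estimate, in bytes.
# Usage: estimate_fastq_bytes READS READ_LENGTH ENDS [HEADER_BYTES]
#   READS        number of reads (fragments) sequenced
#   READ_LENGTH  bases per read
#   ENDS         1 for single-end, 2 for paired-end
#   HEADER_BYTES assumed length of the @header line (default 50, an assumption)
estimate_fastq_bytes() {
    reads=$1
    read_length=$2
    ends=$3
    header_bytes=${4:-50}
    # One record = header + sequence + "+" + qualities, each followed by a newline:
    # header_bytes + read_length + 1 + read_length characters, plus 4 newlines.
    record_bytes=$(( header_bytes + 2 * read_length + 5 ))
    echo $(( reads * ends * record_bytes ))
}

# Human dataset at the low end of the range: 20 million paired-end 100-base reads
estimate_fastq_bytes 20000000 100 2    # prints 10200000000 (about 10 GB)
```

That figure is for uncompressed text. fastq files are usually stored gzip-compressed, which makes them considerably smaller, but the point stands: the files are large before any analysis has started.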

Organization and naming conventions are critical.
Your data can get out of hand very quickly!
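As one illustration of what a naming convention might look like, the sketch below builds dataset names from a fixed set of fields. The scheme, the field order, and the function name dataset_name are all hypothetical, not a convention this course or the Iyer Lab prescribes; what matters is picking some scheme and applying it consistently.

```shell
# Hypothetical naming scheme: <organism>_<assay>_<target>_<sample>_rep<N>
# Usage: dataset_name ORGANISM ASSAY TARGET SAMPLE REPLICATE
dataset_name() {
    printf '%s_%s_%s_%s_rep%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example values are made up for illustration
name=$(dataset_name human chipseq ctcf gm12878 1)
echo "$name"    # prints human_chipseq_ctcf_gm12878_rep1
```

With a predictable name like this, every file derived from a dataset can carry the same prefix, which makes it much easier to see which files belong together.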

Progression of Iyer Lab ChIP-seq datasets over time

2008 – Yeast heat shock remodeling of chromatin
- 2 yeast datasets
- less than 2 million reads
2010 – Allelic bias in CTCF binding
- 13 CTCF datasets from 3 GM cell lines
- ~200 million reads
2012 – Analysis of 3 TFs across 11 cell lines
- 32 datasets gathered over 3 years
- ~1 billion reads
2014 – QTL analysis of CTCF binding
- 52 very deeply sequenced CTCF datasets
- ~8 billion reads
in progress – Functional analysis of glioblastoma tumors and cell lines
- > 300 datasets so far
- > 17 billion reads
