Your Instructors

Most of us are members (or alumni) of the functional genomics lab of Vishwanath Iyer, UT Austin.

About the Iyer Lab

http://iyerlab.org/

Communication

Post-its

Green post-it – I'm good at the moment.

Pink post-it – I need a bit of help.

Conventions

Text that you find in courier font refers to a program or file name on a computer.

If you see a block of text like this:

ls -h

it means, "type the command ls -h into a terminal window, hit return, and see what happens".

We intend this course to offer as much self-learning as possible. Consequently, you'll find many sections like this (click on the triangle to expand them):

Hint sections will provide you some guidance on what to do next, but will not spell it out.

and some sections like this:

Solution sections will contain the commands so that you can copy and paste them if you have to. The commands should work exactly as written.

Goals and challenges

Course goals

Challenges

Large and growing datasets

NGS methods produce staggering amounts of data!

Typical dataset these days

The initial fastq files are big (100s of MB to GB) – and they're just the start.

progression of Iyer Lab ChIP-seq datasets over time

Data wrangling best practices summary

keep fastq files compressed

You may be tempted to un-compress your sequencing files to manipulate them more directly.
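As a minimal sketch of working with a compressed file directly, the commands below stream the contents of a gzipped fastq file through a pipe instead of writing an uncompressed copy to disk. The file name example.fastq.gz and its two toy reads are made up for illustration; substitute your own file.

```shell
# Create a tiny example fastq file (hypothetical name and contents) and compress it.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > example.fastq
gzip -f example.fastq      # produces example.fastq.gz and removes example.fastq

# Look at the first read (4 lines) without creating an uncompressed file.
gzip -cd example.fastq.gz | head -4

# Count reads: each fastq record is 4 lines, so divide the line count by 4.
n_lines=$(gzip -cd example.fastq.gz | wc -l)
n_reads=$(( n_lines / 4 ))
echo "reads: $n_reads"
```

Here gzip -cd writes the decompressed data to standard output only, so the file on disk stays compressed.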

arrange adequate storage space

back up analysis artifacts regularly

distinguish between types of data

Artifacts from different stages of the analysis will have different archival requirements.

While a project is active you will want to keep more intermediate artifacts for reference. Many of these can be deleted after publication.

track your analysis steps

Your analyses should be reproducible by others so you need to keep the equivalent of a lab notebook to document your protocols.
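One possible way to keep such a record, shown here only as a sketch, is a small shell function that appends each command to a log file before running it. The function name run_logged and the file name analysis_log.txt are hypothetical choices for this example, not part of any course tooling.

```shell
log=analysis_log.txt   # hypothetical log file kept alongside the project

# Append a timestamp and the command to the log, then run the command.
# Arguments are joined with spaces, so original quoting is not preserved in the log.
run_logged() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" >> "$log"
    "$@"
}

run_logged ls -h
run_logged echo "an analysis step would go here"

# Review what has been recorded so far.
cat "$log"
```

A plain text file that you edit by hand serves the same purpose; the point is that every step and its exact command is written down somewhere.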