Your Instructors

  • Anna Battenhouse, Associate Research Scientist, abattenhouse@utexas.edu

    • BA English literature, 1978

    • Commercial software development 1982 – 2007

    • Joined Iyer Lab 2007 (“retirement career”)

    • BS Biochemistry, UT Austin, 2013

    • Joined the Biomedical Research Computing Facility (BRCF) and Marcotte Lab summer 2017
    • Also affiliated with
      • Bioinformatics Consulting Group (BCG)
      • Genome Sequencing and Analysis Facility (GSAF)
  • Daryl Barth, daryl.barth@utexas.edu

    • BS Materials Science & Engineering, UC Berkeley, 2017
    • Student Researcher in France and Portugal 2017 - 2018
    • Research Assistant in single cell genomics, UT Southwestern 2019-2021
    • 3rd year graduate student in the Marcotte Lab 
    • Research Interests: biomaterials, developmental biology, and bioinformatics

About the Iyer Lab (where Anna learned NGS)

http://iyerlab.org/

Dr. Vishy Iyer, PI

Main focus is functional genomics

    • large-scale transcriptional reprogramming
      in response to diverse stimuli
    • ENCODE Consortium collaborator
    • works in human and yeast


Research methods include
  • microarrays (Dr. Iyer was co-inventor)

  • high-throughput sequencing (since 2007)
    • especially ChIP-seq, RNA-seq
    • also miRNA-seq, RIP-seq, MNase-seq ...
    • have ~2,000 NGS datasets

Communication

Post-its

Green post-it – I'm good at the moment.

Pink post-it – I need a bit of help.

Asking questions

Feel free to ask questions any time during the instructor's lecture and demonstrations.

For online attendees, you can also post your question to the Zoom chat. We'll sometimes use breakout rooms when troubleshooting problems you run into; if so, TA Daryl Barth will assign you to one.

Getting help

Since most folks are new to the Linux command line, we expect you to run into problems! Please let us know if you're having difficulties!

Making mistakes and running into problems is key to learning the Linux command line! It is not only expected – it is encouraged.

Conventions

If you see a block of text like this:

...

it means: type the command ls -h into a terminal window, press Enter, and see what happens.

We intend this course to offer as much self-learning as possible. Consequently, you'll find many sections like this - click on the triangle to expand them:

Hint...

Hint sections will provide you some guidance on what to do next, but will not spell it out.

and some sections like this:

Solution...

Solution sections will contain the commands so that you can copy and paste them if you have to. They represent one method of answering the question – but there are often many ways to skin a cat!

Your Instructors

Haridha Shivram, haridh@utexas.edu

...

...

Research Interests: Transcriptional and post-transcriptional regulation of gene expression 

...

Experienced in analyzing RNA-seq, ChIP-seq, RIP-seq, and CLIP-seq datasets

...

Claire McWhite, claire.mcwhite@utexas.edu

Course goals

  • Hands-on, tutorial style – learn by doing
    • Common bioinformatics tools & file formats
  • Introduce NGS vocabulary
    • both high-level view and practice with specific tools
  • Cover the NGS basics
    • The first few things you'll do after receiving raw sequences
      • raw sequence QC and preparation
      • alignment to reference
      • basic alignment analysis
  • Understand and practice required skills
    • Get you comfortable with Linux and TACC – your best "frenemies"
    • Make you self-sufficient enough in 5 days to become experts over time
    • Show some "best practices" for working with NGS data

...

  • Analysis – making sense of raw data
    • one part bioinformatics and statistics
    • one part scripting / programming
      • Linux command line
      • High Performance Computing (TACC)
      • bash scripting (grep, awk, sed)
      • R, python, perl
  • Management – making order out of chaos
    • one part organization
    • one part data wrangling
  • Adoption of best practices is critical!

Large and growing datasets

...

  • yeast:  5 – 20 million reads
  • human:  20 – 250 million reads (~5 - 8 million for TagSeq)
  • single end (SE) or paired end (PE), length 50 – 300 bases (100 or 150 typical)

The initial FASTQ files are big (100s of MB to GB) – and they're just the start.

  • Organization and naming conventions are critical.
  • Your data can get out of hand very quickly!
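
To get a feel for how big a FASTQ file is, you can count its reads and check its size from the command line. The sketch below is only an illustration: the file name sample_R1.fastq and the two reads in it are made up, and it assumes the common layout of exactly 4 lines per read (no wrapped sequence lines).

```shell
# Build a tiny two-read FASTQ file to experiment on.
# (Hypothetical file name and made-up reads, for illustration only.)
tmpdir=$(mktemp -d)
fastq="$tmpdir/sample_R1.fastq"
printf '%s\n' \
  '@read1' 'ACGTACGTAC' '+' 'IIIIIIIIII' \
  '@read2' 'TTGCATTGCA' '+' 'IIIIIIIIII' > "$fastq"

# Assuming 4 lines per FASTQ record: reads = lines / 4.
line_count=$(wc -l < "$fastq")
read_count=$(( line_count / 4 ))
echo "reads: $read_count"

# Show the file size in human-readable units.
# (-h only has a visible effect when combined with -l or -s.)
ls -lh "$fastq"
```

On real data the same line count divided by 4 applies, but FASTQ files are usually gzip-compressed, so you would typically decompress to a pipe first (for example with zcat or gunzip -c) rather than reading the file directly.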

Progression of Iyer Lab datasets over time:

...