Acropora pulchra Functional Annotation Workflow
Creating a functional annotation file for Acropora pulchra de novo transcriptome.
Goal
The following document contains the bioinformatic pipeline used for the functional annotation of a de novo transcriptome for A. pulchra. Pipeline adapted from the Trinotate wiki. Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms. Trinotate makes use of a number of different well referenced methods for functional annotation including homology search to known sequence data (BLAST+/SwissProt), protein domain identification (HMMER/PFAM), and leveraging various annotation databases (eggNOG/GO/Kegg databases). All functional annotation data derived from the analysis of transcripts is integrated into a SQLite database which allows fast efficient searching for terms with specific qualities related to a desired scientific hypothesis or a means to create a whole annotation report for a transcriptome.
All metadata and information for these projects can be found in this repository and in these notebook posts of the de novo transcriptome assembly and the extraction protocol for the samples used in the assembly. These commands were compiled into bash scripts to run on the URI HPC Andromeda server.
Step 1: Obtain de novo transcriptome
I am using the Acropora pulchra de novo transcriptome assembled in November 2023, following methods outlined for transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity.
Location on Andromeda, the HPC server for URI:
cd /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/output/final_assembly/CD-HIT/stranded_assembly_cdhit.fasta
Make folder structure for functional annotation pipeline
cd /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/
mkdir functional_annotation
cd functional_annotation
mkdir Trinotate
cd Trinotate
mkdir data
mkdir output
mkdir scripts
Step 2: Access previously created open reading frame prediction and protein sequence prediction with Transdecoder and Transpredictor
Already created .gff3, .pep, and .cds files for the reference transcriptome in step 8 of my A.pulchra RNASeq Workflow.
Steps followed:
1) Create .gff3 files for predicted gene structures, .pep for predicted proteins, and .cds predicted coding sequences for your de novo transcriptome
#To include putative gene information in your Trinity analysis, you can use TransDecoder.LongOrfs and TransDecoder.Predict #First step identifies likely coding regions (long open reading frames or ORFs) in your Trinity transcripts and creates a file named Trinity.fasta.transdecoder_dir/longest_orfs.pep, which contains the predicted protein sequences.
#Second step predicts likely coding regions and identifies potential coding regions using the output from the LongOrfs step and generates several output files in the Trinity.fasta.transdecoder_dir/ directory, including Trinity.fasta.transdecoder.cds (predicted coding sequences) and Trinity.fasta.transdecoder.gff3 (predicted gene structures in GFF3 format).
Directory for TransDecoder and Transpredictor output files:
cd /data/putnamlab/dbecks/Heatwave_A.pul_2022Project/data/trinity/
Step 3: Set up Trinotate environmental variables in Andromeda HPC
Export the location of our Trinotate installation so we can run the software from any directory:
export TRINOTATE_HOME=/opt/software/Trinotate/4.0.2-foss-2022a
export PATH=$TRINOTATE_HOME:$PATH
Step 4: Load the sequence databases required for TRINOTATE_HOME
Trinotate relies heavily on SwissProt and Pfam, and custom protein files are generated as described below to be specifically used with Trinotate. You can obtain the protein database files by running this Trinotate build process. This step will download several data resources including the most current versions of swissprot, pfam, and other companion resources, create and populate a Trinotate boilerplate sqlite database (Trinotate.sqlite), and yield uniprot_sprot.pep file to be used with BLAST, and the Pfam-A.hmm.gz file to be used for Pfam searches. Run the build process like so:
$TRINOTATE_HOME/Trinotate --create \
--db myTrinotate.sqlite \
--trinotate_data_dir /path/to/TRINOTATE_DATA_DIR \
--use_diamond
where /path/to/TRINOTATE_DATA_DIR is the directory where you want all the Trinotate data resources to be installed – and should not be your current working directory, but can be a destionation directory found within your current working directory (ie. $PWD != /path/to/TRINOTATE_DATA_DIR)
the –db myTrinotate.sqlite (or whatever you name it based on your target transcriptome) is what should be used for subsequent Trinotate commands below.
include –use_diamond if you plan to use diamond blast with Trinotate, otherwise NCBI blast+ will be used for database preparation.
and once it completes, it will create the ‘myTrinotate.sqlite’ database in your current working directory, and you’ll find resources added to the TRINOTATE_DATA_DIR including:
uniprot_sprot.pep Pfam-A.hmm.gz
Create script and run:
nano /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/scripts/Trinotate_database_build.sh
#!/bin/bash
#SBATCH --job-name=Trinotate # Job name
#SBATCH --time=72:00:00 # Time limit: 3 days
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=1 # Number of tasks per node
#SBATCH --exclusive # Exclusive node allocation
#SBATCH --export=NONE # Do not export the environment
#SBATCH --mem=150GB # Memory allocation
#SBATCH --mail-type=BEGIN,END,FAIL # Send email notifications
#SBATCH --mail-user=danielle_becker@uri.edu # Email address for notifications
#SBATCH --account=putnamlab # Account to use
#SBATCH -D /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output
#SBATCH --error="script_error" # File for error messages
#SBATCH --output="output_script" # File for standard output
# Load necessary modules
module load Trinotate/4.0.2-foss-2022a
module load DIAMOND/2.1.9-GCC-13.2.0
# Set Trinotate environment variable
export TRINOTATE_HOME=/opt/software/Trinotate/4.0.2-foss-2022a
# Create Trinotate SQLite database and initialize with Diamond
$TRINOTATE_HOME/Trinotate --create \
--db myTrinotate.sqlite \
--trinotate_data_dir /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/data \
--use_diamond
sbatch /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/scripts/Trinotate_database_build.sh
Submitted batch job 338474
Step 5: Initialize your Trinotate sqlite database with your sequence data and filter my fasta for peptide protein coding regions for each transcript while selecting the longest predicted peptide sequences
The following inputs are required for Trinotate:
- transcripts.fasta : your target transcriptome in fasta format
- coding_seqs.pep : coding regions translated in fasta format (specific header formatting required - see below. Most use TransDecoder to generate this)
- gene_to_trans_map.tsv : pairwise mappings between gene and transcript isoform identifiers
If a Trinity reconstructed transcriptome is the target, then the transcripts.fasta and gene_to_trans_map.tsv are the final products of running Trinity. The coding_seqs.pep is derived from running TransDecoder to predict coding regions within the transcripts.
To select only protein coding transcripts and longest predicted peptide sequences:
First, extract the .pep ids
grep '^>' /data/putnamlab/dbecks/Heatwave_A.pul_2022Project/data/trinity/stranded_assembly_cdhit.fasta.transdecoder.pep | sed 's/^>//' > pep_base_ids.txt
#remove the > character
sed 's/^>//' pep_base_ids.txt > cleaned_pep_base_ids.txt
Next, filter the .fasta file for only the ids found in the .pep files
interactive
module load SeqKit/2.3.1
seqtk subseq stranded_assembly_cdhit.fasta cleaned_pep_base_ids.txt > filtered_stranded_assembly_cdhit.fasta
Next, filter any duplicate .pep ids from the fasta file by selecting the longest longest predicted peptide sequences
# create file with peptide peptide_lengths
awk '/^>/ {header=$0; getline seq; print header "\t" length(seq)}' /data/putnamlab/dbecks/Heatwave_A.pul_2022Project/data/trinity/stranded_assembly_cdhit.fasta.transdecoder.pep > peptide_lengths.txt
# extract longest predicted peptide sequence
awk '{
split($1, id, ".p");
gene_id=id[1];
len = $NF; # Length is the last field
if (len > max_len[gene_id]) {
max_len[gene_id] = len;
longest_peptide[gene_id] = $1; # Store the full peptide ID
}
}
END {
for (gene in longest_peptide)
print longest_peptide[gene];
}' peptide_lengths.txt > longest_peptide_ids.txt
#remove the > character
sed 's/^>//; s/\..*//' longest_peptide_ids.txt > cleaned_longest_peptide_ids.txt
# make final fasta
seqtk subseq stranded_assembly_cdhit.fasta cleaned_longest_peptide_ids.txt > final_filtered_stranded_assembly_cdhit.fasta
Create script and run:
nano /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/scripts/Trinotate_sqlite.sh
#!/bin/bash
#SBATCH --job-name=Trinotate_sqlite # Job name
#SBATCH --time=72:00:00 # Time limit: 3 days
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=1 # Number of tasks per node
#SBATCH --exclusive # Exclusive node allocation
#SBATCH --export=NONE # Do not export the environment
#SBATCH --mem=150GB # Memory allocation
#SBATCH --mail-type=BEGIN,END,FAIL # Send email notifications
#SBATCH --mail-user=danielle_becker@uri.edu # Email address for notifications
#SBATCH --account=putnamlab # Account to use
#SBATCH -D /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output
#SBATCH --error="script_error_sqlite.err" # File for error messages
#SBATCH --output="output_script_sqlite.out" # File for standard output
# Load Trinotate and SQLite modules
module load Trinotate/4.0.2-foss-2022a
module load SQLite/3.36-GCCcore-11.2.0
# Set paths for inputs
TRANSCRIPTS=/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/output/final_assembly/CD-HIT/final_filtered_stranded_assembly_cdhit.fasta
PEP_FILE=/data/putnamlab/dbecks/Heatwave_A.pul_2022Project/data/trinity/stranded_assembly_cdhit.fasta.transdecoder.pep
GENE_MAP=/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/output/final_assembly/CD-HIT/stranded_assembly_cdhit.fasta.gene_trans_map
TRINOTATE_DB=/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output/myTrinotate.sqlite
# Initialize Trinotate SQLite database
Trinotate --db $TRINOTATE_DB --init \
--gene_trans_map $GENE_MAP \
--transcript_fasta $TRANSCRIPTS \
--transdecoder_pep $PEP_FILE
sbatch /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/scripts/Trinotate_sqlite.sh
Submitted batch job 338491
Step 6: Asses reduced transcriptome output with N50 and BUSCO
Run N50 on final filtered assembly:
/opt/software/Trinity/2.15.1-foss-2022a/trinityrnaseq-v2.15.1/util/TrinityStats.pl final_filtered_stranded_assembly_cdhit.fasta > final_trinity_cdhit_assembly_stats
################################
## Counts of transcripts, etc.
################################
Total trinity 'genes': 160088
Total trinity transcripts: 210772
Percent GC: 47.41
########################################
Stats based on ALL transcript contigs:
########################################
Contig N10: 4731
Contig N20: 3448
Contig N30: 2692
Contig N40: 2148
Contig N50: 1725
Median contig length: 719
Average contig: 1147.51
Total assembled bases: 241862276
The N10 through N50 values are shown computed based on all assembled contigs. In this example, 10% of the assembled bases are found in transcript contigs at least 3,594 bases in length (N10 value), and the N50 value indicates that at least half the assembled bases are found in contigs that are at least 812 bases in length.
The contig N50 values can often be exaggerated due to an assembly program generating too many transcript isoforms, especially for the longer transcripts. To mitigate this effect, the script will also compute the Nx values based on using only the single longest isoform per ‘gene’:
#####################################################
## Stats based on ONLY LONGEST ISOFORM per 'GENE':
#####################################################
Contig N10: 4654
Contig N20: 3315
Contig N30: 2533
Contig N40: 1976
Contig N50: 1547
Median contig length: 632
Average contig: 1043.14
Total assembled bases: 166994585
You can see that the Nx values based on the single longest isoform per gene are lower than the Nx stats based on all assembled contigs, as expected, and even though the Nx statistic is really not a reliable indicator of the quality of a transcriptome assembly, the Nx value based on using the longest isoform per gene is perhaps better for reasons described above.
Run BUSCO on reduced transcriptome output assembly:
nano /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/scripts/busco_final_trans.sh
#!/bin/bash
#SBATCH --job-name="busco_final_trans"
#SBATCH --time="100:00:00"
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=36
#SBATCH --mem=120G
##SBATCH --output="busco-%u-%x-%j"
##SBATCH --account=putnamlab
##SBATCH --export=NONE
echo "START" $(date)
export NUMEXPR_MAX_THREADS=36
labbase=/data/putnamlab
busco_shared="${labbase}/shared/busco"
[ -z "$query" ] && query="${labbase}/dbecks/DeNovo_transcriptome/2023_A.pul/output/final_assembly/CD-HIT/final_filtered_stranded_assembly_cdhit.fasta" # set this to the query (genome/transcriptome) you are running
source "${busco_shared}/scripts/busco_init.sh" # sets up the modules required for this in the right order
# This will generate output under your $HOME/busco_output
cd "${labbase}/dbecks/DeNovo_transcriptome/2023_A.pul/output/final_assembly/CD-HIT/"
# Run BUSCO with the --offline flag and specify the download path
busco --config "$EBROOTBUSCO/config/config.ini" -f -c 20 -i "${query}" -l metazoa_odb10 -o busco_output -m transcriptome --offline --download_path "${busco_shared}/downloads/"
echo "STOP" $(date)
sbatch /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/scripts/busco_final_trans.sh
Submitted batch job 339114
cd /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/output/final_assembly/CD-HIT/busco_output
less short_summary.specific.metazoa_odb10.busco_output.txt
--------------------------------------------------
|Results from dataset metazoa_odb10 |
--------------------------------------------------
|C:99.8%[S:29.8%,D:70.4%],F:0.1%,M:0.1%,n:954 |
|952 Complete BUSCOs (C) |
|280 Complete and single-copy BUSCOs (S) |
|672 Complete and duplicated BUSCOs (D) |
|1 Fragmented BUSCOs (F) |
|1 Missing BUSCOs (M) |
|954 Total BUSCO groups searched |
--------------------------------------------------
Step 7: Run sequence analysis using BLAST, Pfam, SignalP, and other tools, and loads the results into the Trinotate database
To run the sequence analyses and database searches, simply run Trinotate like so:
Trinotate --db <sqlite.db> --CPU <int> \
--transcript_fasta <file> \
--transdecoder_pep <file> \
--trinotate_data_dir /path/to/TRINOTATE_DATA_DIR
--run "swissprot_blastp swissprot_blastx pfam signalp6 tmhmmv2 infernal EggnogMapper" \
--use_diamond
where, under –run, the list of analyses to perform are indicated within a quoted list. Of course, the required tools for performing each analysis should be installed as per above.
setting ‘–run ALL’ is shorthand and equivalent to the above –run list.
parameter -E or –evalue can be used to set the E-value threshold for blast searches (default: 1e-5)
When the Trinotate –run is used to perform analyses, the results are automatically loaded into the Trinotate sqlite database.
If you need to run TmHMM or signalP separately (due to licensing issues), instructions are provided separately here.
Create script and run:
nano /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/scripts/Trinotate_sequences.sh
#!/bin/bash
#SBATCH --job-name=Trinotate_sequences # Job name
#SBATCH --time=72:00:00 # Time limit: 3 days
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=1 # Number of tasks per node
#SBATCH --exclusive # Exclusive node allocation
#SBATCH --export=NONE # Do not export the environment
#SBATCH --mem=120GB # Memory allocation
#SBATCH --mail-type=BEGIN,END,FAIL # Send email notifications
#SBATCH --mail-user=danielle_becker@uri.edu # Email address for notifications
#SBATCH --account=putnamlab # Account to use
#SBATCH -D /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output
#SBATCH --error="script_error_sequences.err" # File for error messages
#SBATCH --output="output_script_sequences.out" # File for standard output
# Load Trinotate and SQLite modules
module load Trinotate/4.0.2-foss-2022a
module load DIAMOND/2.1.0-GCC-11.3.0
module load HMMER/3.3.2-iimpi-2021b
# Set paths for inputs and Trinotate database
TRANSCRIPTS=/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/output/final_assembly/CD-HIT/final_filtered_stranded_assembly_cdhit.fasta
PEP_FILE=/data/putnamlab/dbecks/Heatwave_A.pul_2022Project/data/trinity/stranded_assembly_cdhit.fasta.transdecoder.pep
GENE_MAP=/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/output/final_assembly/CD-HIT/stranded_assembly_cdhit.fasta.gene_trans_map
TRINOTATE_DB=/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output/myTrinotate.sqlite
TRINOTATE_DATA_DIR=/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/data
# Run sequence analyses using Trinotate
Trinotate --db $TRINOTATE_DB --CPU 8 \
--transcript_fasta $TRANSCRIPTS \
--transdecoder_pep $PEP_FILE \
--trinotate_data_dir $TRINOTATE_DATA_DIR \
--run "swissprot_blastp swissprot_blastx pfam infernal EggnogMapper" \
--use_diamond
sbatch /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/scripts/Trinotate_sequences.sh
Submitted batch job 338492
Step 8: Generate the Trinotate Report
The Trinotate annotation report is generated using the –report parameter like so:
Trinotate --db <sqlite.db> --report [ -E (default: 1e-5) ]
[--pfam_cutoff DNC|DGC|DTC|SNC|SGC|STC (default: DNC=domain noise cutoff)]
[--incl_pep]
[--incl_trans]
an an example command might look like so:
Trinotate --db myTrinotate.sqlite --report > myTrinotate.tsv
The report is a tab-delimited output with the following columns:
0 #gene_id 1 transcript_id 2 sprot_Top_BLASTX_hit 3 infernal 4 prot_id 5 prot_coords 6 sprot_Top_BLASTP_hit 7 Pfam 8 SignalP 9 TmHMM 10 eggnog 11 Kegg 12 gene_ontology_BLASTX 13 gene_ontology_BLASTP 14 gene_ontology_Pfam 15 transcript # optional, use –incl_trans 16 peptide # optional, use –incl_pep
The formatting of the data fields is somewhat intuitive and relatively easy to parse. Missing data or NULL results are indicated by ‘.’ placeholders.
If EggnogMapper is included, the EggnogMapper results are further integrated into this tabulated output with an expanded set of columns and easily identified as derived accordingly.
Create script and run:
nano /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/scripts/Trinotate_report.sh
#!/bin/bash
#SBATCH --job-name=Trinotate_report # Job name
#SBATCH --time=72:00:00 # Time limit: 3 days
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=1 # Number of tasks per node
#SBATCH --exclusive # Exclusive node allocation
#SBATCH --export=NONE # Do not export the environment
#SBATCH --mem=150GB # Memory allocation
#SBATCH --mail-type=BEGIN,END,FAIL # Send email notifications
#SBATCH --mail-user=danielle_becker@uri.edu # Email address for notifications
#SBATCH --account=putnamlab # Account to use
#SBATCH -D /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output
#SBATCH --error="script_error_report.err" # File for error messages
#SBATCH --output="output_script_report.out" # File for standard output
# Load Trinotate and SQLite modules
module load Trinotate/4.0.2-foss-2022a
# Set paths for inputs and Trinotate database
TRINOTATE_DB=/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output/myTrinotate.sqlite
REPORT_FILE=/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output/myTrinotate_report.tsv
# Generate the Trinotate annotation report
Trinotate --db $TRINOTATE_DB --report > $REPORT_FILE
sbatch /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/scripts/Trinotate_report.sh
Submitted batch job 339046
Step 9: Extract GO assignments from the Trinotate Report
Use the extract GO assignments from Trinotate report pipeline to get GO terms in a .txt file for later GO enrichment analysis.
nano /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/scripts/Trinotate_GO_extract.sh
#!/bin/bash
#SBATCH --job-name=Trinotate_GO_extract # Job name
#SBATCH --time=72:00:00 # Time limit: 3 days
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=1 # Number of tasks per node
#SBATCH --exclusive # Exclusive node allocation
#SBATCH --export=NONE # Do not export the environment
#SBATCH --mem=150GB # Memory allocation
#SBATCH --mail-type=BEGIN,END,FAIL # Send email notifications
#SBATCH --mail-user=danielle_becker@uri.edu # Email address for notifications
#SBATCH --account=putnamlab # Account to use
#SBATCH -D /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output
#SBATCH --error="script_error_extract.err" # File for error messages
#SBATCH --output="output_script_extract.out" # File for standard output
# Load Trinotate and SQLite modules
module load Trinotate/4.0.2-foss-2022a
# Set paths for the Trinotate report file
REPORT_FILE=/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output/myTrinotate_report.tsv
# Set path for the Trinotate utilities
UTIL_PATH=/opt/software/Trinotate/4.0.2-foss-2022a/util
# Run GO annotation extraction script
$UTIL_PATH/extract_GO_assignments_from_Trinotate_xls.pl \
--Trinotate_xls $REPORT_FILE \
-G --include_ancestral_terms \
--go_sources gene_ontology_BLASTX,gene_ontology_BLASTP,gene_ontology_Pfam,EggNM.GOs \
> go_annotations.txt
# Confirm the job completion
echo "GO annotations extracted successfully to go_annotations.txt"
sbatch /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/scripts/Trinotate_GO_extract.sh
Submitted batch job 340046
Step 10: Download functional annotation and GO extracted .txt file to local computer
scp -r danielle_becker@ssh3.hac.uri.edu:/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output/myTrinotate_report.tsv /Users/Danielle/Desktop/Putnam_Lab/A.pul_Heatwave/bioinformatics/data
scp -r danielle_becker@ssh3.hac.uri.edu:/data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/functional_annotation/Trinotate/output/go_annotations.txt /Users/Danielle/Desktop/Putnam_Lab/A.pul_Heatwave/bioinformatics/data