The true positive and negative data sets were used to generate training and testing data sets. The training data sets were used to build the support vector machine (SVM) based IRES prediction models, while the testing data sets were used to validate their performance.
The positive and negative data sets are available here. The file consists of Tables S1-S6.
The positive data set consists of a total of 189 viral and cellular 5′ UTR sequences, given in Tables S1 and S2. The negative data set consists of a total of 189 viral and cellular gene coding sequences and 5′ UTRs of cellular housekeeping genes. Details of the negative data sets are provided in Tables S3, S4 and S5.
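As an illustration only, the split of the two 189-sequence sets into training and testing data sets might be sketched as below. The 20% test fraction, the random seed, the function name `split_train_test`, and the placeholder sequence identifiers are all assumptions made for this example; the actual proportions and procedure used to build the models are not stated here.

```python
import random


def split_train_test(positives, negatives, test_fraction=0.2, seed=0):
    """Split each class separately so both subsets keep the class balance.

    test_fraction and seed are illustrative assumptions, not values
    taken from the original study.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, sequences in ((1, positives), (0, negatives)):
        shuffled = list(sequences)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        # The first n_test shuffled items go to testing, the rest to training.
        test += [(seq, label) for seq in shuffled[:n_test]]
        train += [(seq, label) for seq in shuffled[n_test:]]
    return train, test


# Placeholder identifiers standing in for the real sequences in Tables S1-S5.
positives = [f"pos_{i}" for i in range(189)]
negatives = [f"neg_{i}" for i in range(189)]

train, test = split_train_test(positives, negatives)
print(len(train), len(test))
```

Splitting each class on its own keeps the positive and negative examples equally represented in both subsets, which matters when the two sets are the same size by design.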
The list of 27 small subunit ribosomal proteins (SSRPs) whose interaction probabilities with the UTRs were computed is given in Table S6.