STRuster


Download STRuster Library

This package includes the source code for the STRuster library and a few sample scripts. The source code is provided under the GPL license. The library is not yet documented; we hope the sample job files provided will help you get started.

From the README file


Requirements:
    Linux (tested on kernel 2.6.20); should also run on other UNIX systems.
    Python (tested on 2.4.4)
    Numeric (tested on 24.2-7)
    Biopython (tested on 1.42-2)
    Rpy (tested on 1.0)
    mmLib (tested on 1.0.0)
    R (tested on 2.4.0)
    
Package contents:
    DATA/  
        Contains some raw data required to run the alpha amylase example:
            PDB files as provided by ASTRAL, definition of sets, alignments, coordinate uncertainties and UniProt sequence
    
    README 
        This documentation
        
    RESULTS/
        Includes some example scripts that illustrate different types of analysis for the alpha amylase:
         BCK/ uses backbone atoms  
            job_bck_var.py: get fasta sequences, cluster, S T R X matrices, superposition
            job_bck_comp.py: U 
         SC/ uses centroid of side chain atoms to analyse ligand binding site
            job_sc_var.py: clustering, S T R X matrices 
            job_sc_comp.py: U

    SRC/
        The source code
    
How to run:
    Download the package file struster_3_1_0.tar.gz
    Uncompress and untar it; this creates the directory STRUSTER_3_1_0, which contains all files:
        tar xvzf struster_3_1_0.tar.gz
    Set the STRUSTER environment variable. For example, if you install the package under /home/me/MYDIRECTORY:
        export STRUSTER=/home/me/MYDIRECTORY/STRUSTER_3_1_0
    Run a job script with python, for example:
        /home/me/MYDIRECTORY/STRUSTER_3_1_0/RESULTS/BCK$ python job_bck_var.py
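
The job scripts read the STRUSTER environment variable, so a wrong setting is a likely cause of early failures. The following is a minimal sketch of a check you could run first. It is not part of the STRuster package: the function name check_struster_home and the list EXPECTED_ENTRIES are hypothetical, and the directory names are taken from the "Package contents" section above.

```python
import os

# Top-level directories listed under "Package contents" in this README.
EXPECTED_ENTRIES = ["DATA", "RESULTS", "SRC"]


def check_struster_home(environ=None):
    """Return a list of problems with the STRUSTER setting (empty if none)."""
    if environ is None:
        environ = os.environ
    home = environ.get("STRUSTER")
    if not home:
        return ["STRUSTER environment variable is not set"]
    if not os.path.isdir(home):
        return ["STRUSTER does not point to a directory: %s" % home]
    problems = []
    for name in EXPECTED_ENTRIES:
        path = os.path.join(home, name)
        if not os.path.isdir(path):
            problems.append("missing directory: %s" % path)
    return problems


if __name__ == "__main__":
    # Print one line per problem; print nothing if the setting looks fine.
    for problem in check_struster_home():
        print(problem)
```

The check only looks at the directory layout; it does not verify that the required Python modules are installed.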




Running on a user-defined set of proteins:
    
To apply the method to a user-defined set of proteins, first generate a FASTA file
with the sequence of each protein in the set, including a reference UniProt sequence,
then align the sequences (manually or with an alignment program).
Finally, read the alignment back into STRuster.
To generate the sequences, use:
baseset.write_sequences_fasta(uniprot_sequence_directory, list_of_uniprot_sequence_names, name_of_fasta_file)
For example, in RESULTS/BCK/job_bck_var.py:
# write sequences as fasta file
uniprot_id_lst = ['P00690']
fasta_file = "example_set.fasta"
baseset.write_sequences_fasta(config_dic['uniprot_dir'], uniprot_id_lst, fasta_file)

config_dic['uniprot_dir'] is the directory that contains the sequence file P00690.fasta,
which is the reference sequence listed in uniprot_id_lst.
The name of the output sequence file is given by fasta_file.

After aligning the sequences in fasta_file you can read them into STRuster.
In job_bck_var.py:
alig_file = os.path.join(os.environ['STRUSTER'],'DATA/SETS/51459_alig.fasta')
baseset.get_aligment_from_msa(alig_file)

Where alig_file is the file with aligned sequences.
Note that alig_file should include the sequences generated with baseset.write_sequences_fasta(),
as this is the only way to guarantee the correct mapping between residue position in the sequence
and PDB residue numbering. You should not use another source for the sequences of the protein structures.
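
Because the alignment must contain exactly the sequences written by baseset.write_sequences_fasta(), it can help to compare the two files before reading the alignment into STRuster. The sketch below is not part of STRuster: read_fasta and check_alignment are hypothetical helper names. It assumes that the alignment program keeps the FASTA header lines unchanged and marks gaps with '-' or '.'.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return dict((name, "".join(parts)) for name, parts in records.items())


def check_alignment(unaligned, aligned, gap_chars="-."):
    """Compare unaligned and aligned records; return a list of problems."""
    problems = []
    # All rows of a multiple sequence alignment must have the same length.
    lengths = set(len(seq) for seq in aligned.values())
    if len(lengths) > 1:
        problems.append("aligned sequences differ in length: %s" % sorted(lengths))
    for name, seq in unaligned.items():
        if name not in aligned:
            problems.append("missing from alignment: %s" % name)
            continue
        # Removing the gaps must give back the original sequence.
        ungapped = aligned[name]
        for gap in gap_chars:
            ungapped = ungapped.replace(gap, "")
        if ungapped.upper() != seq.upper():
            problems.append("sequence changed during alignment: %s" % name)
    return problems


# Small in-memory example; with real files, pass open(fasta_file) instead.
original = read_fasta([">a", "ACDE", ">b", "ACE"])
aligned = read_fasta([">a", "ACDE", ">b", "AC-E"])
print(check_alignment(original, aligned))  # prints []
```

An empty list means that every generated sequence is present in the alignment and unchanged apart from gaps. It says nothing about the quality of the alignment itself.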


Alternatively, you can align the structures automatically
using the PDB to UniProt mapping provided by Andrew Martin (http://www.bioinf.org.uk/pdbsws/).
In this case, first get the FASTA sequence files of the UniProt entries mapped to PDB using job_uniprot_pdbsws.py,
then compile the PDB/UniProt mapping files using job_compile_maps.py.
You can then get the alignments using baseset.get_alignment().
job_uniprot_pdbsws.py and job_compile_maps.py are available in the RESULTS directory.
In job_uniprot_pdbsws.py you need to provide the PDB to UniProt mapping file (pdbswsmap_file),
a list of UniProt sequence files (seqdb_lst), and the directory where the FASTA sequence files are saved (fasta_dir).
In job_compile_maps.py you have to provide the file with the PDB/UniProt mapping (pdbsws_file)
and the directory with the compiled mapping files (stmap_dir).
Then, in job_bck_var.py, comment out the line
baseset.get_aligment_from_msa(alig_file)
and uncomment
baseset.get_alignment(stmap_dir,uniprot_dir)
Define stmap_dir as the directory with the mapping files
and uniprot_dir as the directory with the UniProt sequence files.




Contact:
    Francisco S. Domingues
    email: doming@mpi-sb.mpg.de