Add Rob's reorganization by sdhutchins · Pull Request #1 · datasnakes/OrthoEvolution · GitHub

Add Rob's reorganization #1

Merged 64 commits on May 3, 2017

Commits
9bd398f
Cleaned; Added PAML organisms, taxon_ids.
grabear Apr 24, 2017
afadd89
README for lister.
grabear Apr 24, 2017
282a1d9
Added index directory for testing.
grabear Apr 24, 2017
f329286
Needs a rework.
grabear Apr 24, 2017
4c52aeb
Cleaned; Added P
grabear Apr 24, 2017
f0964ec
Added post-blast analysis(duplication, missing); Added paml org, and …
grabear Apr 26, 2017
201aebf
Test file - Missing and duplicated data.
grabear Apr 26, 2017
9f54163
Test File - Template
grabear Apr 26, 2017
14f9f9d
Added stuff from shaurita.
grabear Apr 26, 2017
9f16ed3
Added MyGene data frame generator.
grabear Apr 27, 2017
2f7e5ba
Updated lister class.
grabear May 1, 2017
cda2f58
Created a Lister subclass.
grabear May 1, 2017
2b8fe18
Created a BLASTn class.
grabear May 1, 2017
76794eb
Added Shauritas logit class.
grabear May 1, 2017
d31ba4e
Update.
grabear May 1, 2017
b421a0e
Finished BLASTn class; needs testing.
grabear May 2, 2017
3179924
Deprecation.
grabear May 2, 2017
1d56733
Added post blast analysis to BLASTingTemplate class
grabear May 2, 2017
db7dbd2
Added low level taxonomy function to Lister class.
grabear May 2, 2017
5955281
Added .gitignore file and PyCharm Plugin
grabear May 2, 2017
3044c92
Added post blast analysis logger.
grabear May 2, 2017
e2b570c
Added .gitignore file and PyCharm Plugin
grabear May 2, 2017
cc6105e
Added .gitignore file and PyCharm Plugin
grabear May 2, 2017
c8687bd
Removed .idea from VCS
grabear May 2, 2017
fa13feb
MCSR blast testing script
grabear May 2, 2017
d6f2219
MCSR blast testing script
grabear May 2, 2017
07e33d3
MCSR blast testing script
grabear May 2, 2017
97404c3
MCSR blast testing script
grabear May 2, 2017
ab2d64f
MCSR edit
May 2, 2017
ba66e63
Merge branch 'master' of https://github.com/datasnakes/Datasnakes-Script
May 2, 2017
d01162f
Added Shauritas function to create a gi-list.
grabear May 2, 2017
0d4d84d
Fixed errors.
grabear May 2, 2017
5a64a7a
Merge branch 'master' of https://github.com/datasnakes/Datasnakes-Scr…
May 2, 2017
93e1214
Fixed errors.
grabear May 2, 2017
8e57a4d
Added .log to .gitignore; Updated BLASTn class
May 2, 2017
335a3e6
update
May 2, 2017
7050b8c
Fixed errors.
grabear May 2, 2017
632d413
Fixed errors.
grabear May 2, 2017
93b77de
Fixing bugs.
May 2, 2017
010e66e
Merge branch 'master' of https://github.com/datasnakes/Datasnakes-Scr…
May 2, 2017
6ba1b95
Update
May 2, 2017
44d0f1b
Fixed gi_list method.
grabear May 2, 2017
677ce36
Fixed gi_list method.
grabear May 2, 2017
7e6220d
Fixed gi_list method.
grabear May 2, 2017
32d13a5
Fixed gi_list method.
grabear May 2, 2017
997874f
Fixed gi_list method.
grabear May 2, 2017
95457c2
Renamed from BLASTingTemplate.py to blast_analysis.py
grabear May 2, 2017
50254da
Renamed from lister.py to ortho_analysis.py
grabear May 2, 2017
e0ec7b9
Add Shaurita's BLAST scripts.
grabear May 2, 2017
092b1ad
Add Shaurita's BLAST scripts.
grabear May 2, 2017
166126b
Reduced; Add SLACK stuff to logging class.
grabear May 2, 2017
5d10927
Update.
grabear May 2, 2017
5a9f998
Update.
grabear May 2, 2017
13b1cf0
Archived blast files.
grabear May 3, 2017
406fdae
Archived blastn files.
grabear May 3, 2017
9790ca5
Archived blast setup files.
grabear May 3, 2017
28ead59
Archived Entrez and MyGene files.
grabear May 3, 2017
4c4b50d
Archived MyGene files.
grabear May 3, 2017
8df88ef
Archived Pandas files.
grabear May 3, 2017
d7237ad
Added a centralized BLAST directory
grabear May 3, 2017
7c51c4b
Updated Packages.
grabear May 3, 2017
b3da1ea
Created Orthologs module and moved files.
grabear May 3, 2017
cc0d1c0
Created Tools module and moved files.
grabear May 3, 2017
b1f96b4
Archived files.
grabear May 3, 2017
11 changes: 11 additions & 0 deletions .gitignore
@@ -0,0 +1,11 @@
*.gbff
/manager/data
*.log
*.gbk
*.pyc
*.db
*.ffn.best
shiny_dir_mana.txt
/.Rproj*
/.idea
/.idea/**
8 changes: 0 additions & 8 deletions .idea/Vallender-Labs-Scripts.iml

This file was deleted.

36 changes: 0 additions & 36 deletions .idea/misc.xml

This file was deleted.

8 changes: 0 additions & 8 deletions .idea/modules.xml

This file was deleted.

14 files renamed without changes.
7 changes: 7 additions & 0 deletions Orthologs/blasting test.py
@@ -0,0 +1,7 @@
from Orthologs.manager.blast.blastn import BLASTn

x = BLASTn('MAFV3.2.csv') # This is a template for GPCR project

BLASTER = x.blast_config

BLASTER(x.blast_human, 'Homo_sapiens', auto_start=True)
4 files renamed without changes.
195 changes: 195 additions & 0 deletions Orthologs/manager/README.md
@@ -0,0 +1,195 @@
## Using Lister

Lister is a class whose instances give access to all of the initial data that we will use for the project.

## Description

The data to be initialized includes the following:

* Master_Accession_File.csv (include project name)
* common_names.csv (PAML) (rename)
* taxon_ids.txt

**Add some examples of these files as template**
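
As a rough illustration of the accession file layout, here is a minimal sketch. It assumes the CSV has a `Tier` column, a `Gene` column, and one column per organism, which is inferred from the `x.header` output shown later in this README. The file contents below are a hypothetical one-row stand-in (the HTR1A values are copied from the examples in this README), not the real `Master_Accession_File.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for Master_Accession_File.csv (assumed layout:
# Tier, Gene, then one column of accession numbers per organism).
csv_text = """Tier,Gene,Homo_sapiens,Macaca_mulatta,Mus_musculus
1,HTR1A,NM_000524.3,NM_001198700.1,NM_008308.4
"""

# Read everything as strings so that the tier stays '1' rather than 1.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

header = df.columns.tolist()     # 'Tier', 'Gene', then the organisms
org_list = header[2:]            # organism columns only
gene_list = df["Gene"].tolist()  # one entry per row

print(header)
print(org_list)
print(gene_list)
```

Reading from `io.StringIO` keeps the sketch self-contained; with a real file, the path would be passed to `pd.read_csv` instead.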

## Usage
Using docstrings for help:
```python
>>> from manager.lister import Lister
>>> help(Lister)
```
Using this class is simple:
```python
>>> from lister import Lister
>>> from pprint import pprint
>>> x = Lister(acc_file = 'Master_Accession_File.csv', paml_file = 'common_names.csv', taxon_file = 'taxon_ids.txt')
```
Lister uses pandas to manipulate our data so that other processes can call it more easily.

### Parameters
```python
from lister import Lister
x = Lister(acc_file, paml_file, taxon_file, go_list = None, hgnc_file = False)
```
* **acc_file** (*'Master_Accession_File.csv'*) - Contains accession numbers for a group of genes ranked by tier. Each gene has a group of
orthologs used in our phylogenetic analysis.
* **paml_file** (*'common_names.csv'*) - Contains a list of shortened organism names used in the MSA files. This is done to comply with
PAML.
* **taxon_file** (*'taxon_ids.txt'*) - Contains an ordered list of taxon ids.
* **go_list** (*[[gene.1, org.1], ... , [gene.n, org.n]]*) - A nested list that can be used to get information about specific gene/org pairs.
* **hgnc_file** - For future implementation. Used as a file handle to parse an HGNC *.csv* file.

### Variables
* x.gene_count
* x.org_count
* x.paml_org_list
* x.taxon_ids

### Lists
List that contains header info (**x.header**):
```python
>>> pprint(x.header)
['Tier',
'Gene',
'Homo_sapiens',
'Macaca_mulatta',
'Mus_musculus',
'Rattus_norvegicus',
...
'Trichechus_manatus_latirostris',
'Tupaia_chinensis',
'Tursiops_truncatus']
```
List of Accessions (**x.acc_list**):
```python
>>> pprint(x.acc_list)
['NM_000680.3',
'NM_000679.3',
'NM_000678.3',
...
'xm_004368425.2',
'XM_006155397.2',
'XM_004317686.1']
```
List of Genes (**x.gene_list**):
```python
>>> pprint(x.gene_list)
['ADRA1A',
'ADRA1B',
'CHRM2',
...
'SSTR2',
'TSHR',
'VIPR1']
```
List of Organisms (**x.org_list**):
```python
>>> pprint(x.org_list)
['Homo_sapiens',
'Macaca_mulatta',
'Mus_musculus',
'Rattus_norvegicus',
...
'Trichechus_manatus_latirostris',
'Tupaia_chinensis',
'Tursiops_truncatus']
```
### Dictionaries
Dictionary of Accessions maps each accession to a [gene, organism] list (**x.acc_dict**):
```python
>>> pprint(x.acc_dict)
{'NM_000115.3': ['EDNRB', 'Homo_sapiens'],
'NM_000145.3': ['FSHR', 'Homo_sapiens'],
'NM_000164.3': ['GIPR', 'Homo_sapiens'],
...
'NM_001001620.1': ['CCR3', 'Sus_scrofa'],
'NM_001002911.3': ['GPR139', 'Homo_sapiens'],
'NM_001002944.1': ['ADORA2B', 'Canis_lupus_familiaris']}
```
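
One way such a mapping could be derived from a gene-indexed table is sketched below. This is not taken from the Lister source; the table layout and the loop are assumptions, and the single HTR1A row reuses accession values shown elsewhere in this README.

```python
import io

import pandas as pd

# Hypothetical one-row accession table (assumed layout, not the real file).
csv_text = """Tier,Gene,Homo_sapiens,Macaca_mulatta,Mus_musculus
1,HTR1A,NM_000524.3,NM_001198700.1,NM_008308.4
"""
df = pd.read_csv(io.StringIO(csv_text), dtype=str).set_index("Gene")

# Every column except 'Tier' is assumed to be an organism.
orgs = [col for col in df.columns if col != "Tier"]

# Map accession -> [gene, organism], skipping missing accessions.
acc_dict = {}
for gene, row in df[orgs].iterrows():
    for org, acc in row.items():
        if pd.notna(acc):
            acc_dict[acc] = [gene, org]

print(acc_dict["NM_000524.3"])
```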
Dictionary of Genes is a nested dictionary (**x.gene_dict**, **x.tier_dict**):
```python
>>> pprint(x.gene_dict['HTR1A'])
{'Ailuropoda_melanoleuca': 'XM_002926305.1',
'Bos_taurus': 'XM_600535.5',
'Callithrix_jacchus': 'XM_008992005.2',
...
'Tier': '1',
'Trichechus_manatus_latirostris': 'xm_004374552.2',
'Tupaia_chinensis': 'xm_006156821.1',
'Tursiops_truncatus': 'xm_004325159.1'}
###########################################################################
>>> pprint(x.tier_dict['HTR1A'])
'1'
```
Dictionary of Organisms is a nested dictionary (**x.org_dict**):
```python
>>> HS_query = x.org_dict['Homo_sapiens'].values()
>>> HS_gene_list = x.org_dict['Homo_sapiens'].keys()
>>> pprint(list(HS_query))
['NM_000680.3',
'NM_000679.3',
'NM_000678.3',
...
'XM_011517263.2',
'NM_000369.2',
'NM_004624.3']
```
Dictionar

### Dataframes
Dataframe that uses Gene as an index (**x.df**):
```python
>>> pprint(x.df.T.HTR1A)
Tier 1
Homo_sapiens NM_000524.3
Macaca_mulatta NM_001198700.1
Mus_musculus NM_008308.4
...
Tupaia_chinensis xm_006156821.1
Tursiops_truncatus xm_004325159.1
Name: HTR1A, dtype: object
```
Pivot Table MultiIndexed with pandas (**x.pt**):
```python
# #### Format the main pivot table #### #
self.pt = pd.pivot_table(pd.read_csv(self.__filename_path), index=['Tier', 'Gene'], aggfunc='first')
array = self.pt.axes[1].tolist() # Organism list
self.pt.columns = pd.Index(array, name='Organism')
```
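
The snippet above comes from inside the class, so it does not run on its own. A minimal standalone sketch of the same pivot follows, assuming a hypothetical in-memory CSV in place of the private `self.__filename_path` attribute; the layout (`Tier`, `Gene`, then organism columns) is inferred from the README examples.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file behind self.__filename_path.
csv_text = """Tier,Gene,Homo_sapiens,Macaca_mulatta,Mus_musculus
1,HTR1A,NM_000524.3,NM_001198700.1,NM_008308.4
"""
raw = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Index rows by (Tier, Gene); aggfunc='first' keeps the first accession
# found for each (Tier, Gene) pair in every organism column.
pt = pd.pivot_table(raw, index=["Tier", "Gene"], aggfunc="first")

# Name the column axis so the organisms are labelled in the output.
array = pt.axes[1].tolist()  # Organism list
pt.columns = pd.Index(array, name="Organism")

print(pt.loc[("1", "HTR1A"), "Mus_musculus"])
```

Because `Tier` and `Gene` form a MultiIndex, a full `(tier, gene)` tuple selects a single row.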

Dictionary of Dataframes that correspond to tiers (**x.get_tier_frame**, **x.tier_frame_dict**):
```python
>>> Tiers = x.get_tier_frame('1')
>>> Tiers.keys()
dict_keys(['1'])
>>> Tiers = x.tier_frame_dict()
>>> Tiers.keys()
dict_keys(['1', '2', '3', 'None'])
```
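
A dictionary of per-tier dataframes like this could be produced with a pandas `groupby`. The sketch below is an assumption about the approach, not the Lister implementation, and the gene names and accessions in it are placeholders rather than real data.

```python
import pandas as pd

# Placeholder data: GENE_* and ACC_* are hypothetical, not real records.
df = pd.DataFrame(
    {
        "Tier": ["1", "2", "1"],
        "Gene": ["GENE_A", "GENE_B", "GENE_C"],
        "Homo_sapiens": ["ACC_A", "ACC_B", "ACC_C"],
    }
).set_index("Gene")

# Grouping by a single column name yields (tier, sub-frame) pairs.
tier_frames = {tier: frame for tier, frame in df.groupby("Tier")}

print(sorted(tier_frames))
print(tier_frames["1"].index.tolist())
```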

### Methods
Look up Accessions (**x.get_accession(gene, org)**, **x.get_accessions(go_list=None)**):
```python
>>> x.get_accession('HTR1A', 'Homo_sapiens')
'NM_000524.3'
>>> go_list = [['HTR1A', 'Homo_sapiens'], ['HTR1A', 'Macaca_mulatta']]
>>> x.get_accessions(go_list = go_list)
['NM_000524.3', 'NM_001198700.1']
```
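
Conceptually, both lookups amount to indexing a nested gene/organism dictionary. The sketch below uses hypothetical module-level functions and a hand-written `gene_dict` with the HTR1A values from this README; the real methods live on the Lister object and may work differently.

```python
# Hand-written stand-in for x.gene_dict (assumed shape: gene -> {org: accession}).
gene_dict = {
    "HTR1A": {
        "Tier": "1",
        "Homo_sapiens": "NM_000524.3",
        "Macaca_mulatta": "NM_001198700.1",
    }
}


def get_accession(gene, org):
    """Return the accession for one gene/organism pair."""
    return gene_dict[gene][org]


def get_accessions(go_list):
    """Return accessions for a nested list of [gene, org] pairs."""
    return [get_accession(gene, org) for gene, org in go_list]


go_list = [["HTR1A", "Homo_sapiens"], ["HTR1A", "Macaca_mulatta"]]
print(get_accessions(go_list))
```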
Look up a list of accessions for alignment (**x.get_accession_alignment(gene)**):
```python
>>> pprint(x.get_accession_alignment('HTR1A'))
['NM_000524.3',
'NM_001198700.1',
...
'xm_006156821.1',
'xm_004325159.1']
```
Get the master lists from a new dataframe (**self.get_master_list(df)**):
```python
>>> from manager.lister import Lister
>>> import os
>>> from pathlib import Path
>>> y = Lister()
>>> y.get_master_lists(csv_file='MAFV3.1.csv')
```