
IPI - International Protein Index

IPI provides a top-level guide to the main databases that describe the proteomes of higher eukaryotic organisms. IPI:
  1. effectively maintains a database of cross references between the primary data sources
  2. provides minimally redundant yet maximally complete sets of proteins for featured species (one sequence per transcript)
  3. maintains stable identifiers (with incremental versioning) to allow the tracking of sequences in IPI between IPI releases.
IPI is updated monthly in accordance with the latest data released by the primary data sources.
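Because IPI identifiers are stable and carry an incremental version, client code usually needs to separate the stable accession from the version. The following is a minimal sketch that assumes identifiers take the form "IPI" + 8 digits with an optional ".N" version suffix (as in the examples on this page). The helper names are hypothetical and are not part of any IPI tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: "IPI" + 8 digits, optionally followed by ".<version>".
IPI_ID = re.compile(r"^(IPI\d{8})(?:\.(\d+))?$")


class IpiId(NamedTuple):
    accession: str          # stable part, e.g. "IPI00015171"
    version: Optional[int]  # None when no version suffix is present


def parse_ipi_id(text: str) -> IpiId:
    """Split an IPI identifier into its stable accession and its version."""
    m = IPI_ID.match(text.strip())
    if m is None:
        raise ValueError(f"not an IPI identifier: {text!r}")
    version = int(m.group(2)) if m.group(2) else None
    return IpiId(m.group(1), version)


def is_newer_version(old: str, new: str) -> bool:
    """True if both IDs name the same entry and `new` has a higher version."""
    a, b = parse_ipi_id(old), parse_ipi_id(new)
    if a.accession != b.accession:
        raise ValueError("identifiers refer to different IPI entries")
    if a.version is None or b.version is None:
        raise ValueError("both identifiers need a version suffix")
    return b.version > a.version
```

Keying local data on the accession alone and storing the version separately is one way to keep records linked across releases while still noticing when an entry has been revised.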

IPI Quick Search

IPI search: type in a database identifier or protein name (e.g. IPI00015171, P50238, ENSP00000332449, TFR2) to retrieve matching entries from one or all of the current IPI datasets.
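A search box that accepts identifiers from several sources has to decide what kind of query it received. The sketch below shows one hypothetical way to route such queries. The patterns are assumptions based on the public formats of IPI, Ensembl protein, and UniProtKB accessions, and anything unrecognised is treated as a protein or gene name. It is not how the IPI search itself is implemented.

```python
import re

# Hypothetical routing rules; patterns are assumptions about ID formats.
PATTERNS = [
    ("IPI", re.compile(r"^IPI\d{8}(\.\d+)?$")),
    ("Ensembl protein", re.compile(r"^ENSP\d{11}(\.\d+)?$")),
    # UniProtKB accessions: 6-character and 10-character forms.
    ("UniProtKB", re.compile(
        r"^([OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
]


def classify_query(query: str) -> str:
    """Return the assumed source of an identifier, or "name" as a fallback."""
    q = query.strip().upper()  # upper-cased for matching only
    for label, pattern in PATTERNS:
        if pattern.match(q):
            return label
    return "name"  # treat as a protein or gene name search
```

With the example queries above, IPI00015171, P50238 and ENSP00000332449 are routed to identifier lookups, while TFR2 falls through to a name search.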

IPI History search: track deleted and secondary identifiers (e.g. IPI00030830) by searching the IPI history database (more information available here).
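Tracking a deleted or secondary identifier amounts to following replacement links until a current identifier is reached. The sketch below assumes the history data has already been loaded into a dictionary that maps each retired identifier to the single identifier that replaced it, or to None if the entry was deleted outright. Both that representation and the function name are hypothetical; the actual history database may record splits and merges differently.

```python
from typing import Dict, Optional


def resolve_identifier(ipi_id: str,
                       replaced_by: Dict[str, Optional[str]]) -> Optional[str]:
    """Follow retired -> replacement links until a current ID is reached.

    Identifiers absent from `replaced_by` are treated as current.
    Returns None if the chain ends in a deletion.
    """
    seen = set()
    current: Optional[str] = ipi_id
    while current is not None and current in replaced_by:
        if current in seen:  # guard against malformed, circular history data
            raise ValueError(f"cycle in history data at {current}")
        seen.add(current)
        current = replaced_by[current]
    return current
```

For example, with made-up history {"IPI00000001": "IPI00000002", "IPI00000002": "IPI00000003", "IPI00000009": None}, resolving IPI00000001 yields IPI00000003 and resolving IPI00000009 yields None.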

Publication

If you use IPI in any published work, please cite the following reference:


Kersey P. J., Duarte J., Williams A., Karavidopoulou Y., Birney E., Apweiler R.
The International Protein Index: An integrated database for proteomics experiments.
Proteomics 4(7): 1985-1988 (2004).

News

IPI News
September 2007
Schedule Changes
IPI is now produced on a three-week schedule to coincide with public UniProt releases.
MySQL Table Rename
The release table in the MySQL dumps has been renamed to current_release.
August 2007
Format changes
IPI has updated the CC (comment) line of the IPI UniProt file format to enable support of multiple genomic locations in IPI entries.
March 2007
Format changes
IPI FASTA file format documentation updated with addition of Gene symbols.
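Gene symbols in FASTA headers are typically consumed by splitting the header line. The sketch below assumes a header layout of pipe-separated "DATABASE:id1;id2" cross-references, followed by a space, optional Tax_Id= and Gene_Symbol= fields, and a free-text description. That layout is an assumption; check it against the FASTA format documentation before relying on it. The header used in the example is made up.

```python
from typing import Dict, List, NamedTuple


class IpiHeader(NamedTuple):
    xrefs: Dict[str, List[str]]  # source database -> identifiers
    attributes: Dict[str, str]   # e.g. Tax_Id, Gene_Symbol
    description: str


KEYED = ("Tax_Id", "Gene_Symbol")  # assumed key=value fields


def parse_header(line: str) -> IpiHeader:
    """Parse an assumed IPI-style FASTA header into its parts."""
    ids_part, _, rest = line.lstrip(">").strip().partition(" ")
    xrefs: Dict[str, List[str]] = {}
    for field in ids_part.split("|"):
        db, _, ids = field.partition(":")
        xrefs[db] = ids.split(";") if ids else []
    attributes: Dict[str, str] = {}
    words = rest.split()
    # Consume leading key=value tokens; the remainder is the description.
    while words and "=" in words[0] and words[0].split("=", 1)[0] in KEYED:
        key, value = words.pop(0).split("=", 1)
        attributes[key] = value
    return IpiHeader(xrefs, attributes, " ".join(words))
```

For the made-up header ">IPI:IPI00000000.1|ENSEMBL:ENSP00000000001;ENSP00000000002 Tax_Id=9606 Gene_Symbol=GENE1 Made-up example protein", the gene symbol comes back as "GENE1" and the ENSEMBL entry holds two identifiers.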
February 2007
A new version of IPI, MSIPI, is now available for human and mouse. MSIPI contains additional information about cSNPs and N-terminal peptides in a format suitable for easy use in mass spectrometry search engines. MSIPI is produced by the Max-Planck Institute for Biochemistry at Martinsried and the University of Southern Denmark (more).
January 2007
Mappings from over 500,000 NCBI GIs to IPI IDs are now available. The file gi2ipi.xrefs.gz is available from the IPI FTP site. A description of the file format is available here.
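A gzipped mapping file like gi2ipi.xrefs.gz can be read without unpacking it first. The sketch below assumes one tab-separated GI/IPI pair per line with the GI first, and skips blank, "#" comment and malformed lines. That column layout is an assumption, not the documented format; adjust the field indices to match the published description.

```python
import gzip
from collections import defaultdict
from typing import Dict, List


def load_gi2ipi(path: str) -> Dict[str, List[str]]:
    """Build a GI -> list of IPI IDs lookup from a gzipped mapping file.

    Assumes tab-separated lines of the form "<GI>\t<IPI ID>"; a GI that
    appears on several lines collects all of its IPI IDs.
    """
    mapping: Dict[str, List[str]] = defaultdict(list)
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) < 2:
                continue  # skip malformed lines rather than failing
            mapping[fields[0]].append(fields[1])
    return dict(mapping)
```

Collecting a list per GI, rather than a single value, avoids silently dropping entries if one GI maps to more than one IPI identifier.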
Ensembl release 42 available in IPI
July 2006
Ensembl release 39 available in IPI
Ensembl has released a new build and assembly for Mouse and Zebrafish (more).
Format changes
MySQL dumps format documentation updated for a new relational schema that is easier to query, uses public IDs as keys, and now includes organism and release information.
May 2006
Ensembl release 38 available in IPI
Ensembl has released a new build and assembly for Human and Mouse (more). The new Human assembly includes more than 12,000 full-length protein-coding transcripts annotated by the Havana team, most of which are redundant with Vega data. IPI identifies these transcripts and reflects this redundancy.
H-InvDB representative set used in place of the complete set
Following the release 3.0 of the H-Invitational Database in March, IPI now uses the H-Invitational representative set (~24,000 transcripts) in place of the complete set (~170,000 transcripts).
Elimination of unsupported hypothetical proteins in IPI
RefSeq ab initio predictions (entries with curation status MODEL) are now filtered out of IPI along with H-Inv hypothetical proteins and pseudogene candidates when there is no clear support for their validity (see here for explanations).
Format changes
UniProt-like format documentation updated with addition of "ENSEMBL_HAVANA" cross-reference. Similar changes affect the Gene xrefs and Protein xrefs file formats.
History file format documentation updated with new comment 'Unsupported hypothetical protein'.
March 2006
Cross References to PathoSign database added
PathoSign is a database which collects data about defective cell signaling molecules causing human diseases.
Cross-references to PathoSign data have been added to IPI UniProt format files.
Format changes
UniProt-like format comments documentation updated with addition of "-!- STRAND:" comment.
End co-ordinates and strand columns inserted after Start co-ordinates column in the Gene xrefs file format (subsequent columns shifted to the right).
February 2006
RefSeq entry revision status in IPI
The RefSeq entry revision status is now used to build IPI data sets. This information gives a finer-grained indication of how far an IPI entry can be trusted, based on its cross-referenced primary data sources. Documentation of the resulting file format changes is accessible from the Announcements page.
January 2006
IPI data sets released for Cow
We are pleased to announce the release of the first IPI data sets for Bos taurus (cow). The first release of IPI cow contains 33,699 protein sequences, with cross-references to UniProtKB, Ensembl, RefSeq and Entrez Gene.
As with the data for other species, the Cow datasets are available in UniProt or FASTA format, and additional summary files are also available (click here for more information about available file formats).


Copyright

IPI - The International Protein Index.

Copyright © The European Bioinformatics Institute.

The IPI may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy.

Contact

You can contact us here.

Funding

The development of IPI is funded by the European Commission under FELICS, contract number 021902 (RII3), within the Research Infrastructure Action of the FP6 "Structuring the European Research Area" Programme.
