IPI - International Protein Index

IPI provides a top-level guide to the main databases that describe the proteomes of higher eukaryotic organisms. IPI:

effectively maintains a database of cross references between the primary data sources
provides minimally redundant yet maximally complete sets of proteins for featured species (one sequence per transcript)
maintains stable identifiers (with incremental versioning) to allow the tracking of sequences in IPI between IPI releases.
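
The versioned identifiers mentioned above can be illustrated with a minimal Python sketch. It assumes, beyond what this page states, that an identifier is written as "IPI" plus eight digits (as in IPI00015171) with an optional ".N" version suffix. The names parse_ipi_id and IpiId are hypothetical and are not part of any IPI tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "IPI" + 8 digits, optionally followed by ".<version>".
_IPI_RE = re.compile(r"^(IPI\d{8})(?:\.(\d+))?$")


class IpiId(NamedTuple):
    accession: str          # stable part, unchanged between releases
    version: Optional[int]  # incremental version, None if not given


def parse_ipi_id(text: str) -> IpiId:
    """Split an identifier into its stable accession and its version."""
    match = _IPI_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an IPI identifier: {text!r}")
    accession, version = match.groups()
    return IpiId(accession, int(version) if version else None)


def sequence_changed(old: str, new: str) -> bool:
    """True if both identifiers share an accession but differ in version."""
    a, b = parse_ipi_id(old), parse_ipi_id(new)
    return a.accession == b.accession and a.version != b.version
```

Under these assumptions, comparing the identifier recorded in an older release with the one in a newer release shows whether the tracked sequence was revised.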

IPI is updated monthly in accordance with the latest data released by the primary data sources.

IPI Quick Search

IPI search: type in a database identifier or protein name (e.g. IPI00015171, P50238, ENSP00000332449, TFR2, etc.) to retrieve matching entries from one or all of the current IPI datasets.
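
The kinds of search term listed above can be told apart by their shape. The following Python sketch is only an illustration, not the search form's actual logic: the IPI and Ensembl patterns are assumptions inferred from the example identifiers, the UniProt pattern is the accession format published in the UniProt documentation, and classify_query is a hypothetical name.

```python
import re

# Patterns are tried in order; the first full match wins.
_PATTERNS = [
    # Assumed: "IPI" + 8 digits, optional ".version" suffix.
    ("ipi", re.compile(r"IPI\d{8}(?:\.\d+)?")),
    # Assumed: Ensembl protein IDs, "ENS" + optional species code + "P" + 11 digits.
    ("ensembl_protein", re.compile(r"ENS[A-Z]*P\d{11}")),
    # UniProt accession format as published by UniProt.
    ("uniprot", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    )),
]


def classify_query(term: str) -> str:
    """Return the identifier type of a search term, or "name" if none fits."""
    term = term.strip()
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(term):
            return kind
    return "name"  # treated as a protein or gene name


for example in ("IPI00015171", "P50238", "ENSP00000332449", "TFR2"):
    print(example, classify_query(example))
```

With the four examples from the text, the sketch labels them ipi, uniprot, ensembl_protein and name respectively.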

IPI History search: track deleted and secondary identifiers (e.g. IPI00030830) by searching the IPI history database (more information available here).

You can also...

Download the IPI datasets here (more information).
Search IPI under SRS at the EBI's SRS server.
Fetch IPI entries using dbfetch (more information).
Search using BLAST or FASTA algorithms against the IPI at the EBI.
Get statistics for the latest IPI releases.
Check the IPI Frequently Asked Questions.
Subscribe to the IPI Announcements mailing list.
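
For the dbfetch option in the list above, a request URL could be assembled along the following lines. This is a sketch under stated assumptions: the endpoint address, the database name "ipi" and the format and style values are not given on this page and should be checked against the dbfetch documentation. The function dbfetch_url is hypothetical, and the code only builds the URL without contacting any server.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current dbfetch documentation.
DBFETCH_URL = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(ids, db="ipi", fmt="fasta", style="raw"):
    """Build a dbfetch request URL for one or more identifiers.

    The default values for db, fmt and style are assumptions.
    """
    ids = list(ids)
    if not ids:
        raise ValueError("at least one identifier is required")
    query = urlencode(
        {"db": db, "id": ",".join(ids), "format": fmt, "style": style},
        safe=",",  # keep the comma separator between identifiers readable
    )
    return f"{DBFETCH_URL}?{query}"


print(dbfetch_url(["IPI00015171", "IPI00030830"]))
```

Passing several identifiers in one request, as here, avoids issuing one request per entry.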

Publication

If you use IPI in any published work, please cite the following reference:

Kersey P. J., Duarte J., Williams A., Karavidopoulou Y., Birney E., Apweiler R.

The International Protein Index: An integrated database for proteomics experiments.

Proteomics 4(7): 1985-1988 (2004).

News

September 2007

Schedule Changes

The IPI schedule has been changed to a three-week production cycle to coincide with public UniProt releases.

MySQL Table Rename

The release table in the MySQL dumps has been renamed to current_release.

August 2007

Format changes

IPI has updated the CC (comment) line of the IPI UniProt file format to enable support of multiple genomic locations in IPI entries.

March 2007

Format changes

IPI FASTA file format documentation updated with addition of Gene symbols.

February 2007

A new version of IPI, MSIPI, is now available for human and mouse. MSIPI contains additional information about cSNPs and N-terminal peptides in a format suitable for easy use in mass spectrometry search engines. MSIPI is produced by the Max-Planck Institute for Biochemistry at Martinsried and the University of Southern Denmark (more).

January 2007

Mappings from over 500,000 NCBI GIs to IPI IDs are now available. The file gi2ipi.xrefs.gz is available from the IPI FTP site. A description of the file format is available here.

Ensembl release 42 available in IPI

July 2006

Ensembl release 39 available in IPI

Ensembl has released a new build and assembly for Mouse and Zebrafish (more).

Format changes

MySQL dumps format documentation updated to reflect a new relational schema that is easier to query: public IDs are used as keys, and organism and release information is now included.

May 2006

Ensembl release 38 available in IPI

Ensembl has released a new build and assembly for Human and Mouse (more). The new Human assembly includes more than 12,000 full-length protein-coding transcripts annotated by the Havana team which are mainly redundant with Vega data. IPI identifies these transcripts and reflects this redundancy.

H-InvDB representative set used in place of the complete set

Following release 3.0 of the H-Invitational Database in March, IPI now uses the H-Invitational representative set (~24,000 transcripts) in place of the complete set (~170,000 transcripts).

Elimination of unsupported hypothetical proteins in IPI

RefSeq ab initio predictions (entries with curation status MODEL) are now filtered out of IPI along with H-Inv hypothetical proteins and pseudogene candidates when there is no clear support for their validity (see here for explanations).

Format changes

UniProt-like format documentation updated with addition of "ENSEMBL_HAVANA" cross-reference. Similar changes affect the Gene xrefs and Protein xrefs file formats.

History file format documentation updated with new comment 'Unsupported hypothetical protein'.

March 2006

Cross References to PathoSign database added

PathoSign is a database which collects data about defective cell signaling molecules causing human diseases.
Cross-references to PathoSign data have been added to IPI UniProt format files.

Format changes

UniProt-like format comments documentation updated with addition of "-!- STRAND:" comment.

End co-ordinates and strand columns inserted after Start co-ordinates column in the Gene xrefs file format (subsequent columns shifted to the right).

February 2006

RefSeq entry revision status in IPI

The RefSeq entry revision status is now used to build IPI data sets. This information gives a finer-grained indication of how much trust can be placed in an IPI entry, based on its cross-referenced primary data sources. Documentation of the resulting file format changes is accessible from the Announcements page.

January 2006

IPI data sets released for Cow

We are pleased to announce the release of the first IPI data sets for Bos taurus (cow). The first release of IPI cow contains 33,699 protein sequences, with cross-references to UniProtKB, Ensembl, RefSeq and Entrez Gene.

As with the data for other species, the Cow datasets are available in UniProt or FASTA format, and additional summary files are also available (click here for more information about available file formats).

Visit the News Archive

Copyright

IPI - The International Protein Index.

The IPI may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy.

Contact

You can contact us here.

Funding

The development of IPI is funded by the European Commission under FELICS, contract number 021902 (RII3), within the Research Infrastructure Action of the FP6 "Structuring the European Research Area" Programme.