Defining absolute protein abundance
At the heart of Systems Biology is a vast hunger for measurements: mRNA abundance, metabolite concentrations, reaction rates, degradation rates, protein abundance. This last measurement has long been problematic for researchers. Mass spectrometers get increasingly accurate and powerful, but are still hindered by the simple fact that the observed signal intensity of a peptide does not necessarily correlate directly with the abundance of that peptide in the sample. Factors such as peptide ionisation efficiencies, dominant neighbour effects, and missing observations all give rise to erroneous estimates of peptide quantities. Until recently, the best way to get close to measures of protein abundance was to use a peptide tagging methodology, but these are typically expensive, and provide only relative quantification (useful for expression proteomics studies, less useful if you need to know the absolute levels of a protein for a Systems Biology study).
Recently, a three-step method has been proposed for determining the absolute quantities of proteins in the cell, on a proteome scale. Step one is isoelectric focussing of tryptic digests of whole cell extracts. Step two is calculating the absolute abundance of a small group of proteins by Selected Reaction Monitoring (SRM). SRM uses spiked-in, isotopically labelled peptides of known concentration as references to calculate the actual abundance of peptides of interest. Finally, step three uses these abundances as reference points to calculate the abundance of all proteins in the sample, using the median intensities from the 3 most intense peptides for each protein.
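To make step three a little more concrete, here is a rough Python sketch of how a handful of SRM-quantified anchor proteins might be used to calibrate everything else. The log-log straight-line fit, the function names and all of the numbers are my own assumptions for illustration; the paper should be consulted for the calibration the authors actually used.

```python
import math
from statistics import median


def top3_median(intensities):
    """Median of the three most intense peptide signals for one protein.

    If fewer than three peptides were observed, the median of what is
    available is used instead.
    """
    top3 = sorted(intensities, reverse=True)[:3]
    return median(top3)


def fit_log_line(signals, abundances):
    """Least-squares fit of log10(abundance) = slope * log10(signal) + intercept.

    `signals` are top-3 median intensities of the anchor proteins and
    `abundances` are their SRM-derived absolute quantities.
    """
    lx = [math.log10(s) for s in signals]
    ly = [math.log10(a) for a in abundances]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_abundance(intensities, slope, intercept):
    """Apply the anchor calibration to a protein that has no SRM measurement."""
    signal = top3_median(intensities)
    return 10 ** (slope * math.log10(signal) + intercept)


if __name__ == "__main__":
    # Hypothetical anchors: (top-3 median intensity, copies per cell from SRM).
    anchor_signals = [100.0, 1000.0, 10000.0]
    anchor_copies = [200.0, 2000.0, 20000.0]
    slope, intercept = fit_log_line(anchor_signals, anchor_copies)

    # Hypothetical peptide intensities for an unanchored protein.
    print(estimate_abundance([5000.0, 400.0, 600.0, 500.0], slope, intercept))
```

The point of taking the median of the top three peptides is that one unusually well-ionising peptide (the 5000 in the toy data) does not dominate the estimate for the whole protein.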
Using this methodology, the abundances of >50% of the proteome of a human pathogen (Leptospira interrogans) have been determined to an accuracy of ~2-fold. These abundance measurements were confirmed by almost literally counting the number of flagellar proteins present in a cell by cryo-electron tomography.
Although current hardware probably limits this technique to a few thousand proteins, that is still a big step forward on what was previously possible. If whole-proteome-scale absolute abundance measurements become an achievable reality, maybe proteomics can finally take on microarrays as the dominant technique in the post-genomic world.
Malmström, J., Beck, M., Schmidt, A., Lange, V., Deutsch, E., & Aebersold, R. (2009). Proteome-wide cellular protein concentrations of the human pathogen Leptospira interrogans. Nature, 460 (7256), 762-765. DOI: 10.1038/nature08184
“Peer review does not guarantee quality”
I am still catching up on my podcast backlog after my two-week holiday in August. The excellent ‘More or Less’ provided the gem of a quote in the title during a discussion about meta-analyses.
Professor Stephen Senn was explaining why careless mathematics can distort the results of a meta-analysis (things like including a prior meta-analysis amongst your data sets can lead to double-counting – see this paper). The presenter, Tim Harford, suggested that surely this is a problem easily fixed. A reader spots an error in a published meta-analysis, contacts the journal and a correction ensues. A suggestion that was quickly knocked back by Prof Senn. The problem, as he sees it, is that we have no culture of correction; that peer reviewed results are considered irreproachable.
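The double-counting problem is easy to see with a toy calculation. The Python sketch below is not Prof Senn's example: the three studies and their numbers are made up, and simple fixed-effect (inverse-variance) pooling is just my choice of a convenient method. It pools studies A, B and C properly, and then again with an earlier meta-analysis of A and B carelessly thrown in as though it were a fourth independent study.

```python
import math


def pool_fixed_effect(studies):
    """Inverse-variance (fixed-effect) pooling.

    `studies` is a list of (estimate, standard_error) pairs; returns the
    pooled (estimate, standard_error).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    estimate = sum(w * est for w, (est, _) in zip(weights, studies)) / total
    return estimate, math.sqrt(1.0 / total)


# Hypothetical effect estimates with equal standard errors.
study_a = (0.2, 0.1)
study_b = (0.4, 0.1)
study_c = (0.0, 0.1)

# An earlier meta-analysis that already pooled A and B.
prior_meta = pool_fixed_effect([study_a, study_b])

# Correct: every study counted once.
correct = pool_fixed_effect([study_a, study_b, study_c])

# Careless: A and B are counted again via the earlier meta-analysis.
double_counted = pool_fixed_effect([study_a, study_b, study_c, prior_meta])

if __name__ == "__main__":
    print("correct:       ", correct)
    print("double-counted:", double_counted)
```

In this toy case the double-counted result is dragged towards A and B, and its standard error shrinks even though no new data has been added, so the conclusion looks more certain than it has any right to be.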
Doesn’t peer review offer some guarantee of quality, suggests Harford? “Peer review is of minimal value” is the response to this, “…checkability is what really guarantees quality”. Senn goes on to suggest that scientists sign an undertaking to provide raw original data to anyone who requests it.
This was the clearest argument I’ve heard, not against peer review, but for the availability of raw data, and for post-publication quality control on a grand scale.
This multi-eyes approach to quality checking, post-publication, is familiar from somewhere…

Charles Minard's 1869 chart showing the losses in men, their movements, and the temperature of Napoleon's 1812 Russian campaign.
The same edition of the show had a section on data visualisation, and brought the ‘Napoleon’s March’ graphic to my attention. I had not previously been aware of this ‘infographic’, produced in the mid-19th century.
From eczema to asthma (in mice)
Eczema and asthma often co-occur; indeed, I suffer from both (albeit mildly). What I wasn’t aware of was that eczema often comes first: asthma has an underlying rate of 4-8% in the general population, but 70% in individuals with a history of chronic severe eczema. The underlying mechanism for this so-called ‘atopic march’ isn’t known, though work published today in PLoS Biology elucidates a possible mechanism.
Researchers genetically engineered mice with chronic skin barrier defects (mice lacking Notch signalling in the skin, leading to impairment of epidermal differentiation), which exhibit an eczema-like skin condition. They then used these mice to demonstrate the predisposition of such affected individuals to allergic asthma. Occurrence of allergic asthma was 7-fold higher in the mutant mouse population, compared to a wild-type population.
The authors then went on to demonstrate that a cytokine called thymic stromal lymphopoietin (TSLP), which is secreted by the damaged skin into the circulation, is required for atopic march in the mutant mice. They show that by knocking out the TSLP receptor in these mice, they can prevent atopic march. They also show that over-production of TSLP in the skin is sufficient to cause allergic asthma, regardless of the cause of that over-production.
This is a paper a little outside my areas of expertise, which is why this is much more of a skim overview than normal. However, there is clearly good work being done here elucidating the molecular mechanisms of a very common disease process. There are also clear implications in this paper on the future management and treatment of eczema and asthma patients. Even though this is unlikely to improve my own experiences of these conditions, I’m very happy this kind of work is being done.
Demehri, S., Morimoto, M., Holtzman, M., & Kopan, R. (2009). Skin-Derived TSLP Triggers Progression from Epidermal-Barrier Defects to Asthma. PLoS Biology, 7 (5). DOI: 10.1371/journal.pbio.1000067
Nature Methods
I love my free Nature Methods subscription. It allows me to get my hands on a paper journal, which I rarely get to do these days, and the content is actually pretty marvellous.
This month there’s a new technique for enzymatic assembly of DNA molecules from the Venter Institute, a standardised methodology for proteomics sample preparation, and a great technology feature from Nathan Blow about new proteomics techniques, including surface plasmon resonance (about which I knew nothing before today). Not to mention cool pictures of mice having light shone on their brains.
You can still apply for a free subscription, and if you are eligible to do so (individuals in North America and Europe involved in research within the life sciences or chemistry), I would urge you to.
IET BioSysBio 2009
Frank and Dan have already blogged about this year’s BioSysBio conference in Cambridge (23rd-25th March). I just thought I’d add my thoughts to theirs.
I don’t get to go to many conferences. The nature of my work doesn’t really demand it, but about once a year it does me good to reconnect with some cutting edge science, and get a good idea of developments in the field as a whole.
Before now, ISMB has been the conference of choice; as the largest gathering of bioinformatics types, it certainly was the obvious one. But in recent years it has become a cumbersome beast: multi-tracked and vast, hard to pin down the stuff you want to hear, and often disappointing when you do find something. So this year we cast about for something smaller and fresher. We had heard good things about BioSysBio last year, and it certainly looked promising, so we made our decision.
And boy, was it the right decision. Small enough to be single track, there were very few choices to make in terms of what talks to attend (actually there were none, there was only really one parallel session, workshops on the Tuesday afternoon, and I was obliged to be at the ONDEX one, since I was helping out). This meant that instead of skipping between halls, missing bits and pieces of talks, and sometimes not bothering at all, I sat in one place, pretty much for 3 days straight, and listened to everything.
Highlights were the ethics and biosecurity debate, with a fabulously engaging talk from Drew Endy; showcases of the importance of transcription initiation and elongation from Marko Djordjevic and Andre Riberio; an excellent Synthetic Biology talk from a man apparently inspired by the iGEM competition, Philip LoCascio; and a couple of excellent videos of lab robots hard at work (Adam the Robot Scientist, and another in the final paper talk of the conference by T Ben Yehezkel).
Next year I would happily micro-blog the conference again. This was my first conference since I joined Twitter and FriendFeed, and I was unsure about how I (and my followers) would feel about really going hard at the live updating of the conference experience. I think, though, that those of us who Tweeted provided an idea of the content being presented to those who could not attend, and the feeling I got from the feedback we received, and the fact that not a single person unfollowed me in the three days, is that we were providing a useful service. It has also provided me with a useful resource, a set of notes on the event produced by a crowd, not just me. Search for #biosysbio to see what I mean. Oh, and no review of this conference would be complete without a mention of Ally’s blogging, in which she chronicled pretty much every single talk, except her own (I did that one!)
I do think that for future events I would create threads on FriendFeed for each talk, and group my thoughts about it there, then tweet the URL of the FriendFeed post – this might make things a little less noisy.
Coming back from a conference feeling exactly how you should feel, refreshed, invigorated and excited to get on with your own work, is a great thing. For this feeling alone I will be returning to BioSysBio next year.
Saint: A lightweight SBML annotation integration environment
Allyson Lister
CISBAN, Newcastle University
This post is an homage to Ally’s own herculean note taking style, since she can’t blog her own talk.
Saint has been developed to help modellers get information into their SBML models really quickly. Ally shows a picture of a model describing neuromuscular junctions (standard biomodel). This model contains terms which are descriptions, and the mathematical model. The maths doesn’t know anything about the underlying biology. For example, actin is just a label; there is no implicit knowledge contained in that label (i.e. actin is a protein, involved in the cytoskeleton etc).
Short intro to SBML: SBML is a standard format, which is widely used. It stores the maths and enables linking to the underlying biology.
So what do we know about actin -
- it’s a protein (UniProt)
- interactions? (Pathway Commons, STRING)
- reactions and parameters (SABIO-RK, BRENDA, KEGG)
- vocab (SBO, GO)
Now we can use the MIRIAM standard to annotate the model with the above information.
When building a model, you need to add info to things like species, name, reaction, compartment
Annotation and SBO term sit between the model and the biology information – these can be used to retrieve the information from the databases. This currently has to be done manually, which is hard, and is often not done exhaustively, or even at all.
Saint enables automation of this procedure. It already links to a number of data sources – MIRIAM, UniProt, STRING, SBO, Pathway Commons. Reduces effort on the part of the modeller. Saint is lightweight and easy-to-use. Useful as a first pass annotation tool, or to add annotation to an existing model.
How Saint works:
- import SBML into Saint
- Saint then searches for appropriate annotation
- and presents this annotation, and allows you to accept or reject the changes
Ally is using a model produced by Carole Proctor in CISBAN as an example run-through of Saint.
Saint does some validation via libsbml on import of an SBML model. The tool then presents a list of species found in the model; these can be hidden if you don’t want to retrieve information on them. Zoom into ‘Ctelo’ for an example – a plus next to the name of the species shows the annotation already available in the SBML model (‘known’ information). So we can see that Ctelo is a Capped Telomere. You can decide which species you want to annotate, and which datasources you want to retrieve that annotation from.
Queries are made to the datasources by a Master Asynchronous Query Service – once information becomes available, it is immediately visible in the UI (as an ‘inferred’ tab), and you see a ‘New Annotation found’ message. Once Saint has retrieved annotation, the user can choose which annotation they want to keep, and how this information links to the species in the model (is, part etc – MIRIAM terms).
CDC13 = polypeptide chain, nuclear telomere cap complex, protein binding, single-stranded telomeric DNA binding, telomerase inhibitor activity.
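As a rough illustration of the asynchronous querying idea described above (results surfacing as soon as each datasource answers, rather than waiting for the slowest one), here is a small Python sketch. It is emphatically not Saint's code: the source names, delays and lookup tables are hypothetical stand-ins for real remote queries, and only the example annotation terms are taken from the CDC13 result shown in the talk.

```python
import asyncio


async def query_source(name, species, delay, table):
    """Hypothetical stand-in for a remote lookup against one datasource.

    A real client would perform network I/O here; the sleep just
    simulates sources that respond at different speeds.
    """
    await asyncio.sleep(delay)
    return name, table.get(species, [])


async def annotate(species, sources):
    """Query every source at once and collect results in arrival order."""
    tasks = [
        asyncio.create_task(query_source(name, species, delay, table))
        for name, delay, table in sources
    ]
    arrived = []
    for finished in asyncio.as_completed(tasks):
        name, hits = await finished
        # A UI could display these as 'inferred' annotation at this point.
        arrived.append((name, hits))
    return arrived


# Hypothetical sources: (name, simulated delay in seconds, lookup table).
SOURCES = [
    ("slow_source", 0.05, {"CDC13": ["telomerase inhibitor activity"]}),
    ("fast_source", 0.0, {"CDC13": ["protein binding"]}),
]

if __name__ == "__main__":
    for name, hits in asyncio.run(annotate("CDC13", SOURCES)):
        print(name, hits)
```

The modeller would then accept or reject each suggested term, as in the workflow listed earlier.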
Future work – more data sources, use of species type, better support for non-systematic names, adding software source attribution, incorporation of SBGN (Systems Biology Graphical Notation) for better display.
Personal comments – good job Ally – hope I did it justice!
http://friendfeed.com/rooms/biosysbio
http://conferences.theiet.org/biosysbio
Ally’s standard disclaimer:
Please note that this post is merely my notes on the presentation. They are not guaranteed to be correct, and unless explicitly stated are not my opinions. They do not reflect the opinions of my employers. Any errors you can happily assume to be mine and no-one else’s. I’m happy to correct any errors you may spot – just let me know!
Save the Scientist, Save the World?
http://www.youtube.com/watch?v=7iPaiylUYW0
Gordon Brown has already saved the world once, but it didn’t take. So the world needs another solution. I humbly suggest that the way to help save the economy, of Britain at least, is to invest heavily in Science and Technology. In the following I try to justify this as anything other than pure selfishness.
Science is very often one of the first casualties of government spending in a recession. This is because it is seen as a luxury, a good-time frippery that is difficult to justify when times are hard. The reverse should be true: science and technology investment are not disposable because they are the generators of future income, the basis of a future successful economy.
The economy of this country has, for a long time now, been based in the service sector. This keeps people employed, which drives the economy because employed people buy things. But we no longer produce anything of note, and we don’t generate significant external input into our economy – except through the financial sector… and I think everyone knows what happened there by now. We reached a point where confidence amongst those employed in the service sector collapsed, so they stopped buying things. The service sector then looked to the financial sector for support, but the financial sector, to all intents and purposes, no longer existed, so the service sector began collapsing upon itself. This is self-perpetuating: it leads to job losses, which leads to less buying of things, which leads to further job losses… and so on. (I realise this is a gross simplification of the real situation, and not 100% accurate, but it is pretty close to the real thing, and makes my point).
The government has declared its intention to follow a Keynesian approach and spend its way out of recession, taking upon itself the responsibility of injecting the cash that the economy needs to rebuild itself. This is a well recognised approach, and has merit: the new investment has to come from somewhere, and no institution has the borrowing power of the government. However, we (as a nation) must be able to recover this investment at some future point. This means we have to create wealth that is not already in the system. We have to make something that the rest of the world wants to buy.
So, invest in science, engineering and technology. Reverse the decline in these disciplines, the unpopularity of Maths and Physics in the classroom, the hemorrhage of the talent we do have overseas. Make our innovation the product the rest of the world buys into. Funding research keeps the current generation of innovators employed (the selfish bit), and creates new opportunities for the next generation. And not just for those lucky enough to have the education to pursue this route. Infrastructure is needed to surround research. Newcastle University is one of the largest employers in the North East.
At this point I am clearly in danger of getting carried away, so it’s probably best to wrap up. Since I started writing this particular perma-draft, many things have happened. Gordon Brown spoke to Congress about the need to ‘educate our way out of the downturn, invest and invent our way out of the downturn and re-tool and re-skill our way out of the downturn.’ The US stimulus package has promised vast investment in science and technology. And just today President Obama unfroze research into stem cells in the US. All of these are obviously good things; let’s hope the momentum can be maintained, and the doom merchants don’t win.
Fixing Proteomics
I’ve only just discovered the Fixing Proteomics Campaign, thanks to a post on FriendFeed. It’s an initiative that I probably should have known about before, since it appears to originate, at least partly, from Nonlinear Dynamics, a Newcastle based proteomics informatics company. The campaign is also dedicated to a message I have been trying to spread among the researchers I interact with during my work: experiments must be robustly designed, and an unreproducible experimental result is meaningless.
The website for the campaign contains some useful resources for spreading this message, most effective are the analogies that illustrate the most common experimental design techniques, and the 4-step guide for Fixing Proteomics (the subject of the FF link, above). I have used something akin to the analogies in lectures I have given about experimental design (indeed I have used the apocryphal ‘Fahrenheit and the Cow’ story itself), and I will certainly be using the 4-steps in the future, and referencing the Fixing Proteomics website too.
Just one note: as Frank points out in the FriendFeed thread, the PSI could be highlighted a little more. Proteomics experiments would not be reproducible at all, particularly cross-site, without the efforts made by the standards community. As AnalysisXML enters its public comment phase, it is worth remembering the contribution they have made to opening up data formats and making data and metadata available in a non-proprietary way.
Blog for Darwin
This post forms part of the ‘Blog for Darwin’ blog carnival.
I wasn’t going to write this post. I am very much of the opinion that holding up one man as a figurehead for an entire science is a mistake, and sets up too many straw-man arguments for detractors to propound (of the nature of: x was mistaken, so his theory y must also be wrong). Darwin lived in the 19th century, limited by the 19th century’s knowledge of science. A period where ‘Biology’ as a science didn’t really exist. Of course he was wrong about some stuff, and by equating Evolution with Darwinism, we give the denialists a stick with which to beat us (and this also leads to misleading and pernicious headlines like that in the New Scientist a couple of weeks ago). I ‘believe’ in the theory of gravity (as supported by the weight (ho ho) of evidence), that doesn’t make me a Newtonist.
There is no doubting that evolution is more than just Darwin, and that the Darwinian view of evolution probably doesn’t totally hold water any more, but that is hardly a surprise. It is 150 years old (in its published form). So, much as I admire his achievements, I wasn’t totally behind the idea of ‘Darwin Day’. Grist to the mill of creationists who see Darwin as the sole pedestal for the Theory of Evolution.
But then you see the amount of pseudoscience that persists in the mainstream media, and results of surveys like this one, which suggests that around 10% of Britons believe the earth was created by a supernatural being sometime in the last 10,000 years, and you think: ‘Why should I be churlish about something which is basically pro-science, and is getting a shed-load of high quality, high profile coverage?’. So, yes, if I can increase the positive noise surrounding February 12th 2009, I will. I will shout about Darwin from the rooftops if it gets something close to actual science in the news pages for a change.
For the rest of this year, this is where the battle will be fought. The hearts and minds of the anti-science luddites must be won over by the elegance and wonder of a beautiful theory, arrived at by a brilliant man who spent many years of his life in painstaking examination of the many glorious wonders of the natural world, and slowly formulating a way in which they were all connected. He truly changed our understanding of the world. Let us celebrate that fact.
Just don’t call me a Darwinist.
Gold Standard not so shiny?
This paper caused a bit of a stir when it was published last week. The suggestion that highly curated ‘gold standard’ databases may not be as high quality as has been assumed had august figures such as Henning Hermjakob up in arms and countering as swiftly as humanly possible.
Protein-protein interactions, at the ‘interactome’ scale, are determined in two major ways: (i) high-throughput experimental studies, such as yeast two-hybrid and TAP assays, and (ii) curating the literature to gather together many interactions found in low-throughput experiments. Neither of these approaches is capable of fully illuminating the complete interactome of any organism, and so the aim of this paper by Michael Cusick and co-workers is to examine which of the two approaches produces the more reliable results.
With high throughput experiments, the number of interactions tested vs number found is known. This is not the case for curated sets. Negatives are underreported, so a full picture of the experimental background is unclear.
Literature curated sets tend to be used for appraisal of reliability of experimental sets. They make up the gold standard positives (GSP) with which high throughput PPIs are scored. This high reliability of curated data has largely been assumed, not tested.
The study examines the superficial reliability of curated datasets, before examining and reappraising specific interaction sets. Only 25% of yeast PPIs in BioGRID are supported by more than one publication; the number is lower for humans (15%) and lower still for Arabidopsis (only 7%). Single-publication interactions are naturally of lower reliability than those found more than once experimentally. The authors suggest that it is generally assumed that even the single-publication interactions come from small studies. (I’m not sure that this is a valid assumption: most interactions in the literature come from high-throughput datasets, and I’m not convinced anyone would expect these to be under-represented in curated datasets.) Large proportions of single-publication interactions do come from high-throughput studies (not well-validated small-scale studies); almost 25% of yeast interactions in BioGRID come from 1% of publications detailing more than 100 interactions.
MINT, IntAct and DIP do not overlap well (this is not enumerated in the text, but is in the figures). This is due to use of different manuscripts, not differential interpretation of the same corpus. This does imply that coverage of the literature is poor, but not that curation is unreliable. Different databases have different starting points in the literature, and there is a lot of it out there.
The next step for this study was to recurate ‘representative samples’ from the 3 organisms already mentioned. 35% of 100 yeast interactions were ‘incorrectly curated’, based on the criteria set out in the methods.
For humans, they chose a set of high-confidence, multiply curated, multiply databased interactions, of which 38% of the ‘curation units’ were found to be ‘wrong’. However, these 38% correspond to only 8.5% of the ‘interaction units’; in other words, 91.5% of these 188 interactions are still supported by at least one publication.
Only 6% of the less-well studied Arabidopsis representative set were called ‘incorrect’ in the recuration.
There is no denying that, on the face of it, these are disturbing numbers. However, they are presented in the paper as much more alarming than they perhaps really are. With respect to the human reappraisal, only the 38% figure is mentioned in the main body of the text. The fact is, these unsupported annotations undermine less than 10% of the dataset.
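The gap between the two human figures comes down to what is being counted. A tiny Python sketch with made-up data shows how a large fraction of wrong ‘curation units’ can coexist with a much smaller fraction of unsupported ‘interaction units’, on the reading used above that an interaction only loses support when every publication curated for it is rejected. The interaction names and the True/False verdicts are hypothetical, not taken from the paper.

```python
def unsupported_fractions(interactions):
    """Return (fraction of wrong curation units, fraction of unsupported interactions).

    `interactions` maps an interaction to a list of booleans, one per
    curation unit (publication), where True means the recuration agreed
    with the original curation.
    """
    units = [ok for checks in interactions.values() for ok in checks]
    wrong_units = units.count(False) / len(units)
    # An interaction is unsupported only if none of its units survive.
    unsupported = sum(1 for checks in interactions.values() if not any(checks))
    return wrong_units, unsupported / len(interactions)


# Hypothetical recuration verdicts for four interactions.
toy = {
    "A-B": [True, False, True],
    "C-D": [False, True],
    "E-F": [True, True],
    "G-H": [False],
}

if __name__ == "__main__":
    print(unsupported_fractions(toy))
```

Here 3 of the 8 curation units are rejected, but only 1 of the 4 interactions ends up with no supporting publication at all.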
The authors suggest the difficulty of curation is underestimated – not by those recruiting curators it isn’t. I would question what makes this ‘recuration’ more reliable than the original curation by experienced and practiced hands. Furthermore, why do the curation methods differ for different organisms? Surely the questions posed in the yeast curation are valid for the human dataset.
They do point out the obscurity of the literature – universal identifiers are lacking, and it can be difficult to even determine the species a given sequence originates from, let alone the specific protein being discussed.
In the light of this, they make the very sensible suggestion that MIMIx is a good thing.
It should be noted also that, as mentioned, Henning Hermjakob sprang to the defence of curated databases, suggesting that the majority of Arabidopsis interactions reported as incorrect are, in fact, accurate. He also suggested that it was hard to debate the veracity of the claims made about yeast and humans, as there is not a direct citation for each interaction to follow up (GenomeWeb).
I think there are some valid points made about the difficulties of curation, and that not all annotations should necessarily be taken at face value, but I also don’t believe the findings of this study are as alarming as they are portrayed. It is not time to ditch all of those gold standard datasets just yet. In general I would still suggest that curated datasets are more reliable than high throughput sets.
Michael E Cusick, Haiyuan Yu, Alex Smolyar, Kavitha Venkatesan, Anne-Ruxandra Carvunis, Nicolas Simonis, Jean-François Rual, Heather Borick, Pascal Braun, Matija Dreze, Jean Vandenhaute, Mary Galli, Junshi Yazaki, David E Hill, Joseph R Ecker, Frederick P Roth, Marc Vidal (2009). Literature-curated protein interaction datasets. Nature Methods, 6 (1), 39-46. DOI: 10.1038/nmeth.1284