Type

Journal Article

Authors

Carlos Pinto
Aiden Corvin
Elizabeth Heron
Ricardo Segurado
Derek Morris
Colm O'Dushlaine
Michael Gill
Michael Bridges

Subjects

Psychiatry

Topics
genetics, support vector machines, genes, society, population genetics, genetic differences, supervised learning, sample covariance matrices, case control studies, genome wide association studies, quality control, machine learning, principal components analysis, supervised classification, neuroscience

Genetic Classification of Populations using Supervised Learning. (2011)

Abstract

There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large-scale genome-wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis, and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into predefined populations, particularly in quality control for large-scale genome-wide association studies.
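
The supervised idea described in the abstract can be illustrated with a small sketch. This is not the authors' code, data, or method: the genotypes are synthetic, all function names and parameters (`simulate_population`, `held_out_accuracy`, `shift`, and so on) are hypothetical, and a simple nearest-centroid linear rule stands in for the neural networks and support vector machines used in the paper. The point it shows is only that known population labels are used to fit a classifier, and that distinguishability is then judged by accuracy on held-out individuals.

```python
import random


def simulate_population(freqs, n, rng):
    """Draw n synthetic individuals; each genotype is an allele count of 0, 1 or 2."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]


def centroid(rows):
    """Per-SNP mean genotype across a list of individuals."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train(pop_a, pop_b):
    """Fit a nearest-centroid linear rule from labelled training individuals."""
    ca, cb = centroid(pop_a), centroid(pop_b)
    weights = [b - a for a, b in zip(ca, cb)]
    # Place the decision boundary at the midpoint between the two centroids.
    bias = -sum(w * (a + b) / 2 for w, a, b in zip(weights, ca, cb))
    return weights, bias


def predict(model, genotype):
    """Return 0 for population A, 1 for population B."""
    weights, bias = model
    score = sum(w * g for w, g in zip(weights, genotype)) + bias
    return 1 if score > 0 else 0


def held_out_accuracy(seed=0, n_snps=500, n_train=200, n_test=200, shift=0.05):
    """Train on labelled individuals, then score on unseen ones."""
    rng = random.Random(seed)
    # Assumed allele frequencies: population B differs from A by +/- shift per SNP.
    freqs_a = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    freqs_b = [p + rng.choice([-shift, shift]) for p in freqs_a]
    model = train(simulate_population(freqs_a, n_train, rng),
                  simulate_population(freqs_b, n_train, rng))
    test_a = simulate_population(freqs_a, n_test, rng)
    test_b = simulate_population(freqs_b, n_test, rng)
    correct = sum(predict(model, g) == 0 for g in test_a)
    correct += sum(predict(model, g) == 1 for g in test_b)
    return correct / (2 * n_test)


if __name__ == "__main__":
    print("held-out accuracy:", held_out_accuracy())
```

Setting `shift=0.0` makes the two synthetic populations identical, in which case held-out accuracy should sit near chance; evaluating on individuals not used for training is what keeps the labels from producing a spurious separation.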
Collections

Ireland -> Trinity College Dublin -> RSS Feeds
Ireland -> Trinity College Dublin -> Psychiatry (Scholarly Publications)
Ireland -> University College Dublin -> School of Public Health, Physiotherapy and Sports Science
Ireland -> University College Dublin -> College of Health and Agricultural Sciences
Ireland -> Trinity College Dublin -> Psychiatry
Ireland -> University College Dublin -> Public Health, Physiotherapy and Sports Science Research Collection
Ireland -> Trinity College Dublin -> School of Medicine

Full list of authors on original publication

Carlos Pinto, Aiden Corvin, Elizabeth Heron, Ricardo Segurado, Derek Morris, Colm O'Dushlaine, Michael Gill, Michael Bridges

Experts in our system

1
Aiden Corvin
Trinity College Dublin
Total Publications: 190
 
2
Elizabeth Heron
Trinity College Dublin
Total Publications: 14
 
3
Ricardo Segurado
University College Dublin
Total Publications: 58
 
4
Derek Morris
Trinity College Dublin
Total Publications: 150
 
5
Colm T O'Dushlaine
Trinity College Dublin
Total Publications: 28
 
6
Michael Gill
Trinity College Dublin
Total Publications: 260