Abstract
CO-TRAINING can learn from datasets that contain a small number of labelled examples and a large number of unlabelled ones. It is an iterative algorithm in which examples labelled in previous iterations are used to improve the classification of examples from the unlabelled set. However, as the number of initial labelled examples is often small, we do not have reliable estimates of the underlying population that generated the data. In this work we claim that the proportion in which examples are labelled is a key parameter of CO-TRAINING. Furthermore, we have carried out a series of experiments to investigate how the proportion in which examples are labelled in each step influences CO-TRAINING performance. Results show that CO-TRAINING should be used with care in challenging domains.
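The iterative labelling scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's experimental setup: the nearest-centroid learner, the toy two-view data, and the (2, 2) per-class labelling counts are all illustrative assumptions. The `per_class` argument plays the role of the labelling proportion that the paper identifies as a key parameter.

```python
import random
from collections import defaultdict

def fit_centroids(X, y):
    """Nearest-centroid model for one view: returns {class: centroid vector}."""
    sums = defaultdict(lambda: [0.0] * len(X[0]))
    counts = defaultdict(int)
    for x, c in zip(X, y):
        counts[c] += 1
        for j, v in enumerate(x):
            sums[c][j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in counts}

def predict(centroids, x):
    """Return (predicted class, confidence); confidence decays with squared distance."""
    dist = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in centroids.items()}
    best = min(dist, key=dist.get)
    return best, 1.0 / (1.0 + dist[best])

def co_train(views, seed_labels, pool, per_class, iterations=5):
    """views: pair of feature lists (one per view); seed_labels: {index: label};
    pool: unlabelled indices; per_class: {label: k} -- the labelling proportion,
    i.e. how many examples of each class each view labels per step."""
    labels = dict(seed_labels)
    pool = list(pool)
    for _ in range(iterations):
        for view in views:
            if not pool:
                return labels
            idx = list(labels)
            model = fit_centroids([view[i] for i in idx], [labels[i] for i in idx])
            scored = [(predict(model, view[i]), i) for i in pool]
            for label, k in per_class.items():
                # most confident pool examples predicted as this class
                ranked = sorted(((conf, i) for (lab, conf), i in scored
                                 if lab == label), reverse=True)[:k]
                for _, i in ranked:
                    labels[i] = label  # self-labelled with the predicted class
                    pool.remove(i)
    return labels

# Toy two-view problem: class 0 clusters near 0, class 1 near 1, in both views.
random.seed(0)
truth = [0] * 50 + [1] * 50
view1 = [[random.gauss(c, 0.3)] for c in truth]
view2 = [[random.gauss(c, 0.3)] for c in truth]
seed = {0: 0, 50: 1}  # one labelled example per class
pool = [i for i in range(100) if i not in seed]
labels = co_train((view1, view2), seed, pool, per_class={0: 2, 1: 2})
accuracy = sum(labels[i] == truth[i] for i in labels) / len(labels)
```

Changing `per_class` to an unbalanced setting such as `{0: 1, 1: 3}` is the kind of intervention the experiments vary: when the labelling proportion diverges from the true class distribution, the self-labelled set drifts and performance can degrade.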
© 2006 International Federation for Information Processing
Matsubara, E.T., Monard, M.C., Prati, R.C. (2006). On the Class Distribution Labelling Step Sensitivity of CO-TRAINING. In: Bramer, M. (ed.) Artificial Intelligence in Theory and Practice. IFIP AI 2006. IFIP International Federation for Information Processing, vol. 217. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-34747-9_21
Print ISBN: 978-0-387-34654-0
Online ISBN: 978-0-387-34747-9