Practical Algorithms for Pattern Based Linear Regression

Bannai, Hideo; Hatano, Kohei; Inenaga, Shunsuke; Takeda, Masayuki

doi:10.1007/11563983_6

Hideo Bannai²¹,
Kohei Hatano²¹,
Shunsuke Inenaga^21,22 &
…
Masayuki Takeda^21,23

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3735))

Included in the following conference series:

International Conference on Discovery Science

721 Accesses
1 Citations

Abstract

We consider the problem of discovering the optimal pattern from a set of strings and associated numeric attribute values. The goodness of a pattern is measured by the correlation between the number of occurrences of the pattern in each string, and the numeric attribute value assigned to the string. We present two algorithms based on suffix trees, that can find the optimal substring pattern in O(Nn) and O(N ²) time, respectively, where n is the number of strings and N is their total length. We further present a general branch and bound strategy that can be used when considering more complex pattern classes. We also show that combining the O(N ²) algorithm and the branch and bound heuristic increases the efficiency of the algorithm considerably.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

A New Approach to String Pattern Mining with Approximate Match

Introduction to Pattern Mining

Pattern Matching with Variables: A Multivariate Complexity Analysis

References

Brazma, A., Jonassen, I., Eidhammer, I., Gilbert, D.: Approaches to the automatic discovery of patterns in biosequences. J. Comput. Biol. 5, 279–305 (1998)
Article Google Scholar
Hirao, M., Hoshino, H., Shinohara, A., Takeda, M., Arikawa, S.: A practical algorithm to find the best subsequence patterns. Theoretical Computer Science 292, 465–479 (2002)
Article MathSciNet Google Scholar
Shinohara, A., Takeda, M., Arikawa, S., Hirao, M., Hoshino, H., Inenaga, S.: Finding best patterns practically. In: Arikawa, S., Shinohara, A. (eds.) Progress in Discovery Science. LNCS (LNAI), vol. 2281, pp. 307–317. Springer, Heidelberg (2002)
Chapter Google Scholar
Takeda, M., Inenaga, S., Bannai, H., Shinohara, A., Arikawa, S.: Discovering most classificatory patterns for very expressive pattern classes. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds.) DS 2003. LNCS (LNAI), vol. 2843, pp. 486–493. Springer, Heidelberg (2003)
Chapter Google Scholar
Hirao, M., Inenaga, S., Shinohara, A., Takeda, M., Arikawa, S.: A practical algorithm to find the best episode patterns. In: Jantke, K.P., Shinohara, A. (eds.) DS 2001. LNCS (LNAI), vol. 2226, pp. 435–440. Springer, Heidelberg (2001)
Chapter Google Scholar
Inenaga, S., Bannai, H., Shinohara, A., Takeda, M., Arikawa, S.: Discovering best variable-length-don’t-care patterns. In: Lange, S., Satoh, K., Smith, C.H. (eds.) DS 2002. LNCS (LNAI), vol. 2534, pp. 86–97. Springer, Heidelberg (2002)
Chapter Google Scholar
Bussemaker, H.J., Li, H., Siggia, E.D.: Regulatory element detection using correlation with expression. Nature Genetics 27, 167–171 (2001)
Article Google Scholar
Bannai, H., Inenaga, S., Shinohara, A., Takeda, M., Miyano, S.: A string pattern regression algorithm and its application to pattern discovery in long introns. Genome Informatics 13, 3–11 (2002)
Google Scholar
Bannai, H., Inenaga, S., Shinohara, A., Takeda, M., Miyano, S.: Efficiently finding regulatory elements using correlation with gene expression. Journal of Bioinformatics and Computational Biology 2, 273–288 (2004)
Article Google Scholar
Zilberstein, C.B.Z., Eskin, E., Yakhini, Z.: Using expression data to discover RNA and DNA regulatory sequence motifs. In: The First Annual RECOMB Satellite Workshop on Regulatory Genomics (2004)
Google Scholar
Bannai, H., Hyyrö, H., Shinohara, A., Takeda, M., Nakai, K., Miyano, S.: An O(N²) algorithm for discovering optimal Boolean pattern pairs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1, 159–170 (special issue for selected papers of WABI 2004)
Google Scholar
Hui, L.: Color set size problem with applications to string matching. In: Apostolico, A., Galil, Z., Manber, U., Crochemore, M. (eds.) CPM 1992. LNCS, vol. 644, pp. 230–243. Springer, Heidelberg (1992)
Google Scholar
Miyano, S., Shinohara, A., Shinohara, T.: Which classes of elementary formal systems are polynomial-time learnable? In: Proceedings of the 2nd Workshop on Algorithmic Learning Theory, pp. 139–150 (1991)
Google Scholar
Miyano, S., Shinohara, A., Shinohara, T.: Polynomial-time learning of elementary formal systems. New Generation Computing 18, 217–242 (2000)
Article Google Scholar
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1997)
Book MATH Google Scholar
Knuth, D.E., Morris, J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM Journal on Computing 6, 323–350 (1977)
Article MATH MathSciNet Google Scholar
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)
Chapter Google Scholar
Kasai, T., Arimura, H., Arikawa, S.: Efficient substring traversal with suffix arrays. Technical Report 185, Department of Informatics, Kyushu University (2001)
Google Scholar
Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D., Futcher, B.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297 (1998)
Google Scholar
Shinozaki, D., Akutsu, T., Maruyama, O.: Finding optimal degenerate patterns in DNA sequences. Bioinformatics 19, ii206–ii214 (2003)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Informatics, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, Fukuoka, 812-8581, Japan
Hideo Bannai, Kohei Hatano, Shunsuke Inenaga & Masayuki Takeda
Japan Society for the Promotion of Science,
Shunsuke Inenaga
SORST, Japan Science and Technology Agency (JST),
Masayuki Takeda

Authors

Hideo Bannai
View author publications
You can also search for this author in PubMed Google Scholar
Kohei Hatano
View author publications
You can also search for this author in PubMed Google Scholar
Shunsuke Inenaga
View author publications
You can also search for this author in PubMed Google Scholar
Masayuki Takeda
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Computer Science & Engineering, The University of New South Wales, Sydney, Australia
Achim Hoffmann
Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, 567-0047, Ibaraki, Osaka, Japan
Hiroshi Motoda
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bannai, H., Hatano, K., Inenaga, S., Takeda, M. (2005). Practical Algorithms for Pattern Based Linear Regression. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science(), vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_6

Download citation

DOI: https://doi.org/10.1007/11563983_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29230-2
Online ISBN: 978-3-540-31698-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Practical Algorithms for Pattern Based Linear Regression

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

A New Approach to String Pattern Mining with Approximate Match

Introduction to Pattern Mining

Pattern Matching with Variables: A Multivariate Complexity Analysis

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Practical Algorithms for Pattern Based Linear Regression

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

A New Approach to String Pattern Mining with Approximate Match

Introduction to Pattern Mining

Pattern Matching with Variables: A Multivariate Complexity Analysis

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation