Solving the Longest Common Subsequence Problem Concerning Non-Uniform Distributions of Letters in Input Strings

Nikolic, Bojan; Kartelj, Aleksandar; Djukanovic, Marko; Grbic, Milana; Raidl, Günther

Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/253055

Title:	Solving the Longest Common Subsequence Problem Concerning Non-Uniform Distributions of Letters in Input Strings
Authors:	Nikolic, Bojan; Kartelj, Aleksandar; Djukanovic, Marko; Grbic, Milana; Raidl, Günther
Keywords:	Longest common subsequences; Multi-nomial distribution; Probability-based search guidance
Publication date:	2021
Publisher:	Multidisciplinary Digital Publishing Institute
Citation:	Mathematics 9: 1515 (2021)
Abstract:	The longest common subsequence (LCS) problem is a prominent NP-hard optimization problem where, given an arbitrary set of input strings, the aim is to find a longest subsequence that is common to all input strings. This problem has a variety of applications in bioinformatics, molecular biology and file plagiarism checking, among others. All previous approaches from the literature are dedicated to solving LCS instances sampled from uniform or near-to-uniform probability distributions of letters in the input strings. In this paper, we introduce an approach that is able to effectively deal with more general cases, where the occurrence of letters in the input strings follows a non-uniform distribution such as a multinomial distribution. The proposed approach makes use of a time-restricted beam search, guided by a novel heuristic named Gmpsum. This heuristic combines two complementary scoring functions in the form of a convex combination. Furthermore, apart from the close-to-uniform benchmark sets from the related literature, we introduce three new benchmark sets that differ in terms of their statistical properties. One of these sets concerns a case study in the context of text analysis. We provide a comprehensive empirical evaluation in two distinct settings: (1) short-time executions with fixed beam size in order to evaluate the guidance abilities of the compared search heuristics; and (2) long-time executions with fixed target duration times in order to obtain high-quality solutions. In both settings, the newly proposed approach performs comparably to state-of-the-art techniques in the context of close-to-uniform instances and outperforms state-of-the-art approaches for non-uniform instances.
Publisher version:	http://dx.doi.org/10.3390/math9131515
URI:	http://hdl.handle.net/10261/253055
DOI:	10.3390/math9131515
Identifiers:	doi: 10.3390/math9131515 issn: 2227-7390
Appears in collections:	(IIIA) Artículos
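
The abstract describes a beam search whose guidance function is a convex combination of two scoring functions. The sketch below illustrates only that general idea and is not the paper's Gmpsum heuristic: the two scores (`score_min_remaining` and `score_letter_counts`), the weight `lam`, and the fixed `beam_width` are assumptions chosen for illustration, and the time restriction mentioned in the abstract is omitted.

```python
from collections import Counter


def next_positions(strings, pos, letter):
    """Advance past the leftmost occurrence of `letter` in every string.

    Returns the new position tuple, or None if some string has no
    occurrence of `letter` at or after its current position.
    """
    new_pos = []
    for s, p in zip(strings, pos):
        idx = s.find(letter, p)
        if idx == -1:
            return None
        new_pos.append(idx + 1)
    return tuple(new_pos)


def score_min_remaining(strings, pos):
    # Hypothetical score 1: length of the shortest remaining suffix.
    return min(len(s) - p for s, p in zip(strings, pos))


def score_letter_counts(strings, pos):
    # Hypothetical score 2: per letter, the smallest remaining count
    # across all strings, summed over the alphabet.
    counts = [Counter(s[p:]) for s, p in zip(strings, pos)]
    return sum(min(c[a] for c in counts) for a in set(strings[0]))


def combined_score(strings, pos, lam):
    # Convex combination of the two scores, with 0 <= lam <= 1.
    return (lam * score_min_remaining(strings, pos)
            + (1.0 - lam) * score_letter_counts(strings, pos))


def beam_search_lcs(strings, beam_width=10, lam=0.5):
    """Return a common subsequence of `strings` found by beam search."""
    if not strings:
        return ""
    alphabet = sorted(set.intersection(*(set(s) for s in strings)))
    beam = [((0,) * len(strings), "")]
    best = ""
    while beam:
        # Expand every state in the beam by every feasible letter.
        candidates = {}
        for pos, seq in beam:
            for a in alphabet:
                nxt = next_positions(strings, pos, a)
                if nxt is not None and nxt not in candidates:
                    candidates[nxt] = seq + a
        if not candidates:
            break
        # Keep only the `beam_width` highest-scoring states.
        ranked = sorted(
            candidates.items(),
            key=lambda item: combined_score(strings, item[0], lam),
            reverse=True,
        )
        beam = ranked[:beam_width]
        best = beam[0][1]  # all sequences on one level share a length
    return best


def is_subsequence(seq, s):
    """Check that `seq` can be obtained from `s` by deleting letters."""
    it = iter(s)
    return all(ch in it for ch in seq)
```

For example, `beam_search_lcs(["abcbdab", "bdcaba"], beam_width=100)` keeps every reachable state and therefore returns a common subsequence of the optimal length 4. With a small beam the result is still a valid common subsequence, but its length depends on how well the combined score ranks the states.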

Files in this item:

File	Description	Size	Format
mathematics-09-01515.pdf		539.65 kB	Adobe PDF


SCOPUS™ citations: 2 (checked on 21 Apr 2024)
WEB OF SCIENCE™ citations: 1 (checked on 12 Feb 2024)
Page views: 35 (checked on 22 Apr 2024)
Downloads: 42 (checked on 22 Apr 2024)
