TY - GEN
T1 - Comparison of two ASCII art extraction methods
T2 - 2015 IASTED International Conference on Computational Intelligence, CI 2015
AU - Suzuki, Tetsuya
PY - 2015
Y1 - 2015
AB - Text-based pictures called ASCII art are often used in Web pages, email text, and so on. They enrich expression in text data, but they can be noise for natural language processing, and large ASCII art is deformed on small display devices. ASCII art extraction methods, which detect the areas of ASCII art in given text data, let us ignore ASCII art in text data or replace it with other strings. Our research group and another research group independently proposed two different ASCII art extraction methods: a run-length encoding based method and a byte pattern based method, respectively. Both methods use text classifiers constructed by machine learning algorithms, but they use different attributes of text. In this paper, we compare the two methods through ASCII art extraction experiments in which the training text and testing text are in English and Japanese. Our experimental results show that the two methods are competitive if the training text and testing text are in the same set of languages, but the run-length encoding based method works better than the byte pattern based method if the training text and testing text are in different sets of languages.
KW - ASCII art
KW - Information extraction
KW - Natural language processing
KW - Pattern recognition
UR - http://www.scopus.com/inward/record.url?scp=85015616989&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015616989&partnerID=8YFLogxK
U2 - 10.2316/P.2015.827-026
DO - 10.2316/P.2015.827-026
M3 - Conference contribution
AN - SCOPUS:85015616989
T3 - Proceedings of the IASTED International Conference on Computational Intelligence, CI 2015
SP - 269
EP - 276
BT - Proceedings of the IASTED International Conference on Computational Intelligence, CI 2015
PB - Acta Press
Y2 - 16 February 2015 through 17 February 2015
ER -