DOI: 10.1007/978-3-642-14770-8_33
Corpus ID: 29431480
TectoMT: Modular NLP Framework
Published in IceTAL 2010
Computer Science
In the present paper we describe TectoMT, a multi-purpose open-source NLP framework. It allows for fast and efficient development of NLP applications by exploiting a wide range of software modules…
135 Citations (Highly Influential Citations: 20; Background Citations: 28; Methods Citations: 69)
Figures, Tables, and Topics from this paper
Figure 1
Table 1
Figure 2
Figure 3
Natural language generation
Natural language processing
Anaphora (linguistics)
Machine translation
Named-entity recognition
Parallel text
Language-independent specification
Multi-Purpose Viewer
Sentence boundary disambiguation
Open-source software
Tokenization (lexical analysis)
Parsing
Text corpus
Part-of-speech tagging
135 Citations
Treex - an open-source framework for natural language processing
Z. Žabokrtský
Computer Science
ITAT

2011
TLDR
Treex facilitates the development of NLP applications by exploiting a wide range of software modules already integrated in Treex, such as tools for sentence segmentation, tokenization, morphological analysis, part-of-speech tagging, shallow and deep syntax parsing, named entity recognition, anaphora resolution, sentence synthesis, word-level alignment of parallel corpora, and other tasks.
16 Citations
Adding syntactic structure to bilingual terminology for improved domain adaptation
Mikel Artetxe, Gorka Labaka, +4 authors Eneko Agirre
Computer Science
DMTW

2016
TLDR
This work enriches source and target multiword terms with syntactic structure and seamlessly integrates them in the tree-based transfer phase of TectoMT, an open-source framework for transfer-based MT which works at the deep tectogrammatical level and combines linguistic knowledge and statistical techniques.
Deep-syntax TectoMT for English-Spanish MT
Gorka Labaka, Oneka Jauregi, M. Ustaszewski, Nora Aranberri, Eneko Agirre
2015
Deep-syntax approaches to machine translation have emerged as an alternative to phrase-based statistical systems, which seem to lack the capacity to address essential linguistic phenomena for…
New Language Pairs in TectoMT
Ondrej Dusek, Luís Manuel dos Santos Gomes, M. Novák, M. Popel, Rudolf Rosa
Computer Science
WMT@EMNLP

2015
TLDR
This work submitted translations by the English-to-Czech and Czech-to-English TectoMT pipelines to the WMT shared task and included a simple, non-parametric way of combining TectoMT’s transfer model outputs.
17 Citations
2nd Deep Machine Translation Workshop Program Committee; Adding Syntactic Structure to Bilingual Terminology for Improved Domain Adaptation
Gertjan van Noord, P. Osenova
2016
Moses is a well-known representative of the phrase-based statistical machine translation systems family, which are known to be extremely poor in explicit linguistic knowledge, operating on flat…
HamleDT: Harmonized multi-language dependency treebank
Daniel Zeman, Ondrej Dusek, +5 authors Jan Hajic
Computer Science
Lang. Resour. Evaluation
2014
TLDR
It is claimed that transformation procedures can be designed to automatically identify most such phenomena and convert them to a unified annotation style, which is beneficial both to comparative corpus linguistics and to machine learning of syntactic parsing.
54 Citations
Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus
Zdenka Uresová, Ondrej Dusek, Eva Fucíková, Jan Hajic, J. Sindlerová
Computer Science
LAW@NAACL-HLT
2015
TLDR
This paper presents a resource and the associated annotation process used in a project of interlinking Czech and English verbal translational equivalents based on a parallel, richly annotated dependency treebank, namely the Prague Czech-English Dependency Treebank.
12 Citations
Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
N. Green
Computer Science
ACL

2011
TLDR
This paper examines the effect of noun phrase structure on English dependency parsing with a maximum spanning tree parser and shows a 2.43% (0.23 BLEU point) improvement for English-to-Czech machine translation.
6 Citations
Czech Legal Text Treebank 1.0
Vincent Kríz, B. Hladká, Zdenka Uresová
Computer Science
LREC

2016
TLDR
The Czech Legal Text Treebank 1.0 is a morphologically and syntactically annotated corpus of 1,128 sentences that contains texts from the legal domain, namely the documents from the Collection of Laws of the Czech Republic.
4 Citations
Coreference Resolution System Not Only for Czech
M. Novák
Computer Science
ITAT

2017
TLDR
Treex CR, a coreference resolution (CR) system not only for Czech but also for English, is introduced; it operates on the tectogrammatical layer, a representation of deep syntax that allows for natural handling of elided expressions.
6 Citations
References
SHOWING 1-10 OF 36 REFERENCES
Linguistic Processing Pipelines : Problems and Solutions
G. Wilcock
2009
Many of the typical tasks in linguistic processing pipelines can be done with tools from OpenNLP (http://opennlp.sourceforge.net). There are OpenNLP components for sentence detection, tokenization, …
3 Citations
Automatic alignment of Czech and English deep syntactic dependency trees
D. Mareček, Z. Žabokrtský, Václav Novák
Computer Science
EAMT

2008
TLDR
The results of the experiments show that shifting the alignment task from the word layer to the tectogrammatical layer both increases the inter-annotator agreement on the task and makes it possible to construct a feature-based algorithm which uses sentence structure and which outperforms the GIZA++ aligner in terms of F-measure on aligned tectogrammatical node pairs.
16 Citations
CzEng 0.9: Large Parallel Treebank with Rich Annotation
Ondrej Bojar, Z. Žabokrtský
Computer Science
Prague Bull. Math. Linguistics
2009
TLDR
The paper provides full details on the current version of CzEng 0.9 and focuses on its new features, which provide a sentence-aligned automatic parallel treebank of about 8.0 million sentences, 93 million English and 82 million Czech words.
48 Citations
Multilinguality in ETAP-3: Reuse of Lexical Resources
I. Boguslavsky, L. Iomdin, V. Sizov
Computer Science
2004
TLDR
The paper presents the work done at the Institute for Information Transmission Problems (Russian Academy of Sciences, Moscow) on the multifunctional linguistic processor ETAP-3, with emphasis on the multiple use of lexical resources in a multilingual environment.
Prague Arabic Dependency Treebank: A Word on the Million Words
Otakar Smrž, Viktor Bielický, Iveta Kouřilová, Jakub Kráčmar, Zemánek
2008
Prague Arabic Dependency Treebank (PADT) consists of refined multi-level linguistic annotations over the language of Modern Written Arabic. The kind of morphological and syntactic information…
43 Citations
The Prague Dependency Treebank
Alena Böhmová, Jan Hajic, E. Hajicová, B. Hladká
Computer Science
2003
TLDR
Inspired by the Penn Treebank, the most widely used syntactically annotated corpus of English, the authors decided to develop a similarly sized corpus of Czech with a rich annotation scheme.
425 Citations
MaltParser: A Language-Independent System for Data-Driven Dependency Parsing
Joakim Nivre, Johan Hall, +5 authors E. Marsi
Computer Science, Economics
Natural Language Engineering
2005
TLDR
Experimental evaluation confirms that MaltParser can achieve robust, efficient and accurate parsing for a wide range of languages without language-specific enhancements and with rather limited amounts of training data.
576 Citations
Converting Russian Treebank SynTagRus into Praguian PDT Style
D. Mareček, Natalia Kljueva
Computer Science
2009
TLDR
This paper reports work in progress on transforming syntactic structures from the SynTagRus corpus into tectogrammatical trees in the Prague Dependency Treebank (PDT) style.
5 Citations
The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages
Jan Hajic, Massimiliano Ciaramita, +11 authors Y. Zhang
Computer Science
CoNLL Shared Task
2009
TLDR
This shared task combines the shared tasks of the previous five years under a unique dependency-based formalism similar to the 2008 task; the paper describes how the data sets were created and shows their quantitative properties.
Free/Open-Source Resources in the Apertium Platform for Machine Translation Research and Development
Francis M. Tyers, F. Sánchez-Martínez, Sergio Ortiz Rojas, M. Forcada
Computer Science
Prague Bull. Math. Linguistics
2010
TLDR
The resources available in the Apertium platform, a free/open-source framework for creating rule-based machine translation systems, take the form of finite-state morphologies for morphological analysis and generation, bilingual transfer lexica, probabilistic part-of-speech taggers and transfer rule files, all in standardised formats.
29 Citations
Related Papers
Natural Language Processing: State of The Art, Current Trends and Challenges
Diksha Khurana, Aditya Koli, K. Khatter, Sukhdev Singh
Computer Science
ArXiv

2017
TLDR
The paper distinguishes four phases by discussing different levels of NLP and components of Natural Language Generation (NLG), followed by presenting the history and evolution of NLP, the state of the art and the various applications of NLP, and current trends and challenges.
50 Citations
Tiburon: A Weighted Tree Automata Toolkit
Jonathan May, Kevin Knight
Computer Science
CIAA

2006
TLDR
A weighted finite-state tree automata toolkit is introduced, which incorporates recent developments in weighted tree automata theory and is useful for natural language applications such as machine translation, sentence compression, question answering, and many more.
76 Citations