Using SYSTRAN Dictionary Manager’s (SDM’s) Import feature, you can open dictionaries created with a spreadsheet application, such as Microsoft Excel, or a common text editor. These dictionaries must be carefully formatted before they can be imported into SDM.

Microsoft Excel files

To import a dictionary created with Microsoft Excel, the file must contain two worksheets named after the tabs in the User Dictionary (UD): Multilingual and Do Not Translate.

As with formatted text files, the column headings in the Excel file for the UD's language and informational columns must be entered exactly as you want them to appear in SDM.

Sample Excel Spreadsheet

excel sample

After the Excel file is imported, it appears in SDM as shown below:

excel sample SDM

Formatted text files

Formatted text files for import into SDM include the document header and the dictionary content.

  • The header part of the dictionary is a sequence of lines starting with the “#” character and containing a header field followed by its value.
  • The content part is a sequence of lines, with each line representing a dictionary entry whose fields are separated by tab characters.

The field types are defined in the header. It is important that each line has the same number of fields, even if they are empty.
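The equal-field-count rule can be checked before a file is imported. The following Python sketch is not part of SDM. It assumes a file with a single content section (for example, a Translation Memory), and the function name find_ragged_entries is hypothetical.

```python
def find_ragged_entries(text):
    """Return (line_number, field_count) for every content line whose
    tab-separated field count differs from the first content line's."""
    expected = None
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        count = len(line.split("\t"))
        if expected is None:
            expected = count  # the first entry sets the reference count
        elif count != expected:
            problems.append((number, count))
    return problems
```

An empty result means every entry has the same number of fields. Empty fields still count, because two consecutive tab characters delimit an empty field.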

Required and optional fields for importing files into SDM
Header Description of Input
#AUTHOR= Optional: contains the name of the creator of the dictionary.
#EMAIL= Optional: contains the email address of the creator of the dictionary.
#COVERED DOMAINS= Optional header: lists all domains configured in the dictionary.
#ENCODING= Required: defines the encoding of the file. UTF-8 encoding is recommended.
#GENERAL DICTIONARY DOMAINS= Optional header: lists the system domains associated with the dictionary.
#SUMMARY= Required: the name of the UD file.
#MULTI/TM/NORM/DNT

#<Languages><Informational columns>=

Required: these two lines end the header section.

The first line gives the dictionary type:

  • #MULTI defines the dictionary as a User Dictionary.
  • #TM defines the dictionary as a Translation Memory.
  • #NORM defines the dictionary as a Normalization Dictionary.
  • #DNT separates the multilingual entries from the Do Not Translate (DNT) entries within a User Dictionary.

The second line describes the list of columns in the content section. It is a list of codes separated by tab characters as described in the following table.
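As an illustration of how the header described above might be read, the following Python sketch collects the name and value of each header field, stops at the dictionary type line, and splits the column line that follows it. It is not SDM's own import logic. The name parse_header and its return shape are hypothetical, and the sketch assumes the column line starts with # and holds tab-separated codes.

```python
REQUIRED_FIELDS = {"ENCODING", "SUMMARY"}   # marked Required in the table above
TYPE_MARKERS = {"#MULTI", "#TM", "#NORM"}   # lines that open the content section


def parse_header(lines):
    """Return (fields, dict_type, columns, missing_required_fields)."""
    fields = {}
    dict_type = None
    columns = None
    it = iter(lines)
    for line in it:
        line = line.rstrip("\r\n")
        if line.strip() in TYPE_MARKERS:
            dict_type = line.strip()[1:]
            # The line right after the type marker lists the column codes.
            column_line = next(it, "").rstrip("\r\n")
            columns = [c.strip() for c in column_line.lstrip("#").split("\t")]
            break
        if line.startswith("#") and "=" in line:
            key, _, value = line[1:].partition("=")
            fields[key.strip()] = value.strip()
    missing = sorted(REQUIRED_FIELDS - set(fields))
    return fields, dict_type, columns, missing
```

A non-empty list of missing fields, or a dictionary type of None, suggests the header is incomplete.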

Description of the different codes defining the content fields
Code Description
XX Where XX is a 2-letter ISO 639 code in uppercase. This represents a language (see Appendix B. Language Pairs and ISO 639 Codes). The source language is always the first column, with target languages as the following columns.
XX_NO For Normalization Dictionaries only. XX corresponds to the ISO 639 code for the source language. These columns represent the Normalized columns.
UPOS User Part of Speech. This entry corresponds to the SDM Category column.
HEADWORD_XX This column is generated when doing an export. It contains the headword of the corresponding XX field. During import, this column is ignored.
PRIORITY Priority column.
DOMAINS Domains column. Domains are comma separated.
FREQUENCY Frequency column.
EXAMPLE Example column.
PROPOSAL STATUS Status of the entry (automatically extracted entries have a candidate status).
COMMENT Additional comment on the entry.
EXTRACTION CONFIDENCE Applies to automatically extracted entries; confidence of the extraction on an increasing scale from 0 to 1.
PREVIOUS TRANSLATION Applies to automatically extracted entries; the default SYSTRAN translation.
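The codes in this table can be recognized by their shape. The Python sketch below classifies a single column code. It checks only the pattern (two uppercase letters, an _NO suffix, a HEADWORD_ prefix) and does not verify that the two letters are a real ISO 639 code. The function name classify_code and the category labels it returns are hypothetical.

```python
import re

# Informational column codes listed in the table above.
INFO_CODES = {
    "UPOS", "PRIORITY", "DOMAINS", "FREQUENCY", "EXAMPLE",
    "PROPOSAL STATUS", "COMMENT", "EXTRACTION CONFIDENCE",
    "PREVIOUS TRANSLATION",
}


def classify_code(code):
    """Return a label describing what kind of column a code denotes."""
    if re.fullmatch(r"[A-Z]{2}", code):
        return "language"
    if re.fullmatch(r"[A-Z]{2}_NO", code):
        return "normalized"   # Normalization Dictionaries only
    if re.fullmatch(r"HEADWORD_[A-Z]{2}", code):
        return "headword"     # generated on export, ignored on import
    if code in INFO_CODES:
        return "info"
    return "unknown"
```

A code that is not listed in the table is reported as "unknown" rather than rejected, since this sketch cannot tell whether SDM accepts it.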

Sample formatted text file

The following sample text file is formatted for importing as a User Dictionary into SDM. Note that <TAB> indicates the tab character.

#ENCODING=UTF-8
#AUTHOR=SYSTRAN
#EMAIL=smith@systran.fr
#COVERED DOMAINS=Computers/Data Processing,Perso
#GENERAL DICTIONARY DOMAINS=Computers/Data Processing
#PRIORITY=1
#SUMMARY=Demo Computer
#MULTI
#EN<TAB>FR<TAB>NOTE<TAB>DOMAINS<TAB>PRIORITY<TAB>UPOS
write cycle<TAB>cycle d'écriture<TAB>Note<TAB><TAB>1<TAB>noun
write enable<TAB>validation écriture<TAB><TAB><TAB><TAB>noun
#DNT
#EN<TAB>NOTE<TAB>DOMAINS
Print 2000<TAB>It is a DNT<TAB>Perso
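A file in this layout can also be generated by a script, which avoids mistyped tab characters. The Python sketch below builds the text of a User Dictionary file with a #MULTI section and an optional #DNT section, padding each entry with empty trailing fields so that it matches the column count. The function names and parameters are hypothetical, and the padding only fills missing fields at the end of an entry.

```python
def format_section(marker, columns, rows):
    """Return the lines of one section: marker, column line, entries."""
    lines = [marker, "#" + "\t".join(columns)]
    for row in rows:
        if len(row) > len(columns):
            raise ValueError(f"too many fields in {row!r}")
        # Pad with empty fields so every entry has len(columns) fields.
        padded = list(row) + [""] * (len(columns) - len(row))
        lines.append("\t".join(padded))
    return lines


def build_user_dictionary(header, multi_columns, multi_rows,
                          dnt_columns=(), dnt_rows=()):
    """Return the full file text; header maps field names to values."""
    lines = [f"#{key}={value}" for key, value in header.items()]
    lines += format_section("#MULTI", multi_columns, multi_rows)
    if dnt_rows:
        lines += format_section("#DNT", dnt_columns, dnt_rows)
    return "\n".join(lines) + "\n"
```

The returned text can be saved with UTF-8 encoding, for example with pathlib.Path("demo.txt").write_text(text, encoding="utf-8"), where demo.txt is a placeholder file name.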

The following sample text file is formatted for importing into SDM as a Translation Memory.

#AUTHOR=SYSTRAN
#EMAIL=smith@systran.fr
#ENCODING=UTF-8
#SUMMARY=Demo
#TM
#EN<TAB>FR<TAB>DE
My name is Smith<TAB>Mon nom est Smith<TAB>Mein Name ist Smith
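To show how the column line relates to the entries, the Python sketch below reads Translation Memory text in this layout and returns one mapping per entry, keyed by language code. It is an illustration only, not the way SDM imports the file, and the function name read_translation_memory is hypothetical.

```python
def read_translation_memory(text):
    """Return a list of dicts mapping each column code to its segment."""
    lines = text.splitlines()
    start = lines.index("#TM")  # raises ValueError if the marker is absent
    columns = lines[start + 1].lstrip("#").split("\t")
    units = []
    for line in lines[start + 2:]:
        if not line.strip():
            continue  # ignore blank lines
        values = line.split("\t")
        units.append(dict(zip(columns, values)))
    return units
```

For the sample above, the result is a single mapping with the keys EN, FR, and DE.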