Neural machine translation has come a long way. Today’s MT engines can produce fluent, contextually aware output across dozens of language pairs — but fluency does not automatically mean accuracy, especially when it comes to the specialised terminology your brand depends on. A neural MT engine trained on general web data may confidently translate your product name into something unrecognisable, render a legal term incorrectly, or ignore the specific phrasing your company has spent years establishing with clients.
That is where a custom glossary becomes one of the most powerful tools in your localisation workflow. By feeding your MT engine a curated list of approved terms and their target-language equivalents, you give the system the constraints it needs to stay on-brand, on-spec, and on-message. This guide walks you through exactly what a custom glossary is, why it matters for neural MT quality, and how to build and integrate one effectively — whether you are translating a website, a legal document, or a product suite across multiple Asian languages.
What Is a Custom Glossary in Neural Machine Translation?
A custom glossary, sometimes called a termbase or terminology database, is a structured list of source-language terms paired with their approved target-language translations. In the context of neural machine translation, it functions as a hard constraint or a strong preference signal — telling the engine that whenever it encounters a specific term, it should render it in a predetermined way rather than guessing based on statistical patterns alone.
Unlike a general dictionary, a custom glossary is organisation-specific. It reflects the particular language choices your brand, legal team, or industry has standardised. For example, a pharmaceutical company might require that a specific drug name always appear untranslated in the target language, while a financial services firm might need its product names rendered in a precise way that differs from how a general MT engine would interpret them. The glossary enforces these rules at scale, across every document the engine processes.
Most modern neural MT platforms — including DeepL API, Google Cloud Translation Advanced, Amazon Translate, and ModernMT — support custom glossary functionality to varying degrees. Some treat glossary entries as strict overrides; others weight them heavily but may still adapt context-dependently. Understanding how your chosen engine handles glossary integration is an important first step before you begin building your term list.
Why Custom Glossaries Matter for MT Quality
Neural MT engines are remarkably good at producing natural-sounding output, but they are optimised for fluency across broad datasets — not for your company’s specific vocabulary. Without a custom glossary, the engine will make its best guess at technical, branded, or industry-specific terms, and those guesses can range from slightly off to completely wrong. In high-stakes content like legal contracts, medical instructions, or regulatory filings, a single mistranslated term can have serious consequences.
Consistency is another critical factor. Even if an MT engine translates a term correctly in one document, it may use a different rendering in the next, depending on surrounding context. This inconsistency is jarring for end users and can undermine trust in your brand. A glossary locks in the approved translation and ensures that the same term appears the same way throughout a document and across your entire content library.
For companies operating in multilingual markets across the Asia Pacific region — where languages like Simplified Chinese, Traditional Chinese, Japanese, Korean, Bahasa Indonesia, and Thai all have distinct conventions and script systems — this consistency is particularly vital. Glossaries help bridge the gap between what an MT engine can produce generically and what your target audience actually expects to see.
Types of Terms That Belong in Your Glossary
Not every word in a document needs a glossary entry. The goal is to focus on terms where incorrect or inconsistent translation would cause confusion, brand damage, or compliance issues. The most important categories to include are:
- Brand and product names: Company names, product lines, software features, and trademarked terms that must remain consistent across all markets.
- Industry-specific terminology: Technical terms in legal, financial, pharmaceutical, IT, or engineering content that have approved translations within your industry or jurisdiction.
- Do-not-translate terms: Terms, acronyms, or proper nouns that should remain in the source language regardless of the target language.
- Regulatory and compliance language: Phrases required by government agencies or industry bodies that must exactly match the approved wording.
- Internal jargon: Company-specific terminology, internal process names, or job titles that have specific approved translations in each market.
- Preferred style variants: Cases where multiple valid translations exist but your brand prefers one specific rendering for tone or positioning reasons.
Starting with a focused list of 50 to 200 high-priority terms is far more effective than attempting to build an exhaustive glossary from day one. A leaner, well-curated glossary tends to perform better than a bloated one full of general vocabulary that the MT engine already handles competently on its own.
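To make the categories above concrete, a lean starter glossary can be kept as simple structured records before it is exported to any platform-specific format. The field names and every term below are invented for illustration, not tied to any particular engine:

```python
# A focused starter glossary: each record pairs a source term with its
# approved target rendering plus lightweight metadata for auditing.
# All terms and translations here are hypothetical examples.
glossary = [
    {"source": "CloudSync Pro", "target": "CloudSync Pro",
     "category": "brand", "note": "Product name, keep in English"},
    {"source": "service level agreement", "target": "服务级别协议",
     "category": "legal", "note": "Use in contracts only"},
    {"source": "API", "target": "API",
     "category": "do-not-translate", "note": "Keep acronym as-is"},
]

# Quick sanity checks before export: no duplicate source terms,
# no empty target renderings.
sources = [entry["source"] for entry in glossary]
assert len(sources) == len(set(sources)), "duplicate source term"
assert all(entry["target"] for entry in glossary), "empty target"
```

Keeping the category and note fields alongside each pair makes later audits and platform exports much easier, even though most engines only ingest the source and target columns.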
How to Build a Custom Glossary Step by Step
Building a glossary that actually improves your MT output requires a structured process. Rushing through it or delegating it entirely to non-subject-matter experts tends to produce a list that the engine either ignores or misapplies. Here is a practical process that works across industries and language pairs.
- Audit your existing content: Start by reviewing your highest-volume and highest-risk documents — product manuals, legal agreements, marketing materials, and website copy. Identify terms that are repeatedly translated differently, or that you know carry specific meaning your MT engine currently gets wrong.
- Consult subject-matter experts and in-market reviewers: For each language pair, work with a certified translator or in-market expert to validate the preferred target-language rendering of each term. This is especially important for languages with significant regional variation, such as Portuguese (Brazil vs. Portugal) or Chinese (Simplified vs. Traditional).
- Define the term record structure: A well-structured glossary entry typically includes the source term, the target-language equivalent, a definition or usage note, the domain or content type it applies to, and any restrictions (e.g., “use only in legal documents”). Keeping this metadata makes the glossary easier to maintain and audit over time.
- Format the glossary for your MT platform: Different engines accept glossaries in different formats. DeepL API accepts CSV files with source and target columns. Google Cloud Translation Advanced supports TSV or CSV formats with specific column headers. Amazon Translate uses a CSV with source, target, and comment fields. Check your platform’s documentation carefully and validate your file before uploading.
- Test before deploying at scale: Run a sample set of documents through the engine with the glossary active and compare the output against your expected translations. Check that glossary terms are being applied correctly, that surrounding context is not being distorted, and that the overall fluency of the translation has not suffered.
- Establish a review and update cycle: A glossary is not a one-time project. As your product lines evolve, regulations change, and your brand language shifts, your glossary needs to keep pace. Set a quarterly or biannual review schedule and assign ownership to a specific team member or language service partner.
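The formatting step above can be sketched in a few lines. The snippet below writes two-column, tab-separated output of the general kind DeepL's glossary endpoint accepts; exact column requirements vary by platform, so verify against your engine's documentation before relying on this layout. The term pairs are invented examples:

```python
import csv
import io

# Minimal source-to-target term pairs (hypothetical examples).
terms = {
    "purchase order": "订购单",
    "end user licence agreement": "最终用户许可协议",
}

def to_tsv(term_pairs):
    """Serialise term pairs as tab-separated source/target lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for source, target in term_pairs.items():
        writer.writerow([source, target])
    return buffer.getvalue()

tsv_payload = to_tsv(terms)
```

Using the csv module rather than manual string joins guards against stray tabs or quotes inside a term silently corrupting the upload file.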
Integrating Your Glossary with Neural MT Engines
Once your glossary is built and validated, integration with your chosen neural MT engine is typically straightforward — but the technical details matter. Most enterprise MT APIs allow you to upload a glossary file and then reference it by ID when making translation requests. This means you can maintain multiple glossaries for different content types or departments and apply the appropriate one per workflow.
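The upload-once, reference-by-ID pattern looks roughly like the sketch below. The field names follow DeepL's v2 REST API convention (where a glossary requires an explicit source language); other engines use the same pattern with different parameter names, and the key and glossary ID are placeholders you would substitute:

```python
# Sketch of a glossary-aware translation request payload.
# "your-glossary-id" and the auth key are placeholders, not real values.

def build_request(text, glossary_id, source_lang="EN", target_lang="ZH"):
    """Assemble the form payload for a translation call that
    references a previously uploaded glossary by its ID."""
    return {
        "text": text,
        "source_lang": source_lang,  # glossaries need an explicit source language
        "target_lang": target_lang,
        "glossary_id": glossary_id,
    }

payload = build_request("Review the purchase order.", "your-glossary-id")

# The payload would then be POSTed to the engine's translate endpoint,
# e.g. with the requests library:
# response = requests.post(
#     "https://api.deepl.com/v2/translate",
#     headers={"Authorization": "DeepL-Auth-Key YOUR_KEY"},
#     data=payload,
# )
```

Because the glossary is referenced per request, you can maintain one glossary per department or content type and select the right one in each workflow.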
For organisations translating large volumes of content — such as e-commerce product catalogues, multilingual websites, or ongoing regulatory documentation — integrating the glossary within a broader translation management system (TMS) is often the most efficient approach. A TMS can apply the glossary automatically at the point of MT pre-translation, flag any terms that fall outside the glossary for human review, and feed approved post-edits back into a translation memory that further improves consistency over time.
If you are working with website translation at scale, glossary integration is particularly impactful because web content is often highly repetitive — navigation labels, call-to-action buttons, and product descriptions follow predictable patterns where consistent terminology is immediately visible to users. A well-integrated glossary ensures that your brand voice and terminology remain stable from page to page and update to update.
For document-based workflows, especially in legal, financial, or government contexts, glossary integration should be paired with robust proofreading by a qualified human reviewer. Even the best glossary cannot account for every contextual nuance, and in high-stakes documents, human oversight remains essential.
Common Mistakes to Avoid
Several patterns consistently undermine glossary effectiveness, and most of them stem from treating the glossary as a set-and-forget tool rather than a living resource. Being aware of these pitfalls before you begin will save significant rework later.
- Including too many terms too early: Overloading a glossary with general vocabulary interferes with the MT engine’s contextual decision-making and can actually reduce translation quality. Focus on genuinely problematic or high-stakes terms first.
- Skipping native-speaker validation: Using back-translation or bilingual dictionaries alone to populate a glossary risks introducing calques or unnatural phrasings that native speakers will immediately notice.
- Ignoring morphological variation: Some MT platforms require you to enter multiple forms of a term (singular and plural, different verb conjugations) to ensure consistent application. Failing to account for this is a common oversight, particularly for highly inflected languages.
- Not testing glossary application in context: A term that translates correctly in isolation may produce awkward or incorrect output when the engine tries to fit it into a complex sentence structure. Always test with real content samples.
- Neglecting updates after product or regulatory changes: An outdated glossary can actively introduce errors if it contains deprecated terms or superseded regulatory language. Assign clear ownership for ongoing maintenance.
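For engines that match glossary terms literally, the morphological-variation pitfall above can be partly mitigated by expanding each base term into its common surface forms before upload. The crude English pluraliser below is only a sketch of the idea; genuinely inflected languages need proper morphological tooling or native-speaker input rather than a heuristic like this:

```python
def expand_variants(term):
    """Return the term plus a naive English plural form.
    Real workflows should use morphological tools, not this heuristic."""
    variants = {term}
    if term.endswith(("s", "x", "z", "ch", "sh")):
        variants.add(term + "es")
    elif term.endswith("y") and term[-2] not in "aeiou":
        variants.add(term[:-1] + "ies")
    else:
        variants.add(term + "s")
    return sorted(variants)

# Map every surface form to the same approved target rendering
# (the translation here is a hypothetical example).
entries = {variant: "保修" for variant in expand_variants("warranty")}
```

Generating variants programmatically keeps the singular and plural entries in sync, so an update to the approved target only has to be made once.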
Why Human Review Still Matters
Even with a precisely built and well-integrated custom glossary, neural MT output should not be published without human oversight for most professional use cases. A glossary constrains terminology, but it does not address register, cultural appropriateness, idiomatic naturalness, or the subtle shifts in meaning that can occur when a technically correct term appears in an unexpected syntactic context.
The most effective workflow combines the speed and scalability of neural MT with the precision of professional human review — a model known as machine translation post-editing (MTPE). In this approach, the MT engine, with your glossary applied, produces a first draft, and a certified translator then reviews, corrects, and refines the output. The result is faster turnaround and lower cost than pure human translation, without sacrificing the accuracy and cultural sensitivity that your audience expects.
For organisations operating across the Asia Pacific region, this human layer is especially important. Languages like Thai, Japanese, and Bahasa Melayu carry significant cultural and contextual nuance that glossaries alone cannot fully address. A localisation approach — one that adapts not just words but cultural framing, imagery references, and tone — is what separates good translation from truly effective multilingual communication.
Whether your content is destined for regulatory submission, a multilingual website, or a published marketing campaign, pairing your MT glossary workflow with qualified human expertise through a professional language translation service is the most reliable way to achieve consistently high-quality results at scale.
Final Thoughts
Building a custom glossary for your neural MT engine is one of the highest-impact investments you can make in your translation quality. It brings consistency to your brand terminology, reduces post-editing effort, and gives your localisation team a shared reference point that grows more valuable over time. The process requires upfront effort — particularly in term validation and platform integration — but once established, a well-maintained glossary becomes a core asset in your multilingual content strategy.
The key is to treat your glossary as a living document, not a one-time deliverable. Pair it with human expertise, keep it aligned with your evolving products and markets, and integrate it thoughtfully into your broader localisation workflow. When you do, you will find that neural MT becomes a genuinely powerful tool rather than a source of frustrating corrections — one that delivers speed and scale without sacrificing the accuracy your audience deserves.
Need Expert Help with Your Translation Workflow?
At Translated Right, our team of over 5,000 certified translators and localisation specialists helps businesses across Singapore and the Asia Pacific region build accurate, consistent, and culturally appropriate multilingual content — whether you need MT post-editing, custom terminology management, or end-to-end translation services across 50+ languages.