Analytical Introduction to Introduction to Sinitic-Vietnamese

dchph in collaboration with Copilot

Introduction

The foundational Sinitic-Vietnamese (VS) lexical stratum of the Vietnamese language, as examined in What Makes Chinese So Vietnamese and outlined in its Executive Summary, is not merely a passive accumulation of Chinese loanwords. Rather, it constitutes a dynamic, internally layered, and etymologically rich stratum shaped by sustained and multifaceted contact between northern Chinese lects and a Yue-derived proto-Vietic substrate.

This report presents a comprehensive analytical review of the Chapter 1: Introduction to Sinitic-Vietnamese corpus, encompassing all cited etymons, semantic chains, and polysyllabic annotations. Its purpose is to demonstrate how the corpus substantiates the thesis that Sinitic-Vietnamese forms a core indigenous layer of the language, not a superficial literary veneer. 

The structure of the report follows the established research directives: corpus architecture, etymological analysis, register stratification, comparative linguistic features, and semantic chain mapping. Each section unpacks the methodologies and findings of Chapter 1 with precision and clarity.

Corpus structure

Chapter 1’s investigation is anchored in a rigorously curated corpus designed to illuminate the stratified evolution and contact-induced dynamics of Vietnamese vocabulary. Far more than a static collection of lexical entries, the corpus functions as a multidimensional analytical framework—archiving, contextualizing, and stratifying vocabulary by origin, phonological development, and sociolinguistic integration.

Comprising roughly 800 to 900 lexical items, the corpus is intentionally selective rather than exhaustive. Its entries are drawn from a diverse array of sources: early written records such as ChữNôm texts, comparative data from both modern and archaic Vietic lects, contemporary Vietnamese usage, and systematic cross-referencing with Old and Middle Chinese reconstructions. Its internal architecture reflects both genetic lineage and areal convergence, with structural segmentation guided by principles of historical linguistics and contact typology.

The analytic scaffolding of the corpus includes:

  • Tripartite Sinitic stratification: Early Sino-Vietnamese (ESV), Late Sino-Vietnamese (LSV), and Recent Sino-Vietnamese (RSV) are aligned with recognizable sociohistorical periods: Han/Jin, Tang/Song, and Ming/post-Ming eras, respectively1 2
  • Etymon-centered annotation: Each entry is subjected to rigorous multi-level annotation, specifying VS/SV forms, Middle Chinese (MC) and Old Chinese (OC) reconstructions, meanings, and critical comparative notes. 
  • Polysyllabic annotation protocols: Compound and reduplicative structures are recorded with morpheme-by-morpheme glossing, accommodating both native analytic patterns and contact-induced forms3
  • Layer-tagged lexical indexing: Items are explicitly tagged for stratum, register (literary, colloquial, vernacular), and, when sufficient data allow, socio-regional provenance and alternative readings.
  • Phonological and semantic extension: Sinitic-Vietnamese words derived from Sinitic compounds can develop more meanings than their source forms, as sound changes yield variants that carry related senses of the same semantic root.

This corpus serves as both a philological instrument and a digital-ready platform for algorithmic annotation, enabling granular etymological tracing and macro-level register mapping.

The corpus is supported by a series of structured tables, each serving a critical analytical function:

  • Comparative reconstructions of Old Chinese (OC)16 cross-referenced with Mark J. Alves’s proto-Sino-Vietnamese onset inventory9,
  • Core indices for identifying lexical strata based on rime, onset, and tonal behavior, and
  • Cross-referenced tables linking Early Sino-Vietnamese (ESV) and Late Sino-Vietnamese (LSV) candidates, phonetic indices, and register-based doublets4 5.

Additionally, selected entries are annotated with comparative data from conservative Vietic languages such as “Rục” and “Mường”, allowing for stratified anchoring across historical layers. This layered design results in a digitizable corpus well-suited for both philological analysis and algorithmic annotation workflows.

In summary, the corpus is both structurally rigorous and analytically versatile, designed to support micro-level etymological tracing as well as macro-level register mapping. It provides the evidentiary foundation necessary to distinguish the internally stratified, contact-generated Sinitic-Vietnamese stratum from later, more superficial borrowings4 6.

Etymological analysis

At the heart of Chapter 1 lies the etymological mapping of lexical entries, which underpins the broader argument concerning the depth, modality, and register of Sinitic influence in Vietnamese. The analysis organizes the corpus into reliable strata by applying phonological, semantic, and historical filters, referencing both Middle Chinese and Old Chinese reconstructions alongside comparative Vietic data.

Key criteria and techniques:

  • Phonological correspondences: The diagnostic use of segmental correspondences, especially in initial (onset) and final (rime and coda) positions, allows differentiation between early and late borrowings. For example, words with Vietnamese voiced fricative onsets (v-, d-, gi-, g-) correlating with complex OC clusters are proven markers of deeper, older integration, frequently paralleling conservative forms in non-Vietnamese Vietic lects (Rục, Mường)2.
  • Tonal evolution: Leveraging Haudricourt’s model, the chapter scrutinizes how the loss of OC final *-s and *-ʔ, set against syllables that never carried these finals, gave rise to the three major Vietnamese tonal sets (ngang/huyền, hỏi/ngã, sắc/nặng) and matches tone shifts with historical borrowing periods2.
  • Compound and reduplication annotation: Etymons that recur as components of compounds or reduplicants are tagged for polysyllabic annotation, revealing not only contact-induced formations but also morphosemantic layering3.
  • Comparative etymology: Each etymon is cross-referenced for parallel forms in MC, OC15, Sino-Tibetan18, and other Vietic or Austroasiatic languages, ensuring that supposed borrowings are not, in fact, retentions or autochthonous innovations16.
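
The tonal criterion above can be sketched as a small lookup. This is a minimal illustration of the basic Haudricourt-style pairing only (final type selects the tone pair, onset voicing selects the member); it does not attempt the tone reversals or stratum-specific correspondences the chapter discusses. The function names, class labels, and the treatment of stop codas are assumptions made for this example, not part of the chapter's tooling.

```python
# Illustrative lookup for the final-type / onset-voicing pairing.
# All names here are hypothetical; they are not taken from the chapter.

TONE_TABLE = {
    # final class: (tone after voiceless onset, tone after voiced onset)
    "open_or_nasal": ("ngang", "huyền"),
    "glottal": ("sắc", "nặng"),      # from final *-ʔ
    "fricative": ("hỏi", "ngã"),     # from final *-s (> *-h)
    "stop": ("sắc", "nặng"),         # assumption: stop-final syllables pattern here
}

STOP_CODAS = ("p", "t", "k", "b", "d", "g", "ɡ")


def classify_final(oc_form: str) -> str:
    """Classify a reconstructed form by its final segment."""
    form = oc_form.lstrip("*")
    if form.endswith("ʔ"):
        return "glottal"
    if form.endswith(("s", "h")):
        return "fricative"
    if form.endswith(STOP_CODAS):
        return "stop"
    return "open_or_nasal"


def expected_tone(oc_form: str, voiced_onset: bool) -> str:
    """Return the tone the basic model pairs with this final and onset."""
    high, low = TONE_TABLE[classify_final(oc_form)]
    return low if voiced_onset else high


if __name__ == "__main__":
    # Schematic forms, not corpus entries.
    for form, voiced in [("*pa", False), ("*ba", True), ("*paʔ", False), ("*bas", True)]:
        print(form, "->", expected_tone(form, voiced))
```

Comparing the model's expectation with the attested tone of each reading is one way a mismatch could be flagged for closer stratum analysis.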

Below is a representative extract from the compiled etymon table for selected cornerstone entries.

Table 1: Select register-layered monosyllabic Sinitic entries

Etymon (漢字) | VS/SV Form | MC Reconstruction | OC Reconstruction | Gloss | Comparative Notes
劍 jiàn | (thanh)gươm / gươm / kiếm | kjəm | **kə.ms > *kams | sword | Lenition [ɣ-], Rục təkɨəm → sesquisyllabic Vietic parallel
鏡 jìng | s-kương / gương / kiếng / kính | kiajŋ | **sk’raŋs > *kraŋs | mirror | Prefix **s- reflects Sino-Tibetan instrumentality; doublet preserved
唱 chàng | ʔ-ɕướng / xướng / khoan | tɕʰiɐŋ | **d̥raps > *tʰjaŋs | to chant | Dummy prefix **ʔ- with Vietic comparanda
公 gōng | cồ / ông / trống / công | kəwŋ | ***qˤoŋ > *klo:ŋ | public; grandpa; male; duke | Initial k- robust in SV; traceable in Mường, etc. Cf. Old Khmer khloñ, Proto-Tai *luŋᴬ
魚 yú | ngá / ngư / cá | ŋɨə̆ | *ŋa | fish | Nasal onset across SV and MC > glottal ʔ- > k-; Vietic “ngá”; cf. MinNan 魚汁 yúzhī ‘catsup’
海 hǎi | khơi / bể / biển / hải | həj | **hmɯːs > *hmlɯːʔ | sea | Doublets mapped with LMC onset traits; cf. ‘mệ’, ‘mẹ’ 母 mǔ (SV mẫu), 每 (OC *mɯːʔ), 晦 (OC *hmɯːs, “dark”); in numerous Zhou texts 海 = 晦 huì (OC *hmɯːs)
龍 lóng, lǒng, máng | (thuồng-)luồng / long / rồng | luawŋ | **r-loŋ > *b-loŋ | dragon; aquatic monster serpent | Rime /loŋ/ shares features with ESV and LSV; cf. VS ‘thuồngluồng’, Khmer រោង (roong, “year of the dragon”), Thai มะโรง (má-roong, “dragon; year of the dragon”)
大 dà | đại / thái / to / cả | daj, da | **lats > *da:ds | big; full; eldest | Appears in all strata; LMC tonal distinctions observed and doublets preserved; Wang (1982) also lists 誕 OC *l’aːnʔ as cognate. Cf. VS ‘lớn’ (big)
心 xīn | tâm / tim / lòng / õi | sim | **slɯm > *sə.m | heart, soul; mind; core | Partial ESV preservation, comparable with early loans/transfers
酒 jǐu | tửu / rượu | tsɨu | **ʔsluʔ > *tsuʔ | wine | Texture identifies as RSV candidate (late borrowing)

These and other entries are annotated with detailed Middle Chinese and Old Chinese phonological data drawn from the work of renowned scholars, including Bernhard Karlgren, Wang Li 王力, Li Rong 李榮, Shao Rong-Fen 邵榮芬, Zhengzhang Shangfang 鄭張尚芳, Pan Wuyun 潘悟雲, and Edwin G. Pulleyblank16, as well as the Baxter-Sagart reconstruction system7 5 and, importantly, Shafer’s monumental work on Sino-Tibetan etymology17, a rich source for comparative etymology. Each etymon is cross-verified with reflexes found in conservative Vietic dialects such as “Rục” and “Mường”8, as well as many Khmer lects, ensuring stratified accuracy and comparative depth19 20.

The selected standout examples below show Vietnamese words with strong cognate relationships to Sino-Tibetan roots, especially those from the Chinese, Bodic, Burmic, and Daic branches17. The etymologies run across virtually all basic semantic categories, such as Body Parts & Physicality, Verbs & Actions, Kinship & Social Roles, and Food & Agriculture.

Table 2: Select extract of full ST etymons with Kukish and Bodish as representatives 

Body Parts & Physicality

Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes
lưỡi | tongue | Kukish lei, Bodish ltśe | 舌 shé, 脷 lěi | 脷 more plausible for VS; cf. Cantonese ‘lei6’
chân | leg/foot | Bodish rkań, Kukish kʿoń | 腳 jiǎo, 足 | Also linked to 脛 jìng (VS cẳng)
bụng | belly | Kukish puk, Burmish puik | 腹 | Rated *****
mắt | eye | Bodish mig, Kukish mik | 目 | Rated *****
mũi | nose | Bodish mtśʿul, Kukish tśʿul | 鼻 | Rated **

Food & Agriculture

Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes
bánh | cake | Daic pɛń, Kukish piŋ | 餅 bǐng | Rated ******
muối | salt | Kukish tśi, Burmish śo-ra | 鹽 yán, 硝 xiāo | Complex etymology
gừng | ginger | Daic khiŋ, Kukish khiń | 薑 jiāng | Rated ******
cá | fish | Burmish ńa, Kukish kʿai | 魚 | Rated ****
cơm | rice | Burmish tśa, Lolo tsa- | 膳 shàn | Also linked to 飯 fàn

Kinship & Social Roles

Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes
bà | grandma | Kukish pi, Bodish pʿyi-mo | 婆 | 妣 cited but less plausible
bố | father | Kukish pu, Burmish bʿui | 父 | Strong phonetic match
chị | elder sister | Kukish tśar, Bodish ʾa-tśʿe | 姊 | Also compared with 姐 jiě
cháu | nephew; niece | Kukish tʿu, Burmish tu | 姪 zhí | Semantic alignment with SV ‘tỉ’
cậu | maternal uncle | Kukish kʿu, Bodish kʿu-bo | 舅 jìu | Rated ******

Verbs & Actions

Viet | Meaning | Sino-Tibetan Cognates | Chinese Etyma | Notes
cắt | cut | Bodish tśa, Daic kăt | 割 | Rated ******
dẫn | lead | Daic tśuń, Burmish tsiń | 引 yǐn | Rated *****
sỏ | play | Kukish tśai, Luśei tśai | 耍 shuǎ | Rated ****
chọn | choose | Daic khɔń, Kukish dzək | 選 xuǎn | Rated ***
lấy | take | Kukish laʾ, Burmish lu | 拿 | Rated ****

Observations

  • The author uses a star rating system (from * to ******) to indicate degrees of cognateness.
  • Cognates often span multiple Sino-Tibetan branches, although only two are cited per entry here, reinforcing the hypothesis of deeper substratal connections.
  • Many Vietnamese words show stronger alignment with Chinese etyma than with Mon-Khmer roots, for example “đầu; trốc, troốc” [ M 頭 tóu < MC dəw < OC *do: ], “thân, mình” [ M 身 shēn < MC ɕin < OC *qʰjin ], “chân, cẳng” [ M 脛 (踁) jìng, héng, xìng < MC ɦɛjŋ < OC *ɡeːŋʔ, *ɡeːŋs ]21, etc.
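
The star ratings in the Notes column can be read mechanically. The sketch below, under the assumption that a rating always appears as the word "Rated" (or the variant "Rate") followed by one to six asterisks, converts a note to an integer and filters rows by a threshold. The row sample is copied from Table 2; the function names and the default threshold of five stars are hypothetical choices for illustration.

```python
import re
from typing import List, Optional, Tuple

# Matches "Rated *****" and the variant spelling "Rate ****".
RATING_RE = re.compile(r"Rated?\s+(\*{1,6})")


def parse_rating(note: str) -> Optional[int]:
    """Return the number of stars in a note, or None if the note has no rating."""
    match = RATING_RE.search(note)
    return len(match.group(1)) if match else None


def strongly_rated(rows: List[Tuple[str, str, str]], minimum: int = 5) -> List[str]:
    """Return the Vietnamese forms whose rating is at least `minimum` stars."""
    return [viet for viet, _etymon, note in rows
            if (parse_rating(note) or 0) >= minimum]


# (Vietnamese form, Chinese etymon, note) taken from Table 2.
SAMPLE_ROWS = [
    ("bụng", "腹", "Rated *****"),
    ("mũi", "鼻", "Rated **"),
    ("muối", "鹽", "Complex etymology"),
    ("cậu", "舅", "Rated ******"),
]

if __name__ == "__main__":
    print(strongly_rated(SAMPLE_ROWS))
```

Unrated notes such as "Complex etymology" are treated as zero stars for filtering, which is a simplification; a fuller treatment might keep them in a separate bucket.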

Through this precise etymological annotation, the corpus substantiates several key claims: 

  1. that Sinitic influence on Vietnamese was driven by widespread oral contact rather than limited to literary borrowing, 
  2. that phonological features such as preinitials, rime alternations, and tonal reversals are best understood as outcomes of sustained bilingualism within a multi-register society, and
  3. that many so-called Sinitic etyma in Vietnamese are in fact deeply indigenized, functioning as core components within native semantic chains and compounding structures9.

The following table of randomly selected Sinitic disyllabic lexemes serves to substantiate the preceding claims.

Table 3: Folk lexical extensions by core components

Etymon (漢字) | VS/SV Form | MC Reconstruction | OC Reconstruction | Gloss | Etymological Notes
贏錢 yíngqián | ăntiền | jiajŋdziɛn | *leŋʔslenʔ | win money | 賭輸贏 dǔshūyíng (ănthuađủ) ‘put a bet on’
彼時 bǐshí | bấygiờ | bǐdʑɨ | *pralʔdjɯ | by then | Cf. 彼一時此一時. Bǐyīshícǐyīshí. (Bấygiờ khác, bâygiờ khác.) ‘It’s different now.’
白鴿 báigē | bồcâu | baɨjkkəp | *bra:gkuːb | white pigeon | 白鴿 成為 和平的 象徵. Báigē chéngwéi hépíngde xiàngzhēng. (Bồcâu tượngtrưng cho hòabình.) ‘White dove became a symbol of peace.’
邊界 biānjiè | bờcõi | penkəɨj | *mpeːnkre:ds | frontier | 民族 為了 保衛 國家 的 邊界 而 戰鬥. Mínzú wèile bǎowèi guójiā de biānjiè ér zhàndòu. (Dântộc vìlà bảovệ bờcõi nướcnhà mà chiếnđấu.) ‘The people fought to defend the nation’s borders.’
阻隔 zǔgé | cáchtrở | tʂə̆kɯæk | *ʔsraʔkreːɡ | separate | Note the reverse order of the disyllabic word. Ex.: 《詩經》: 邊關 阻隔 千里, 情懷 相連. ‘Shījīng’: Biānguān zǔgé qiānlǐ, qínghuái xiānglián. (‘Thikinh’: Quansan cáchtrở muônngàn, tuyxamàgần.) ‘Book of Odes: Frontiers may stretch across vast distance, yet sentiment flows unbroken.’
殘羹剩飯 cángēng-shèngfàn | cơmcặn-canhthừa | dzankaiŋ-ʑiŋbwan | *za:nskraŋ-ɦljɯŋsbonʔ | leftovers | The idiomatic expression preserves both the meaning and the sound contour.
休想 xīuxiăng | chớhòng | hɨusɨaŋ | *qʰuslaŋʔ | don’t you ever think of | 你 想 騙 我? 休想! Nǐ xiǎng piàn wǒ? Xīuxiǎng! (Anh muốn bịp tôi hả? Đừnghòng!) ‘You want to fool me? Don’t even think about it!’
露底 lòudǐ | đểlộ / lộtẩy / lộxì | luotei | *ɡraːɡstiːlʔ | let out a secret, unveil a secret; also (informal) expose one’s underwear | All doublets are listed here to show how well Vietnamese adapts every semantic variant of the disyllabic word.
甭想 béngxiăng | đừnghòng | bjuawŋsɨaŋ | [ non-existent ] | don’t even think about | Semantically equivalent to 休想 xīuxiǎng. This alignment underscores how Vietnamese expressions, at their core, resonate more closely with northern Sinitic colloquialism. Ex. 他國 要是 趁亂 佔領 邊界? 甭想! Tāguó yào chènluàn zhànlǐng biānjiè? Béngxiǎng! (Nướclạ muốn nhânlúc hỗnloạn chiếm biêngiới hả? Đừnghòng!) ‘Foreign country wants to encroach our border during chaos? Not a chance!’
孝道 xiàodào | hiếuthảo | haɨwdaw | *qʰruːsl’uːʔ | filial piety | Cf. 孝順 xiàoshùn (hiếuthảo). Both are disyllabic derivatives expressing the same semantic core (filial piety), yet the lexical preference remains at the discretion of Vietnamese speakers.

The robustness and granularity of such etymological annotation, especially in cross-referencing with polysyllabic compounds and derived forms, is critical for demonstrating the depth, not superficiality, of Sinitic integration in Vietnamese5 6.

Register layering

One of the report’s core findings, anchored in corpus evidence, is the persistent and nuanced layering of Sinitic-Vietnamese vocabulary across formal, colloquial, and vernacular registers. This register stratification is visible in both the phonological and sociolinguistic domains, as revealed through doublets, tone reversals, and regional variants.

Principal findings include:

  • Grassroots bilingualism: Early Sinitic borrowings, particularly at the ESV level, entered Vietnamese chiefly via grassroots oral bilingualism, rather than via an elite “reading pronunciation” tradition. These words were nativized both phonetically and pragmatically and permeated the vernacular register, functioning as high-frequency, “native-feeling” items, e.g., “mũ” (帽 mào, ‘hat’), “giày” (鞋 xié, ‘shoe’), “vợ” (婦 fù, ‘wife’)10
  • Literary-literacy layer: With the institutionalization of Literary Sinitic in administration and scholarship (especially from the Tang period on), a formally codified register emerged. This LSV register preserved systematic MC-derived readings, crystallized through rhyme dictionaries like Qieyun, and maintained consistent phonological and tonal patterns. Essentially, these items were circulated within educated and written registers, e.g., “phụ” (婦 fù, ‘woman’), “pháp” (法, ‘law’), “ký” (記 jì, ‘to record’)5 10
  • Colloquialization and doublets: A key side effect of this history is the proliferation of doublets, that is, pairs of lexical items traceable to a single Chinese source but split across temporal and register boundaries. For instance, ‘gươm’ (劍 jiàn, ‘sword’, ESV: grassroots, vernacular) stands against ‘kiếm’ (‘sword’, LSV: literary), ‘vợ’ (婦 fù, ‘wife’, OSV/vernacular) against ‘phụ’ (LSV/literary), and ‘mùi’ (‘smell’, OSV/native) against ‘vị’ (味 wèi, ‘taste’, LSV/formal)10
  • Regional and social layering: The corpus also tracks how socio-regional dialects and strata align with different Sinitic layers; for instance, items retained in conservative Mường or North-Central Vietnamese suggest pre-literary embedding. Some words retained in these regions show older, non-tonal, or weakly tonal forms, in contrast to standardized MC-based forms prominent in Hanoi or written Vietnamese8 5, e.g., Mường/North-Central form: ‘chài’ (net) vs. ‘võng’ (網 wǎng, ‘net’, ‘hammock’), ‘chàm’ (藍 lán, ‘indigo’, also in vernacular Vietnamese) vs. ‘lam’ (藍 lán, ‘blue’), ‘chài’ vs. ‘lưới’ (羅 luó, ‘net’), etc.
  • Stratification in word formation: Compounds like ‘giáosư’ (professor, 教師), ‘thưviện’ (library, 書院), and ‘bácsĩ’ (doctor, 博士) show not only register stratification but also semantic specialization within the high-register layer, often diverging semantically from their Chinese or Japanese analogues10.

This evidence directly supports the chapter’s thesis that VS is not a single, uniform layer, but a stratified system reflecting centuries of bilingualism, diglossia, and social differentiation. It also illustrates the unique capacity of the Vietnamese lexicon to synthesize and innovate, even as it inherits imported morphemes6 10.
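
The doublet pattern described in this section lends itself to a simple grouping step. The sketch below is a minimal illustration, assuming each reading has already been tagged with its source character and a layer label; the `Reading` type, the function name, and the simplification of treating ‘vợ’ as ESV (following the grassroots-bilingualism examples) are assumptions for this example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Reading(NamedTuple):
    etymon: str   # source character
    form: str     # Vietnamese form
    layer: str    # layer label, e.g. "ESV" or "LSV"


# Sample readings drawn from the examples in this section.
READINGS = [
    Reading("劍", "gươm", "ESV"),
    Reading("劍", "kiếm", "LSV"),
    Reading("婦", "vợ", "ESV"),
    Reading("婦", "phụ", "LSV"),
    Reading("法", "pháp", "LSV"),
]


def find_doublets(readings: List[Reading]) -> Dict[str, Dict[str, List[str]]]:
    """Group forms by etymon and layer; keep etyma attested in more than one layer."""
    by_etymon: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
    for reading in readings:
        by_etymon[reading.etymon].setdefault(reading.layer, []).append(reading.form)
    return {etymon: layers for etymon, layers in by_etymon.items() if len(layers) > 1}


if __name__ == "__main__":
    for etymon, layers in find_doublets(READINGS).items():
        print(etymon, layers)
```

An etymon with a single attested layer, such as 法 in the sample, is left out of the result, so the output lists only candidate doublets.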

Comparative features of Sino-Vietic contact

The corpus methodology and comparative approach undertaken in Chapter 1 provide robust evidence for contact-induced convergence, divergence, and substratum influence between Chinese, Vietic, and other East/Southeast Asian languages. This analysis relies on meticulous cross-referencing and reconstruction, leveraging in particular the AMC (Annamese Middle Chinese) hypothesis, namely that a southern Chinese lect became nativized and absorbed into proto-Vietic during the first millennium CE.

Comparative findings across domains

  • Phonological systems: Vietnamese is one of very few non-Sinitic languages to preserve the palatal/retroflex sibilant distinction of Early Middle Chinese (a distinction lost in most modern Chinese dialects). In LSV, reflexes of MC labiodentalization (e.g., SV v- < MC *v-) and grade II palatalization (e.g., -y- medial) appear more regularly than in the corresponding Cantonese or Mandarin forms, and LSV readings also preserve conservative features not shared with these counterparts, owing to their origin in a southern, “Annamese” lect5 10
  • Lexical inheritance & innovation: While hundreds of core words in Vietnamese are securely assigned to Sinitic origin (either as ancient loans or as systematic SV readings), a small proportion of the basic lexicon remains demonstrably Austroasiatic, especially in numerals and agricultural terms11.
  • Semantic drift and chain innovation: Vietnamese, far more than Japanese or Korean, systematically coins new compounds out of SV morphemes, e.g., ‘linhmục’ (靈牧 língmù, ‘priest’), ‘giảkimthuật’ (冶金術 yějīnshù, ‘alchemy’), establishing semantic innovations not paralleled in Chinese itself. These reflect not just borrowing but creative recombination in a diglossic area10
  • Proof from conservative dialects: Numerous items regarded as “Sinitic” in Vietnamese find their closest parallels in conservative Vietic languages, especially Rục, Thavung, and Mường, which preserve presyllabic structure (e.g., Rục ‘təkɨəm’, Rục prefixal formations) and pre-tonal syllabification, thus serving as a living laboratory for contact phonology12 1
  • Typological convergence: Vietnamese displays the analytic, morphemic-syllabic, non-inflectional profile typical of the Mainland Southeast Asia Sprachbund (areal grouping). The persistence of a small number of polysyllabic and sesquisyllabic forms in non-standard dialects, however, points to contact-induced convergence and layered histories, not simplistic replacement11 2

Table 4: Comparative etymon examples

Etymon (漢字) | VS/SV Form | MC Reconstruction | OC Reconstruction | Gloss | Comparative Notes
榮光 róngguāng | quangvinh, vangbóng, vẻvang / vinhquang | ɦwiajŋkwaŋ | *ɢʷreŋkʷaːŋs | glorious | This is a compelling case study that intersects multiple phonological trajectories from Early Middle Chinese (EMC) into Literary Sino-Vietnamese (LSV) and vernacular Vietnamese (VS).
望文生義 wàngwén-shēngyì | trôngvăng-đặtnghĩa / vọngvăn-sanhnghĩa | maŋsmiun-ʂaɨjŋŋjiə̆ | *maŋmɯn-shleŋŋrals | folk etymology | This idiom is a strong case of Vietnamese preservation of the palatal-retroflex sibilant distinction from Early Middle Chinese (EMC), with reflexes of labiodentalization (e.g., SV v- < MC v-) and of grade II palatalization (e.g., medial -y-).
木偶戲 mù’ǒuxì | kịchmúarối / mộcngẫuhí | məwkŋəwhjiə̆ | *moːɡŋoːsqʰrals | puppetry | This is a compelling example of lexical inheritance and innovation, especially when viewed through the lens of Sinitic-Vietnamese (SV) transmission and vernacular adaptation. The shift in phonology, semantics, and structure exemplifies lexical innovation: the SV compound is semantically reinterpreted and replaced by a native phrase that better fits vernacular usage and cognitive framing.
四姊 sìjiě | chịtư, chếtư vs. chị ‘bốn’ / tứtỷ | sɨtsiɪ | *hljidsʔsiʔ | sister four | ‘chế’, ‘chị’, and ‘tư’ are Sinitic-Vietnamese words, but ‘bốn’ is cognate with Mường ‘pổn’, Khmer ‘buən’. Cf. ‘emba’ (三妹 sānmèi, ‘sister three’), and note that the following words preserve all cultural context: ‘chịcả’ (大姐 dàjiě, ‘eldest sister’), ‘anhcả’ (大兄 dàxiōng, ‘eldest brother’), ‘anhhai’ (二兄 èrxiōng, ‘second older brother’).
仔細 zǐxì | tỉmỉ / tửtế | tsɨsɛj | *tsɨse:s | kind; meticulous | This is a case of innovation. Cf. 慈濟 cíjì (SV ‘từtế’, VS ‘tửtế’) = ‘kindhearted’.
名聲 míngshēng | thanhdanh, tiếngtăm, danhtiếng, vangtiếng, tiếngvang / danhthanh | miajŋɕiajŋ | *meŋqʰjeŋ | fame, renown | The case of 名聲 → thanhdanh in Literary Sino-Vietnamese (LSV) does not directly exemplify the preservation of the palatal-retroflex sibilant distinction of Early Middle Chinese (EMC), but it does intersect with other phonological conservatisms that LSV retains, particularly in labiodentalization and medial palatalization. Cf. 望文生義 wàngwénshēngyì (SV vọngvănsinhnghĩa, ‘folk etymology’).
善良 shànliáng | hiềnlương / lươngthiện | dʑianlɨaŋ | *ɡjenʔraŋ | morally good and kind | Multiple Vietnamese reflexes across Sino-Vietnamese, vernacular, and semantic analogs, each reflecting distinct etymological strata via Vietnamization @ 善 shàn (SV thiện) ~ ‘hiền’ 賢 xián (hiền), @ 良 liáng ~ ‘lành’.

These comparative patterns both validate and complicate the notion of Sinitic-Vietnamese as a distinct stratum, showing not only what was borrowed or nativized, but how Sinitic features were remixed with enduring Yue/Vietic structures and semantic fields.

Semantic chains and polysyllabic annotation

A critical dimension underpinning the chapter’s thesis is the presence of semantic chains, both diachronic (layered developments across time) and synchronic (coexisting derivatives and compounds), most vividly observed in how Sinitic and vernacular roots combine, diverge, and radiate across registers.

Characteristic examples:

  • Direct semantic chaining: Lexical roots such as 劍 (gươm/kiếm, ‘sword’) recur across derived expressions and technical compounds, for example gươmđao (‘swords and sabers’), highlighting indigenous compounding practices distinct from donor languages and evidencing deep assimilation of Sinitic material9
  • Compounding of Sino-Vietnamese roots: The construction of polysyllabic compounds—such as giáosư (professor, 教師), thuỷngư (‘aquatic animal’, combining ‘water’ and ‘fish’), and nhạcsĩ (‘musician’, 樂士)—exemplifies both the generative capacity and inventive reconfiguration of Sinitic morphemes within Vietnamese morphosyntactic and semantic frameworks10
  • Semantic divergence within lexical chains: Certain chains reveal functional differentiation across registers and diachronic layers. For example, vị (味, ‘taste’, formal) contrasts with mùi (vernacular, ‘smell’); lạy (Old SV, ‘kowtow, bow’) diverges from lễ (SV, ‘ceremony’); and việc (Old SV, ‘work, event’) stands apart from dịch (SV, ‘service, corvée’). These pairings illustrate how semantic domains were restructured over time, with shifts in usage, tone, and sociolinguistic context.9
  • Macro-domain chaining: The corpus documents extensive “macro-chains”—entire semantic fields such as metallurgy (vàng, bạc, sắt, đồng, gang, thép), agronomy, and kinship—that can be reconstructed diachronically. These domains were progressively enriched or supplanted by Sinitic vocabulary as waves of social, technological, and administrative innovation permeated proto-Vietic speech communities, reshaping lexical landscapes at the systemic level9
  • Polysyllabic annotation: Morpheme-level annotation practices—exemplified by compounds such as giáosư (giáo ‘teach’ + sư ‘master’) and nhiệtkế (nhiệt ‘heat’ + kế ‘device’, i.e., thermometer)—not only illuminate the internal compositional logic of Sinitic formations but also trace the pathways through which Chinese-derived roots were refunctionalized within indigenous Vietnamese semantic and syntactic frameworks.9
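
The morpheme-by-morpheme glossing described in the last bullet can be sketched as a greedy longest-match segmentation over a small morpheme lexicon. This is only an illustration: the lexicon holds the four morphemes glossed above, the function name is hypothetical, and a greedy match without backtracking is an assumption that would need revisiting for a real corpus where morpheme boundaries are ambiguous.

```python
import unicodedata
from typing import Dict, List, Tuple

# Morphemes and glosses taken from the examples in the text.
MORPHEMES: Dict[str, str] = {
    "giáo": "teach",
    "sư": "master",
    "nhiệt": "heat",
    "kế": "device",
}


def gloss_compound(compound: str, lexicon: Dict[str, str] = MORPHEMES) -> List[Tuple[str, str]]:
    """Split a joined compound into known morphemes and pair each with its gloss."""
    # Normalize so precomposed and decomposed diacritics compare equal.
    text = unicodedata.normalize("NFC", compound)
    lex = {unicodedata.normalize("NFC", key): value for key, value in lexicon.items()}
    parts: List[Tuple[str, str]] = []
    start = 0
    while start < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for end in range(len(text), start, -1):
            piece = text[start:end]
            if piece in lex:
                parts.append((piece, lex[piece]))
                start = end
                break
        else:
            raise ValueError(f"no known morpheme at position {start} in {compound!r}")
    return parts


if __name__ == "__main__":
    for word in ("giáosư", "nhiệtkế"):
        print(word, "=", " + ".join(f"{m} '{g}'" for m, g in gloss_compound(word)))
```

Unicode normalization matters here because Vietnamese diacritics can be encoded in composed or decomposed form, and a lexicon lookup would silently fail if the two sides differed.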

The ability to identify, annotate, and analyze these chains, with their branching, looping, and sometimes discontinuous nature, demonstrates that the Sinitic-Vietnamese layer is not just an inert deposit but a creative substratum that Vietnamese speakers exploited for both lexical innovation and semantic extension.

The table below synthesizes the chapter’s approach and data. Each entry represents compounds formed from either a major Sinitic root or a core comparative form in Table 1 above, analyzed in depth in Chapter 1.

Table 5: Etyma by core corpus extraction

Etymon (漢字) | VS/SV Form | MC Reconstruction | OC Reconstruction | Gloss | Comparative Notes
刀劍 dāojiàn | đaogươm, gươmđao, gươmdao, daokiếm / kiếmđao, đaokiếm | tawkjəm | *ta:wkams | bladed weapons | Cf. 劍刀 jiàndāo as attested in Chinese classics, e.g., 《三國演義》第九一回:「或為 刀劍 所 傷,魄 歸 長夜,生 則 有 勇,死 則 成名。」 ‘Sānguó Yǎnyì, Dìjǐushíyī Huí’: ‘Huòwéi dāojiàn suǒ shāng, pò guī chángyè, shēng zé yǒuyǒng, sǐ zé chéngmíng.’ (‘Tamquốc Diễnnghĩa’, Hồi Thứ Chínmốt: Hoặcbị gươmđao chém thương, làm ma vấtvưởng; sống thì anhdũng, chết được thànhdanh.)
眼鏡 yǎnjìng vs. 目鏡 mùjìng | mắtkính, mắtkiếng, kínhmắt, kiếngmắt / mụckính, nhãnkính | | | eye glasses | (Hakka, Southern and Puxian Min, Hainanese) Similar to Hainanese /matkɛng/ in morphemic order and phonotactics, as opposed to VS “kiếngmắt”, which reflects reordering and substrate phonology. “kiếng” is a Southern variant of “kính”. Rục /matkɛng/ and Hainanese forms preserve older phonological features, offering “living laboratory” evidence for contact phonology.
唱和 chànghé | xướnghoạ, xướnghò, khoanhò, hòkhoan / xướnghoà | tɕhaŋɦwa | *tʰjaŋsɡoːls | chant | Cf. 你們 倆 一唱一和. Nǐmen liǎ yīchàngyīhè. (Haiđứa mày kẻxướngngườihò.) ‘The two of you chant in the chorus collusively.’
微算機 wēisuànjī | máyvitính / vitoáncơ | mjɪswankɨj | *mɯjsloːnsʔkɨjkɯl | micro computer | 微算機 is a Vietnamese compound of [微 (vi), ‘micro’] + [算 (toán), ‘compute’] + [機 (cơ), ‘machine’] = ‘micro computer’, a polysyllabic compound with SV and VS roots. The OC and MC sounds are what make up the pronunciation.
魚汁 yúzhī | sốtcá, nướccá, mắmcá, nướcmắm / ngưtrấp | ŋjotɕip | *ŋakjub | fish sauce | There is a noteworthy etymological case in English: the word ketchup (or catsup; note the syllable cat), which has a Sinitic origin meaning “fish sauce” (魚汁 yúzhī). The British originally borrowed it from the Fujian region in earlier times, where locals used fermented fish sauce for seasoning. However, when this item was brought back to England, the English language transformed it into “tomato sauce.”
公雞 gōngjī vs. 雄雞 xióngjī | gàcồ / gàtrống / côngkê / hùngkê | kəwŋkiej | *klo:ŋke: | rooster | These are strong cases of semantic divergence within lexical chains, with variants across numerous Sinitic lects. Cf. 母雞 mǔjī (gàmái, gàmẹ, ‘hen’), not to mention ancient local forms such as 雞公 jīgōng and 雞母 jīmǔ (gàmẹ).
龍飛鳳舞 lóngfēi-fèngwǔ | rồngbay-phượngmúa / longphi-phụngvũ | luawŋpwyj-buwŋwǔ | *b·roŋpɯl-bumsmaʔ | grand and flamboyant style | Another case of direct semantic chaining.
大志 dàzhì (đạichí) | chícả, chílớn / đạichí | dajtɕɨ | *da:dstjɯs | high aims | Another case of semantic divergence within lexical chains; cf. variant: 胸無大志 xiōngwúdàzhì (ngườikhôngcóchí).
顆心 kēxīn | tráitim, contim, quảtim / khoảtâm | kʰwasim | *kʰloːlʔslɯm | heart | This should be a case of macro-domain chaining within the semantic divergence within lexical chains. Cf. 果 guǒ (SV quả) ‘fruit’, hence ‘trái’ (clipping of VS ‘quảtrái’ 果實 guǒshí (SV quảthực)) that gives rise to a classifier 顆 kē for ‘con’, ‘quả’, ‘trái’ (small round objects), Vietnamized as a morphemic-modifier syllable in 顆心 kēxīn.
酒席 jǐuxí | rượutiệc, tiệcrượu / tửutịch | tsɨuziajk | *ʔsluʔljaːɡ | banquet | Example: 富家 一 席酒 窮漢 半年 糧. Fùjiā yì xíjǐu, qiónghàn bànnián liáng. (Một bữa tiệcrượu của ngườigiàu bằng lương nửanăm kẻnghèo.)
力氣 lìqì | sứclực, hơisức / lựckhí | lɨkhɨj | *rɯɡkʰɯds | strength | Native parallels exist in Vietic: VS ‘sức’ is a native Vietnamese word (Proto-Vietic k-rək) and a cognate of Chinese 力, not merely a translation or borrowing. It reflects a shared etymological ancestry and semantic continuity, while ‘lực’ is the Sino-Vietnamese literary reflex directly borrowed from Middle Chinese. In this case, 氣 (SV khí, VS hơi, ‘steam’) is associated with ‘sức’. Cf. 氣力 qìlì (SV khílực, VS hơisức) ‘power’.
本錢 běnqián | tiềnvốn, vốnliếng / bảntiền, bổntiền | pwəndziɛn | *pɯːnʔslenʔ | root/funds | This is a case of semantic divergence within lexical chains with phonological innovation on the second syllable, which is commonplace in Sinitic-Vietnamese. Ex. 她的 本錢 是 青春 美麗. Tāde běnqián shì qīngchūn měilì. (Vốnliếng của nàng là thanhxuân trẻđẹp.) ‘Her capital is youth and beauty.’
工役 gōngyì | côngviệc / côngdịch | ywek | *wjek | service / work | A compound that reflects both Sino-Vietnamese inheritance and vernacular semantic fusion, a hybrid compound typical of Vietnamese lexical layering. ‘việc’ (役 yì, SV dịch) shows semantic layering with 為 (OC *ɢʷal, ‘to do’) with k-extension, though its etymology can be traced directly to the Classical Chinese compound 工役 (gōngyì). Cf. 公務 gōngwù (SV côngvụ, VS côngviệc) ‘business’.
幸福 xìngfú | phướclành, phúclành / hạnhphúc, hạnhphước | ɦəɨjŋpʰuw | *ɡreːŋʔpɯɡ | bliss | 幸福 (xìngfú) and Vietnamese expressions like hạnhphúc, phướclành, and phúclành are historically and semantically related, though they represent distinct etymological layers. In fact, they are compounds of Sinitic-Vietnamese elements built on Sino-Vietnamese roots. Cf. 良 (liáng, SV lương, VS lành).
妙法 miàofǎ | pháp / phépmàu, phépmầu / diệupháp | miawfǎ | *mewspqab | miracle | phép and pháp are reflexes of 法, stratified by register: phép for colloquial/magical, pháp for formal/legal/religious. SV diệupháp preserves Buddhist doctrinal nuance, while Sinitic-Vietnamese phépmàu reflects emotional and magical connotations.
寡婦 guǎfù | goáphụ, goábụa, bàgoá, bàgiá, ởgoá, ởvậy / quảphụ | kʷɯabuw | *kwra:ʔbɯʔ | widowed woman | A rich array of Vietnamese reflexes across Sino-Vietnamese, vernacular, and idiomatic registers, each encoding different layers of phonological inheritance, semantic drift, and cultural framing via Vietnamization @ 寡 guă ~ ‘ở’ 於 yú (vu), ‘giá’, @ 婦 fù ~ ‘vợ’, ‘bụa’, ‘bà’ 婆 pó (bà).


Each row is immediately explicated and cross-referenced in chapter analysis, not only in terms of phonology but also register, semantic chain (noting compounds or domain expansions), and, when relevant, polysyllabic annotation (e.g., máyvitính – ‘computer,’ nướcmắm – ‘fish sauce’).

Digital tools and annotation practices

The methodological innovations in Chapter 1 extend beyond theoretical linguistics: they are grounded in explicit annotation protocols and strategic digital tool integration. Key components include:

  • Corpus annotation standards: Structured templates enable hierarchical, coded annotation of etymological trees, register transitions, and semantic chain pathways. These formats are fully compatible with automated parsing and scalable lexicon development.
  • Spreadsheet and case-by-variable formatting: Items are compiled in tabular form with columns for VS/SV representation, stratum/register classification, Middle and Old Chinese reconstructions, glosses, and comparative notes, which facilitates both statistical modeling and graphical visualization.
  • Comparative database linkage: The corpus integrates with external linguistic resources, including classical Chinese rime books, the Kangxi Dictionary, Shafer’s Sino-Tibetan research, works by numerous authors on Middle Chinese and Old Chinese reconstruction, Austroasiatic etymological inventories, and Vietnamese word lists compiled by international linguists. This linkage enables robust synchronic and diachronic querying across language families.
  • Register and tonogenesis tagging: Annotation protocols explicitly mark register (colloquial vs. literary), lexical stratum (ESV/LSV/RSV), and tonal origin, capturing features such as coda type, pitch contour, and onset class, in alignment with contemporary corpus-linguistic standards.
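
The case-by-variable layout and tagging described in this list can be sketched roughly as follows. This is a minimal illustration, not the chapter's actual schema: the class FormRecord, its field names, and the label sets are assumptions. The register tags for phép/pháp follow the 妙法 row above, and equating SV readings with the literary register is a simplifying assumption. The ESV/LSV/RSV stratum is left unset because the rows do not state it, and tonal-origin features are not modeled.

```python
import csv
import io
from collections import Counter
from dataclasses import asdict, dataclass, fields
from typing import List, Optional

# Hypothetical label sets; the chapter names these categories but not an encoding.
LAYERS = {"VS", "SV"}
REGISTERS = {"colloquial", "literary"}
STRATA = {"ESV", "LSV", "RSV"}


@dataclass
class FormRecord:
    """One case (a Vietnamese form) described by a fixed set of variables."""
    etymon: str
    form: str
    layer: str
    register: str
    stratum: Optional[str] = None  # unset: not stated in the sample rows

    def __post_init__(self) -> None:
        # Reject tags outside the declared label sets.
        if self.layer not in LAYERS:
            raise ValueError(f"unknown layer: {self.layer!r}")
        if self.register not in REGISTERS:
            raise ValueError(f"unknown register: {self.register!r}")
        if self.stratum is not None and self.stratum not in STRATA:
            raise ValueError(f"unknown stratum: {self.stratum!r}")


def to_csv(records: List[FormRecord]) -> str:
    """Serialize records as CSV text, one row per form, one column per variable."""
    buffer = io.StringIO()
    names = [f.name for f in fields(FormRecord)]
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))  # a None stratum becomes an empty cell
    return buffer.getvalue()


def count_by(records: List[FormRecord], variable: str) -> Counter:
    """Tally records by the value of one variable, e.g. 'layer' or 'register'."""
    return Counter(getattr(record, variable) for record in records)


# Illustrative tags only: SV treated as 'literary' is an assumption.
SAMPLE = [
    FormRecord("法", "phép", "VS", "colloquial"),
    FormRecord("法", "pháp", "SV", "literary"),
    FormRecord("妙法", "phépmàu", "VS", "colloquial"),
    FormRecord("妙法", "diệupháp", "SV", "literary"),
]

print(to_csv(SAMPLE))
print(count_by(SAMPLE, "layer"))
```

Keeping one form per row, rather than one etymon per row, is a design choice that lets doublets such as phép and pháp carry different register tags while still being grouped by their shared etymon.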

These practices yield a replicable and extensible annotation framework, essential for long-term comparative research, digital humanities applications, and empirical testing of the AMC and substratum hypotheses advanced in the chapter.

Linking corpus findings to the chapter’s thesis

The corpus analysis in Chapter 1 provides compelling, multi-axis support for its central thesis: that the Sinitic-Vietnamese lexical stratum is not a superficial overlay derived from distant literary Chinese, but a deeply embedded, generative system. This system emerged through sustained bilingual contact with regional Sinitic lects and a Yue-derived proto-Vietic substrate—forming a foundational layer in the Vietnamese lexicon that reflects centuries of linguistic convergence, adaptation, and innovation.

Key points of thesis substantiation:

Depth and breadth of Sinitic integration: The presence of Early Sino-Vietnamese (ESV) and even pre-ESV elements, particularly within foundational lexical domains, demonstrates that the Sinitic layer extends far beyond scholarly or technical vocabulary. It constitutes a structural substratum of the Vietnamese lexicon. The diachronic continuity from pre-tonal, presyllabic forms preserved in conservative Vietic dialects to modern standardized VS/SV readings reflects not episodic borrowing but a sustained and dynamic process of linguistic accretion: a gradual “lacquering” of the Vietnamese language over centuries of contact and adaptation [4, 8].

Stratification over replacement: The coexistence of overlapping registers, recurrent doublets, and regionally differentiated variants reflects a layered lexical system in which native Vietic forms, Sinitic borrowings, and bilingual innovations actively interact. Rather than a linear process of substitution or overwriting, the Vietnamese lexicon reveals a dynamic stratification shaped by sustained contact, functional differentiation, and contextual adaptation [10].

Creative adaptation and innovation: The use of compounding, semantic chaining, and native coinage with Sinitic material, especially within technical, administrative, and scholarly registers, attests to the active linguistic agency of Vietnamese speakers. Rather than passively transmitting reading glosses from Chinese texts, they domesticated and recontextualized imported morphemes, embedding them within indigenous syntactic and semantic frameworks to generate novel, functional expressions [10].

Comparative validation of the AMC model: Cross-linguistic data from Southwest Chinese lects, conservative Vietic varieties, and other Austroasiatic branches affirm the Red River Delta as a sustained contact zone. These comparisons substantiate the AMC model’s premise: that a localized Sinitic variety exerted deep phonological, lexical, and semantic influence on emerging Vietnamese, shaping its structure through prolonged and multidirectional interaction [12].

Synthesis and Conclusion

At its core, the corpus and its stratified analysis offer extensive, multi-dimensional, and empirically grounded validation of the argument that the Sinitic layer in Vietnamese is not a passive residue of literary Chinese, but a dynamic, creative, and foundational substrate shaped by localized, heterogeneous contact.

The Chapter 1 corpus—meticulously curated and annotated—serves as a benchmark for analytic and comparative corpus linguistics in the context of East and Southeast Asian language contact. Through structural precision, multi-tiered etymological mapping, detailed register and phonological annotation, and the explicit tracing of semantic chains and polysyllabic innovation, it reveals the embeddedness and generative capacity of the Sinitic-Vietnamese stratum within the broader Vietnamese lexicon.

Most importantly, the corpus affirms the central thesis: that Sinitic-Vietnamese (VS) is not merely a superimposed layer of foreign vocabulary but a deeply indigenized stratum, lacquered into the linguistic fabric of Vietnamese through centuries of adaptation, semantic reconfiguration, and creative agency. This insight not only reframes the historical trajectory of Vietnamese but also establishes a new paradigm for approaching language contact and lexical stratification in global linguistic research.

References 

  1. Vietic languages. Wikipedia
  2. Mark Alves. Early Sino-Vietnamese Lexical Data and the Relative Chronology of Tonogenesis in Chinese and Vietnamese
  3. Mark Alves. 2009. Vietnamese Vocabulary. WOLD
  4. Mark Alves. Identifying Early Sino-Vietnamese Vocabulary via Linguistic, Historical Archaeological and Ethnological Data 
  5. Mark Alves. Notes on Sino-Vietnamese Historical Phonology
  6. John Phan. Lacquered Words: The Evolution of Vietnamese under Sinitic Influences from the 1st Century BCE through the 17th Century CE
  7. John Phan. Lacquered Words: The Evolution Of Vietnamese Under Sinitic…
  8. The Baxter-Sagart reconstruction of Old Chinese
  9. Mark Alves. From Vietic Presyllables to Vietnamese Simplex Onsets
  10. Sino-Vietnamese vocabulary. Wikipedia
  11. The Etymologies of Vietnamese Numeral Terms and Implications of …
  12. Historical Ethnolinguistic Notes on Proto-Austroasiatic and Proto ….
  13. Template: etymon. Wikipedia
  14. Baxter, William H. and Sagart, Laurent 2011. (STEDT)
  15. Stefan Th. Gries and Magali Paquot. Chapter 26 Writing up a Corpus-Linguistic Paper
  16. 漢典 zdic.net
  17. Nguyễn, Ngọc San. 1993. Tìm hiểu về Tiếng Việt Lịch sử. TP HCM: NXB Giáo dục.
  18. Shafer, Robert. 1966-1974. Introduction to Sino-Tibetan (4 volumes). Wiesbaden: Otto Harrassowitz.
  19. Thomas, David D. 1966. “Mon-Khmer Subgroupings in Vietnam,” in Norman Zide (ed.) Studies in Comparative Austroasiatic Linguistics. The Hague: Mouton.
  20. Luce, Gordon Hannington. 1965. “Danaw, a Dying Austroasiatic Language” in “Historical Linguistics” Indo-Pacific Linguistic Studies
  21. Han-Viet.com
