Large-Scale Knowledge Graph Construction from Multilingual Encyclopedia Corpora: Methods and Evaluation
This paper presents a novel framework for constructing large-scale knowledge graphs from multilingual encyclopedia corpora. We introduce a three-stage pipeline combining transformer-based entity extraction, cross-lingual alignment via contrastive learning, and graph-based disambiguation. Evaluated on 140+ languages, our approach achieves state-of-the-art results on benchmark datasets while maintaining computational efficiency suitable for continuous integration into living encyclopedia platforms.
Contrastive Fact Verification: Leveraging Adversarial Examples for Robust Claim Validation
We propose a contrastive learning approach to automated fact verification that uses adversarially generated counter-examples to improve model robustness. Our method trains verifiers to distinguish between subtle paraphrases of true and false claims, achieving a 23% improvement on cross-domain generalization benchmarks.
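As a rough illustration of the contrastive objective described above, the sketch below scores a claim embedding against one paraphrase with the same truth value (the positive) and several adversarial counter-examples (the negatives). The abstract does not name its loss function, so the use of an InfoNCE-style loss, the `temperature` value, and the function names `cosine` and `info_nce_loss` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,  # assumed value, not taken from the abstract
) -> float:
    """InfoNCE-style loss (an assumed choice of contrastive objective).

    The loss is low when the anchor is closer to the positive than to
    every negative, and grows as negatives become more similar.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability assigned to the positive candidate.
    return log_sum - logits[0]


if __name__ == "__main__":
    claim = [1.0, 0.0]
    paraphrase = [0.9, 0.1]        # same truth value as the claim
    counter_example = [0.8, 0.6]   # hypothetical adversarial negative
    print(info_nce_loss(claim, paraphrase, [counter_example]))
```

In this toy setup the embeddings are hand-written vectors; in practice they would come from the verifier's encoder, which the abstract does not specify.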
Epistemic Equity: Measuring and Mitigating Geographic Bias in Crowdsourced Knowledge Systems
This study presents a comprehensive audit of geographic representation across major crowdsourced knowledge platforms. We develop an epistemic equity index, demonstrate systematic underrepresentation of content from Global South perspectives, and propose algorithmic and community interventions to address these gaps.
Temporal Knowledge Representation: Modeling the Evolution of Scientific Concepts Across Decades
We introduce a temporal knowledge representation framework that models how scientific concepts evolve over time. Using a corpus of 2 million encyclopedia articles spanning 50 years, we trace the development of 14,000+ concepts and identify patterns of conceptual drift, convergence, and paradigm shifts.
Low-Resource Language Support in Semantic Search: A Zero-Shot Transfer Approach Using Cross-Lingual Embeddings
To address the language gap in semantic search, we present a zero-shot transfer framework based on cross-lingual embeddings that supports 87 low-resource languages without any language-specific training data. Our approach combines massively multilingual models with adaptive retrieval to achieve competitive search quality.
Causal Reasoning in Natural Language Explanations: A Framework for Automated Explanation Quality Assessment
This thesis develops a causal reasoning framework for evaluating the quality of natural language explanations in educational and encyclopedia contexts. We introduce the Explanatory Causal Fidelity (ECF) metric and demonstrate its correlation with human judgments of explanation quality across diverse domains.
Federated Knowledge Editing: Collaborative Update Mechanisms for Distributed Encyclopedia Systems
We propose a federated editing framework that enables consistent, conflict-free knowledge updates across distributed encyclopedia instances. Our protocol achieves 99.7% consistency while supporting concurrent edits from thousands of contributors, with real-time conflict resolution using version vector lattices.
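The version vector mechanism mentioned above can be sketched in a few lines. This is a minimal, generic illustration of version vectors and their lattice join, not the protocol from the paper: the replica identifiers, the `VersionVector` alias, and the functions `merge` and `compare` are hypothetical names chosen for the example.

```python
from typing import Dict

# Maps a replica identifier to the number of edits seen from that replica.
VersionVector = Dict[str, int]


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Lattice join: element-wise maximum of the per-replica counters."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify the causal relation between two version vectors."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b
    if b_le_a:
        return "after"       # a happened after b
    return "concurrent"      # neither dominates: a conflict to resolve


if __name__ == "__main__":
    site_a = {"r1": 2, "r2": 1}
    site_b = {"r1": 1, "r2": 2}
    print(compare(site_a, site_b))
    print(merge(site_a, site_b))
```

Two edits are flagged as concurrent only when neither vector dominates the other, which is the case a conflict-resolution step would need to handle. Because the join is commutative, associative, and idempotent, replicas that exchange and merge vectors in any order converge on the same result.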
Neurosymbolic Knowledge Acquisition: Combining Neural Language Models with Formal Ontology Reasoning
This paper bridges the gap between neural language understanding and symbolic reasoning for knowledge acquisition. We present a neurosymbolic architecture that uses LLMs for information extraction and description logic reasoners for consistency validation, achieving both scalability and formal guarantees.