Welcome to TsvetShop, Yulia Tsvetkov's research group, with members at the Paul G. Allen School of Computer Science & Engineering at the University of Washington and the Language Technologies Institute of Carnegie Mellon University. Our group works on multidisciplinary research at the nexus of machine learning, computational linguistics, and the social sciences. We aim to develop practical solutions to natural language processing problems that combine sophisticated learning and modeling methods with insights into human languages and the people who speak them.




Yulia Tsvetkov
Associate Professor

Graduate Students

Vidhisha Balachandran
PhD Student, CMU
Chan Young Park
PhD Student, CMU
Xiaochuang Han
PhD Student, UW
Orevaoghene Ahia
PhD Student, UW
Co-advisor: Noah A. Smith
Melanie Sclar
PhD Student, UW
Co-advisor: Yejin Choi
Lucille Njoo
PhD Student, UW
Shangbin Feng
PhD Student, UW
Tianxing He
Postdoc, UW
Fatemeh Mireshghallah
Postdoc, UW
Co-advisor: Yejin Choi
Kabir Ahuja
PhD Student, UW
Stella Li
PhD Student, UW
Farhan Samir
Visiting Student, UW


David helps Goliath: Inference-Time Collaboration Between Small Specialized and Large General Diffusion LMs
Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov, and Marjan Ghazvininejad. Proc. NAACL 2024.
P3Sum: Preserving Author's Perspective in News Summarization with Diffusion Language Models
Yuhan Liu, Shangbin Feng, Xiaochuang Han, Vidhisha Balachandran, Chan Young Park, Sachin Kumar, and Yulia Tsvetkov. Proc. NAACL 2024.
Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers
Roy Xie, Orevaoghene Ahia, Yulia Tsvetkov, and Antonios Anastasopoulos. Proc. NAACL 2024.
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer
Akari Asai, Sneha Kudugunta, Xinyan Velocity Yu, Terra Blevins, Hila Gonen, Machel Reid, Yulia Tsvetkov, Sebastian Ruder, and Hannaneh Hajishirzi. Proc. NAACL 2024.
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
Weijia Shi, Xiaochuang Han, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, and Scott Wen-tau Yih. Proc. NAACL 2024.
SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation
Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. Proc. NAACL 2024.
LatticeGen: Hiding Generated Text in a Lattice for Privacy-Aware Large Language Model Generation on Cloud
Mengke Zhang, Tianxing He, Tianle Wang, Lu Mi, Fatemehsadat Mireshghallah, Binyi Chen, Hao Wang, and Yulia Tsvetkov. Proc. NAACL 2024, findings.
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models
Yuyang Bai, Shangbin Feng, Vidhisha Balachandran, Zhaoxuan Tan, Shiqi Lou, Tianxing He, and Yulia Tsvetkov. Proc. WebConf 2024.
Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions
Sachin Kumar, Chan Young Park, and Yulia Tsvetkov. Proc. ICLR 2024.
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou, Yulia Tsvetkov, Maarten Sap, Reza Shokri, and Yejin Choi. Proc. ICLR 2024, spotlight.
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
Melanie Sclar, Yejin Choi, Yulia Tsvetkov, and Alane Suhr. Proc. ICLR 2024.
Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models
Shangbin Feng, Weijia Shi, Yuyang Bai, Vidhisha Balachandran, Tianxing He, and Yulia Tsvetkov. Proc. ICLR 2024, oral.
MatFormer: Nested Transformer for Elastic Inference
Fnu Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit S Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham M. Kakade, Ali Farhadi, and Prateek Jain. Proc. ENLSP @ NeurIPS 2023, best paper award.
GlobalBench: A Benchmark for Global Progress in Natural Language Processing
Yueqi Song, Catherine Cui, Simran Khanuja, Pengfei Liu, Fahim Faisal, Alissa Ostapenko, Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Yulia Tsvetkov, Antonios Anastasopoulos, and Graham Neubig. Proc. EMNLP.
Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models
Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Jungo Kasai, David R. Mortensen, Noah A. Smith, and Yulia Tsvetkov. Proc. EMNLP.
FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge
Shangbin Feng, Vidhisha Balachandran, Yuyang Bai, and Yulia Tsvetkov. Proc. EMNLP.
BotPercent: Estimating Twitter Bot Populations from Groups to Crowds
Zhaoxuan Tan, Shangbin Feng, Melanie Sclar, Herun Wan, Minnan Luo, Yejin Choi, and Yulia Tsvetkov. Proc. Findings of EMNLP.
Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?
Weijia Shi, Xiaochuang Han, Hila Gonen, Ari Holtzman, Yulia Tsvetkov, and Luke Zettlemoyer. Proc. Findings of EMNLP.
On the Zero-Shot Generalization of Machine-Generated Text Detectors
Xiao Pu, Jingyu Zhang, Xiaochuang Han, Yulia Tsvetkov, and Tianxing He. Proc. Findings of EMNLP.
TalkUp: A Novel Dataset Paving the Way for Understanding Empowering Language
Lucille Njoo, Chan Young Park, Octavia Stappart, Marvin Thielk, Yi Chu, and Yulia Tsvetkov. Proc. Findings of EMNLP.
Can Language Models Solve Graph Problems in Natural Language?
Heng Wang, Shangbin Feng, Tianxing He, Zhaoxuan Tan, Xiaochuang Han and Yulia Tsvetkov. Proc. NeurIPS, spotlight.
LEXPLAIN: Improving Model Explanations via Lexicon Supervision
Orevaoghene Ahia, Hila Gonen, Vidhisha Balachandran, Yulia Tsvetkov and Noah A. Smith. Proc. StarSEM.
Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker
Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi and Yulia Tsvetkov. Proc. ACL, outstanding paper award.
From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
Shangbin Feng, Chan Young Park, Yuhan Liu and Yulia Tsvetkov. Proc. ACL, best paper award.
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
Xiaochuang Han, Sachin Kumar and Yulia Tsvetkov. Proc. ACL.
KALM: Knowledge-Aware Integration of Local, Document, and Global Contexts for Long Document Understanding
Shangbin Feng, Zhaoxuan Tan, Wenqian Zhang, Zhenyu Lei and Yulia Tsvetkov. Proc. ACL.
Understanding In-Context Learning via Supportive Pretraining Data
Xiaochuang Han, Daniel Simig, Todor Mihaylov, Yulia Tsvetkov, Asli Celikyilmaz and Tianlu Wang. Proc. ACL.
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James Glass and Yulia Tsvetkov. Proc. ACL.
Examining Risks of Racial Biases in NLP Tools for Child Protective Services
Anjalie Field, Amanda Coston, Nupoor Gandhi, Alexandra Chouldechova, Emily Putnam-Hornstein, David Steier and Yulia Tsvetkov. Proc. FAccT.
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos and Yulia Tsvetkov. Proc. EACL.
Unsupervised Keyphrase Extraction via Interpretable Neural Networks
Rishabh Joshi, Vidhisha Balachandran, Emily Saldanha, Maria Glenski, Svitlana Volkova and Yulia Tsvetkov. Proc. EACL.
Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling
Vidhisha Balachandran, Hannaneh Hajishirzi, William Cohen and Yulia Tsvetkov. Proc. EMNLP.
Referee: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation
Melanie Sclar, Peter West, Sachin Kumar, Yulia Tsvetkov and Yejin Choi. Proc. EMNLP.
Gradient-based Constrained Sampling from Language Models
Sachin Kumar, Biswajit Paria and Yulia Tsvetkov. Proc. EMNLP.
Gendered Mental Health Stigma in Masked Language Models
Wanyin Lin, Lucille Njoo, Anjalie Field, Ashish Sharma, Katharina Reinecke, Tim Althoff and Yulia Tsvetkov. Proc. EMNLP.
Challenges and Opportunities in Information Manipulation Detection: An Examination of Wartime Russian Media
Chan Young Park, Julia Mendelsohn, Anjalie Field and Yulia Tsvetkov. Proc. Findings of EMNLP.
Threat Scenarios and Best Practices to Detect Neural Fake News
Artidoro Pagnoni, Martin Graciarena, and Yulia Tsvetkov. Proc. COLING.
An Analysis of Emotions and the Prominence of Positivity in #BlackLivesMatter Tweets
Anjalie Field, Chan Young Park, Antonio Theophilo, Jamelle Watson-Daniels, and Yulia Tsvetkov. PNAS.
Speaker Information Can Guide Models to Better Inductive Biases: A Case Study On Predicting Code-Switching
Alissa Ostapenko, Shuly Wintner, Melinda Fricke, and Yulia Tsvetkov. Proc. ACL.
Controlled Analyses of Social Biases in Wikipedia Bios
Anjalie Field, Chan Young Park, Kevin Z. Lin, and Yulia Tsvetkov. Proc. TheWebConf, Wikimedia Foundation Research Award of the Year.   [demo]
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, and Yuan Cao. Proc. ICLR.
Controlled Text Generation as Continuous Optimization with Multiple Constraints
Sachin Kumar, Eric Malmi, Aliaksei Severyn, and Yulia Tsvetkov. Proc. NeurIPS.
SelfExplain: A Self-Explaining Architecture for Neural Text Classifiers
Dheeraj Rajagopal, Vidhisha Balachandran, Eduard Hovy, and Yulia Tsvetkov. Proc. EMNLP.
Evaluating the Morphosyntactic Well-formedness of Generated Texts
Adithya Pratapa, Antonios Anastasopoulos, Shruti Rijhwani, Aditi Chaudhary, David R. Mortensen, Graham Neubig, and Yulia Tsvetkov. Proc. EMNLP.
Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates
Xiaochuang Han and Yulia Tsvetkov. Proc. Findings of EMNLP.
Detecting Community Sensitive Norm Violations in Online Conversations
Chan Young Park, Julia Mendelsohn, Karthik Radhakrishnan, Kinjal Jain, Tushar Kanakagiri, David Jurgens, and Yulia Tsvetkov. Proc. Findings of EMNLP.
Efficient Test Time Adapter Ensembling for Low-resource Language Varieties
Xinyi Wang, Yulia Tsvetkov, Sebastian Ruder, and Graham Neubig. Proc. Findings of EMNLP.
Simple and Efficient ways to Improve REALM
Vidhisha Balachandran, Ashish Vaswani, Yulia Tsvetkov, and Niki Parmar. Proc. MRQA.
Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs
Monisha Jegadeesan, Sachin Kumar, John Wieting, and Yulia Tsvetkov. Proc. MRL.
Improving Span Representation for Domain-adapted Coreference Resolution
Nupoor Gandhi, Anjalie Field, and Yulia Tsvetkov. Proc. CRAC.
A Survey of Race, Racism, and Anti-Racism in NLP
Anjalie Field, Su Lin Blodgett, Zeerak Waseem, and Yulia Tsvetkov. Proc. ACL.
Machine Translation into Low-resource Language Varieties
Sachin Kumar, Antonios Anastasopoulos, Shuly Wintner, and Yulia Tsvetkov. Proc. ACL.
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation
Prakhar Gupta, Yulia Tsvetkov, and Jeffrey P. Bigham. Proc. Findings of ACL.
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni, Vidhisha Balachandran, and Yulia Tsvetkov. Proc. NAACL-HLT.
Controlling Dialogue Generation with Semantic Exemplars
Prakhar Gupta, Jeffrey P. Bigham, Yulia Tsvetkov, and Amy Pavel. Proc. NAACL-HLT.
DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues
Rishabh Joshi, Vidhisha Balachandran, Shikhar Vashishth, Alan Black, and Yulia Tsvetkov. Proc. ICLR.
Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
Zirui Wang, Yulia Tsvetkov, Orhan Firat, and Yuan Cao. Proc. ICLR.
StructSum: Incorporating Latent and Explicit Sentence Dependencies for Single Document Summarization
Vidhisha Balachandran, Artidoro Pagnoni, Jay Yoon Lee, Dheeraj Rajagopal, Jaime Carbonell, and Yulia Tsvetkov. Proc. EACL.
Cross-Cultural Similarity Features for Cross-Lingual Transfer Learning of Pragmatically Motivated Tasks
Jimin Sun, Hwijeen Ahn, Chan Young Park, Yulia Tsvetkov, and David R. Mortensen. Proc. EACL.
Multilingual Contextual Affective Analysis of LGBT People Portrayals in Wikipedia
Chan Young Park, Xinru Yan, Anjalie Field, and Yulia Tsvetkov. Proc. ICWSM.
An Exploration of Data Augmentation Techniques for Improving English to Tigrinya Translation
Lidia Kidane, Sachin Kumar, and Yulia Tsvetkov. Proc. AfricaNLP.
End-to-End Differentiable GANs for Text Generation
Sachin Kumar and Yulia Tsvetkov. Proc. ICBINB.
Understanding Linguistic Accommodation in Code-Switched Human-Machine Dialogues
Tanmay Parekh, Emily Ahn, Yulia Tsvetkov, and Alan W. Black. Proc. CoNLL.
Automatic Extraction of Rules Governing Morphological Agreement
Aditi Chaudhary, Antonios Anastasopoulos, Adithya Pratapa, David R. Mortensen, Zaid Sheikh, Yulia Tsvetkov, and Graham Neubig. Proc. EMNLP.
Unsupervised Discovery of Implicit Gender Bias
Anjalie Field and Yulia Tsvetkov. Proc. EMNLP.
On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment
Zirui Wang, Zachary C. Lipton, and Yulia Tsvetkov. Proc. EMNLP.
Fortifying Toxic Speech Detectors Against Veiled Toxicity
Xiaochuang Han and Yulia Tsvetkov. Proc. EMNLP.
A Computational Analysis of Polarization on Indian and Pakistani Social Media
Aman Tyagi, Anjalie Field, Priyank Lathwal, Yulia Tsvetkov, and Kathleen M. Carley. Proc. SocInfo.
LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification
Sopan Khosla, Rishabh Joshi, Ritam Dutt, Alan W. Black, and Yulia Tsvetkov. Proc. SemEval.
A framework for the computational linguistic analysis of dehumanization
Julia Mendelsohn, Yulia Tsvetkov, and Dan Jurafsky. Frontiers in Artificial Intelligence.
Demoting Racial Bias in Hate Speech Detection
Mengzhou Xia, Anjalie Field, and Yulia Tsvetkov. Proc. SocialNLP.
A Generative Approach to Titling and Clustering Wikipedia Sections
Anjalie Field, Sascha Rothe, Simon Baumgartner, Cong Yu, and Abe Ittycheriah. Proc. WNGT.
A Deep Reinforced Model for Cross-Lingual Summarization with Bilingual Semantic Similarity Reward
Zi-Yi Dou, Sachin Kumar, and Yulia Tsvetkov. Proc. WNGT.
Balancing Training for Multilingual Neural Machine Translation
Xinyi Wang, Yulia Tsvetkov, and Graham Neubig. Proc. ACL.
Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions
Xiaochuang Han, Byron C. Wallace, and Yulia Tsvetkov. Proc. ACL.
Stress and Burnout in Open Source: Toward Finding, Understanding, and Mitigating Unhealthy Interactions
Naveen Raman, Minxuan Cao, Yulia Tsvetkov, Christian Kästner, and Bogdan Vasilescu. International Conference on Software Engineering -- New Ideas Track (ICSE-NIER).
Augmenting Non-Collaborative Dialog Systems with Explicit Semantic and Strategic Dialog History
Yiheng Zhou, Yulia Tsvetkov, Alan W Black, and Zhou Yu. Proc. ICLR.
What Code-Switching Strategies are Effective in Dialog Systems?
Emily Ahn, Cecilia Jimenez, Yulia Tsvetkov, and Alan W Black. Proc. SCiL.
Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods
Maria Ryskina, Ella Rabinovich, Taylor Berg-Kirkpatrick, David Mortensen, and Yulia Tsvetkov. Proc. SCiL.
Topics to Avoid: Demoting Latent Confounds in Text Classification
Sachin Kumar, Shuly Wintner, Noah A. Smith, and Yulia Tsvetkov. Proc. EMNLP.
Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts
Luke M. Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov. Proc. EMNLP.
Learning to Generate Word- and Phrase-Embeddings for Efficient Phrase-Based Neural Machine Translation
Chan Young Park and Yulia Tsvetkov. Proc. WNGT.
A Margin-based Loss with Synthetic Negative Samples for Continuous-output Machine Translation
Gayatri Bhat, Sachin Kumar, and Yulia Tsvetkov. Proc. WNGT.
A Dynamic Strategy Coach for Effective Negotiation
Yiheng Zhou, He He, Alan W Black, and Yulia Tsvetkov. Proc. SIGdial.
Entity-Centric Contextual Affective Analysis
Anjalie Field and Yulia Tsvetkov. Proc. ACL.
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology
Aditi Chaudhary, Elizabeth Salesky, Gayatri Bhat, David R. Mortensen, Jaime G. Carbonell, and Yulia Tsvetkov. Proc. SIGMORPHON.
Quantifying Social Biases in Contextual Word Representations
Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yulia Tsvetkov. Proc. of Workshop on Gender Bias for NLP.
Contextual Affective Analysis: A Case Study of People Portrayals in Online #MeToo Stories
Anjalie Field, Gayatri Bhat, and Yulia Tsvetkov. Proc. ICWSM.
Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings
Thomas Manzini, Yao Chong, Yulia Tsvetkov, and Alan W Black. Proc. NAACL.
Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs
Sachin Kumar and Yulia Tsvetkov. Proc. ICLR.
Framing and Agenda-setting in Russian News: a Computational Analysis of Intricate Political Strategies
Anjalie Field, Doron Kliger, Shuly Wintner, Jennifer Pan, Dan Jurafsky, and Yulia Tsvetkov. Proc. EMNLP.
RtGender: A corpus for studying differential responses to gender
Rob Voigt, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, and Yulia Tsvetkov. Proc. LREC'18.
Native Language Cognate Effects on Second Language Lexical Choice
Ella Rabinovich, Yulia Tsvetkov, and Shuly Wintner. TACL.
Style Transfer Through Back-Translation
Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, and Alan W Black. Proc. ACL.



Our work has been supported by the following organizations/companies:

Google, DARPA, NSF, AWS, Adobe, NVF, IARPA, PNNL, Workhuman, Sloan Foundation