Career Profile
- In the field of large language models, my work spans several key areas: optimizing attention mechanisms, optimizing cross-entropy-like objective functions, and optimizing reasoning ability for complex downstream tasks such as math solving and code generation.
- In the field of efficient deep learning, my work centers on model compression.
- In the field of multimodal large language models, my work spans several key areas: multimodal alignment and fusion, and hallucination mitigation.
Publications
VQ-Logits: Compressing the Output Bottleneck of Large Language Models via Vector Quantized Logits
arXiv preprint
ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention
arXiv preprint
Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective
arXiv preprint
Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective
arXiv preprint
Experience
- Under the supervision of Professor Mingkai Zheng, my research primarily focuses on large language models, efficient deep learning, and multimodal large language models.
- Developed a reliable, high-performance, low-latency C++ trading system
- Developed the recall module of a search system, improving metrics such as AUC and recall
- Developed an A/B testing platform used to refine PCG products (e.g., Tencent Video, Tencent News, WeSee)
- Led the team in investigating and analyzing performance issues using benchmarks
- Maintained the Mobile Cloud module: solved cross-origin problems
- Added observability facilities to DiDiFarm to analyze bottlenecks and implemented a tool called Performance Monitor, which reduced tail latency by 10