Sitemap
A list of all the posts and pages found on the site. For the robots out there, an XML version is available for digesting as well.
Pages
Posts
[5-Day Gen AI Intensive Course] Day 1: Prompting
Published:
What are the interesting things in the first day of 5-Day Gen AI Intensive Course?
datasets
VietMed-ID
Published:
Dataset Details
MultiMed-ST
Published:
This dataset is an extended version of leduckhai/MultiMed.
projects
publications
MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation
Published in arXiv, 2025
Multilingual speech translation (ST) in the medical domain enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we present the first systematic study on medical ST, to the best of our knowledge, by releasing MultiMed-ST, a large-scale ST dataset for the medical domain spanning all translation directions in five languages: Vietnamese, English, German, French, and Chinese (both Traditional and Simplified scripts), together with the models. With 290,000 samples, our dataset is the largest medical machine translation (MT) dataset and the largest many-to-many multilingual ST dataset across all domains. Second, we present the most extensive analysis in ST research to date, including: empirical baselines, a bilingual vs. multilingual comparative study, an end-to-end vs. cascaded comparative study, a task-specific vs. multi-task sequence-to-sequence (seq2seq) comparative study, code-switching analysis, and quantitative-qualitative error analysis. All code, data, and models are available online.
Recommended citation: Le-Duc, K., Tran, T., Tat, B.P., Bui, N.K.H., Dang, Q., Tran, H.P., Nguyen, T.T., Nguyen, L., Phan, T.M., Tran, T.T.P. and Ngo, C., 2025. MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation. arXiv preprint arXiv:2504.03546.
Download Paper
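One of the baseline families the abstract compares is cascaded ST, which chains automatic speech recognition (ASR) into machine translation (MT). A minimal sketch of that design, using dictionary-backed toy stubs in place of real models (the `ASR_STUB`/`MT_STUB` data and function names are illustrative, not the paper's actual systems):

```python
# Toy cascaded speech-translation pipeline: ASR stage -> MT stage.
# Both stages are dictionary-backed stubs standing in for real models.

ASR_STUB = {b"audio-001": "benh nhan bi sot cao"}  # audio -> source-language text
MT_STUB = {("vi", "en", "benh nhan bi sot cao"): "the patient has a high fever"}

def transcribe(audio: bytes) -> str:
    """Stub ASR stage: audio -> source-language transcript."""
    return ASR_STUB[audio]

def translate(text: str, src: str, tgt: str) -> str:
    """Stub MT stage: source-language text -> target-language text."""
    return MT_STUB[(src, tgt, text)]

def cascaded_st(audio: bytes, src: str, tgt: str) -> str:
    # In a cascade, ASR errors propagate into MT, which is one reason
    # such systems are compared against end-to-end models.
    return translate(transcribe(audio), src, tgt)

print(cascaded_st(b"audio-001", "vi", "en"))  # the patient has a high fever
```

The point of the sketch is the composition: the MT stage only ever sees the ASR output, so transcript errors are irrecoverable downstream, whereas an end-to-end model maps audio to target text directly.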
Transformer Encoder and Multi-features Time2Vec for Financial Prediction
Published in The 33rd European Signal Processing Conference (EUSIPCO 2025), 2025
Financial prediction is a complex and challenging task in time series analysis and signal processing, requiring models to capture both short-term fluctuations and long-term temporal dependencies. Transformers have achieved remarkable success, mostly in natural language processing, through the attention mechanism, which has also influenced the time series community. The ability to capture both short- and long-range dependencies helps to understand the financial market and to recognize price patterns, leading to successful applications of Transformers in stock prediction. However, previous research predominantly focuses on individual features and singular predictions, which limits the model's ability to understand broader market trends. In reality, within sectors such as finance and technology, companies in the same industry often exhibit correlated stock price movements. In this paper, we develop a novel neural network architecture by integrating Time2Vec with the encoder of the Transformer model. Based on a study of different markets, we propose a novel correlation feature selection method. Through comprehensive fine-tuning of multiple hyperparameters, we conduct a comparative analysis of our results against benchmark models. We conclude that our method outperforms other state-of-the-art encoding methods such as positional encoding, and that selecting correlated features enhances the accuracy of predicting multiple stock prices.
Recommended citation: Bui, N.K.H., Chien, N.D., Kovács, P. and Bognár, G., 2025. Transformer Encoder and Multi-features Time2Vec for Financial Prediction. arXiv preprint arXiv:2504.13801.
Download Paper
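The Time2Vec encoding that the abstract above integrates into the Transformer encoder has a simple closed form (Kazemi et al., 2019): element 0 is a linear, non-periodic term, and the remaining elements are sinusoids. A minimal NumPy sketch of that formula, with fixed `omega`/`phi` values chosen here for illustration (in the model they would be learnable parameters):

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Time2Vec embedding of a scalar timestamp tau:
        t2v(tau)[0] = omega[0] * tau + phi[0]        (linear trend term)
        t2v(tau)[i] = sin(omega[i] * tau + phi[i])   (periodic terms, i >= 1)
    omega and phi are learnable in practice; here they are fixed inputs.
    """
    v = omega * tau + phi
    return np.concatenate(([v[0]], np.sin(v[1:])))

# Toy usage: embed timestamp tau = 2.0 into a 4-dimensional vector.
omega = np.array([1.0, np.pi, 0.5 * np.pi, 0.25 * np.pi])
phi = np.zeros(4)
emb = time2vec(2.0, omega, phi)
print(emb.shape)  # (4,)
```

The linear first component lets the embedding represent non-periodic progression of time, while the sine components expose periodic structure (e.g. weekly or seasonal patterns) to the downstream attention layers.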