Aarne Talman


I'm a computational linguist and natural language processing researcher working on language understanding, machine translation and search engines.

     


About me


I have over 16 years of experience in software development, consulting, startups and academic research. I'm a co-founder of Basement AI, a private AI research lab and consultancy. I'm also a part-time researcher and PhD student in Language Technology at the University of Helsinki, where my research focuses on natural language understanding, machine translation and machine learning. I currently work as a Senior NLP Engineer at Silo AI, where I focus on developing machine translation models for our clients.

Research


My research focuses on Natural Language Processing and Machine Learning.

Natural Language Semantics

I study computational models of natural language meaning using machine learning, with a focus on sentence representation models for the natural language inference (NLI) task.

Multilingual NLP

I study representation learning of natural language meaning in multilingual settings.

Neural Machine Translation

I develop machine translation models and state-of-the-art neural machine translation systems.

Papers and Talks


Papers

  1. Aarne Talman, Marianna Apidianaki, Stergios Chatzikyriakidis and Jörg Tiedemann. 2022. How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets. arXiv. [bibtex] [pdf] [data and code]
  2. Aarne Talman, Marianna Apidianaki, Stergios Chatzikyriakidis and Jörg Tiedemann. 2021. NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance. Proceedings of the 23rd Nordic Conference on Computational Linguistics. [bibtex] [pdf] [data and code]
  3. Aarne Talman, Antti Suni, Hande Celikkanat, Sofoklis Kakouros, Jörg Tiedemann and Martti Vainio. 2019. Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations. Proceedings of the 22nd Nordic Conference on Computational Linguistics. [bibtex] [pdf] [corpus and code]
  4. Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen and Jörg Tiedemann. 2019. The University of Helsinki submissions to the WMT19 news translation task. Proceedings of the Fourth Conference on Machine Translation: Shared Task Papers. [bibtex] [pdf]
  5. Aarne Talman and Stergios Chatzikyriakidis. 2019. Testing the Generalization Power of Neural Network Models Across NLI Benchmarks. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. [bibtex] [pdf]
  6. Aarne Talman, Anssi Yli-Jyrä and Jörg Tiedemann. 2019. Sentence Embeddings in NLI with Iterative Refinement Encoders. Natural Language Engineering 25(4). [bibtex] [pdf] [code]

Talks

  1. NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance. 2 June 2021, The 23rd Nordic Conference on Computational Linguistics, Reykjavik. [pdf]
  2. Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations. 14 November 2019, Research Seminar in Language Technology, University of Helsinki. [pdf]
  3. Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations. 2 October 2019, The 22nd Nordic Conference on Computational Linguistics, Turku. [pdf]
  4. Neural Network models of NLI fail to capture the general notion of inference. 8 March 2019, CLASP Seminar, University of Gothenburg. [pdf]
  5. State-of-the-Art Natural Language Inference Systems Fail to Capture the Semantics of Inference. 25 October 2018, Research Seminar in Language Technology, University of Helsinki. [pdf]
  6. Natural Language Inference with Hierarchical BiLSTMs. 28 September 2018, FoTran 2018. [pdf]
  7. Natural Language Inference - Another Triumph for Deep Learning? 23 November 2017, Research Seminar in Language Technology, University of Helsinki. [pdf]

Teaching


University of Helsinki

Other

Software and Data


Visit my GitHub profile for a more complete collection of software and data.

Software

  1. NLI Data Sanity Check: Data and scripts for our 2021 NoDaLiDa paper.
  2. Prosody: A system for predicting prosodic prominence from written text.
  3. Natural Language Inference: A natural language inference system, written in Python with PyTorch, that implements the HBMP sentence encoder.

Data

  1. Helsinki Prosody Corpus: The corpus contains automatically generated, high-quality prosodic annotations for the LibriTTS corpus (Zen et al. 2019), produced with the Continuous Wavelet Transform Annotation method (Suni et al. 2017).
    • Language: English
    • License: CC BY 4.0
    • Paper

CV


Download my full CV here.

Education

Employment

  • 2021 - present, Senior AI Engineer, Silo AI
    Working on natural language processing and search.
  • 2018 - present, Doctoral Candidate, Language Technology, University of Helsinki
    Working on computational semantics and natural language processing.
  • 2019 - present, Founder & CEO, Basement AI
    Basement AI is a Nordic artificial intelligence research lab and consulting company specializing in natural language processing and machine learning.
  • 2020 - 2021, UK CTO, Nordcloud
    Nordcloud is a leading provider of public cloud professional and managed services. Led a team of architects and engineers.
  • 2015 - 2018, Associate Director, Consulting, Gartner
  • 2012 - 2015, Consultant, Accenture
  • 2011 - 2012, Research Student, London School of Economics
  • 2009 - 2011, Product Manager, Nokia
  • 2008 - 2009, Manager, Nokia
  • 2006 - 2008, Systems Analyst, Tieto
  • 2006 - 2006 (2 months), Software Developer, Valuatum

Let's Get In Touch!


Contact me by email or on social media!