CoML | Resources

Discussions and opinions

For discussions about current topics in reverse engineering and infant development, follow and comment on our Synthetic Learner Blog.

Teaching

The team is involved in teaching; see ITI-PSL Cognitive Engineering.

Software

The bootphon team develops pipelines for data analysis, speech processing, and machine learning, and distributes them as free software on GitHub.

Databases

Articulation Index upgrade

… under construction …

Buckeye corpus speech recognition layer

… under construction …

Other zero resource tools

Below is a list of papers on unsupervised speech learning, together with open-source implementations. It is provided for documentation purposes, without any warranty that these implementations will work or produce useful results on a new corpus. However, we are very interested in large-scale testing and evaluation of these algorithms, so please report your findings to us.

Discovery of subword units or subword representations

Discrete units, Bayesian approaches:

Lee, C. & Glass, J. (2012). A Nonparametric Bayesian Approach to Acoustic Model Discovery, ACL. [github]
Ondel, L., Burget, L., & Cernocky, J. (2016). Variational Inference for Acoustic Unit Discovery. Procedia Computer Science, 81, 80-86. [github]

Continuous representations, posteriorgrams:

Chen, H., Leung, C. C., Xie, L., Ma, B., & Li, H. (2015). Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study. In Proceedings of Interspeech. [code]
Heck, M., Sakti, S., & Nakamura, S. (2016). Unsupervised Linear Discriminant Analysis for Supporting DPGMM Clustering in the Zero Resource Scenario. Procedia Computer Science, 81, 73-79. [same code as above, plus Kaldi]

Continuous representations, DNNs (this requires spoken term discovery):

Synnaeve, G., Schatz, T., & Dupoux, E. (2014, December). Phonetics embedding learning with side information. In Spoken Language Technology Workshop (SLT), 2014 IEEE (pp. 106-111). IEEE. [github]
Thiolliere, R., Dunbar, E., Synnaeve, G., Versteegh, M., & Dupoux, E. (2015). A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling. In Sixteenth Annual Conference of the International Speech Communication Association. [github]

Spoken Term Discovery

DTW-based:

MODIS: Catanese, L., Souviraa-Labastie, N., Qu, B., Campion, S., Gravier, G., Vincent, E., Bimbot, F. (2013, August). MODIS: an audio motif discovery software. In Show & Tell-Interspeech 2013. [code]
Jansen, A., Van Durme, B. (2011, December). Efficient spoken term discovery using randomized algorithms. In Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on (pp. 401-406). IEEE. [github]

Bayesian approaches:

Johnson, M., Griffiths, T. L., Goldwater, S. (2006). Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In Advances in neural information processing systems (pp. 641-648). [website]
Lee, C., O’Donnell, T., Glass, J. (2015). Unsupervised Lexicon Discovery from Acoustic Input, Transactions of the Association for Computational Linguistics (TACL). [github]