8th NYAS Machine Learning Symposium 2014




March 28, 2014

I attended the 8th NYAS Machine Learning Symposium, and here are the notes I took at the event. They may contain errors and mistakes; if you find any, please let me know.
In my personal view, it was weaker than the previous (7th) machine learning symposium in both the posters and the talks. Last year's posters and talks were much more interesting to me. That said, I could not visit all of the posters, so take my word with a grain of salt.
The abstracts are available as a PDF here.

Machine Learning for Powers of Good

by Rayid Ghani

Spotlight Talks

Graph-Based Posterior Regularization for Semi-Supervised Structured Prediction:

Graph propagation and CRF estimation are combined into a joint objective, which is then optimized; a KL-divergence term keeps the parameters of the two models consistent with each other.

Relevant work is Posterior Regularization (PR) by Ganchev et al.

In her poster she showed that it performs better than both CRF and graph-based approaches, but she did not compare the speed of this approach against either of them. I am not very familiar with graph-based approaches, and the joint objective could be quite hard to optimize, so the method is likely slower than a plain CRF, possibly by more than a factor of two.
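As a rough sketch of the joint objective as I understood it (the trade-off weights $\lambda$, $\tau$ and the exact form of the smoothness term are my placeholders, not necessarily what the poster uses):

$$
\min_{\theta,\,q}\; \underbrace{-\log p_\theta(\mathbf{y}\mid\mathbf{x})}_{\text{CRF likelihood}} \;+\; \lambda \underbrace{\sum_{(i,j)\in E} w_{ij}\,\lVert q_i - q_j\rVert^2}_{\text{graph smoothness}} \;+\; \tau\,\mathrm{KL}\!\left(q \,\Vert\, p_\theta\right)
$$

where the first term trains the CRF, the second encourages the auxiliary marginals $q$ to vary smoothly over the similarity graph, and the KL term ties the two worlds together.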

Learning from Label Proportions (LLP):

It attacks the binary learning problem with an extension of the bag approach: for each bag, the proportion of the labels is known but the individual labels are unknown. They solve the problem in a large-margin framework, modeling the instances belonging to one label and maximizing the margin against the other label (SVM-like). The supervised learning objective is extended with a bag-proportion loss over the model parameters; a rough sketch follows below.

Generalization Error of LLP
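Here is a minimal, dependency-free sketch of what a bag-proportion loss could look like (my own illustration under simplifying assumptions, not the authors' method; I use a squared proportion gap with a plain L2 term standing in for the large-margin penalty):

```python
import numpy as np

def proportion_loss(w, X_bags, p_bags, lam=0.1):
    """Sum of squared gaps between each bag's predicted positive
    proportion (mean sigmoid of the linear scores) and its known
    proportion, plus an L2 term standing in for the margin penalty."""
    loss = lam * np.dot(w, w)
    for X, p in zip(X_bags, p_bags):
        pred_p = np.mean(1.0 / (1.0 + np.exp(-(X @ w))))
        loss += (pred_p - p) ** 2
    return loss

def numerical_grad(f, w, eps=1e-6):
    """Central-difference gradient; keeps the sketch self-contained."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# Toy usage: two bags with known positive proportions 0.7 and 0.2,
# drawn from shifted distributions so a linear score can separate them.
rng = np.random.default_rng(0)
X_bags = [rng.normal(loc=0.5, size=(50, 3)),
          rng.normal(loc=-0.5, size=(40, 3))]
p_bags = [0.7, 0.2]
w = np.zeros(3)
for _ in range(200):  # plain gradient descent on the bag-level loss
    w -= 0.5 * numerical_grad(lambda v: proportion_loss(v, X_bags, p_bags), w)
print(proportion_loss(w, X_bags, p_bags))
```

Note that the loss only ever sees bag-level proportions, never an individual label, which is the whole point of the LLP setting.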

Generative Image Models For Visual Phenotype Modeling

They have genotypes of fish along with visual features of the fish. To learn which part of the genome has an effect on which fish trait, they propose an admixture model that correlates the traits with the genome.
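In a generic admixture formulation (my reading of the abstract, not necessarily their exact model), the trait vector $\mathbf{x}_i$ of fish $i$ is a mixture of $K$ latent trait profiles whose weights depend on the genotype $\mathbf{g}_i$:

$$
\mathbf{x}_i \approx \sum_{k=1}^{K} \pi_{ik}\,\boldsymbol{\mu}_k, \qquad \boldsymbol{\pi}_i = f(\mathbf{g}_i)
$$

so inspecting which genotype components drive which mixture weights links the genome to the traits.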

Large-Scale Learning: Scaling Graph-Based Semi-Supervised Learning

Structured Classification Criteria for Deep Learning for Speech Recognition (Second Keynote)

Before this talk, I knew that IBM is strong in deep learning (if I recall correctly, they had a poster on speech recognition last year), but I did not know that they also published strong speech recognition papers last year. Google and Facebook get a lot of coverage for deep learning, and maybe rightly so, but IBM is also a strong player in the area.

Talk Structure

Bayesian Modeling for Speech Recognition

Sequences of phones are nice because if you have a word to classify that did not appear in the training set, you can still "guess" the word from its sequence of phones.
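A toy illustration of this point (my own example, not from the talk): a pronunciation lexicon maps phone sequences to words, so a phone-level recognizer can produce a word it never saw as a whole acoustic sample.

```python
# Hypothetical mini-lexicon mapping phone sequences to words.
lexicon = {
    ("k", "ae", "t"): "cat",
    ("b", "ae", "t"): "bat",
    ("k", "ae", "b"): "cab",
}

def guess_word(phones):
    """Look the decoded phone sequence up in the lexicon."""
    return lexicon.get(tuple(phones), "<unknown>")

print(guess_word(["k", "ae", "t"]))  # -> cat
```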

Context affects the acoustic realization of a phone in speech.

Context-dependent modeling
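A minimal sketch of context-dependent (triphone) units, assuming the common "left-phone − center + right-phone" notation; the exact scheme in the talk may differ:

```python
def triphones(phones):
    """Expand a phone sequence into triphone labels with boundary markers."""
    padded = ["<s>"] + list(phones) + ["</s>"]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

print(triphones(["k", "ae", "t"]))
# ['<s>-k+ae', 'k-ae+t', 'ae-t+</s>']
```

The same phone gets a different model in each context, which captures the coarticulation effect mentioned above.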

Structured Loss Functions

Stochastic Gradient Optimization

Speeding Training

Preconditioning in Sampling

Take-Home Messages

Hessian-Free Optimization

Learning Guarantees of the Optimization

Large-Scale Machine Learning (Accelerated)

Fast Scalable Comment Moderation on NYT

Active Learning at the New York Times

Key Observations

The optimal rate is also achieved for the test cost, as long as each data point is seen only once.
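A small single-pass SGD sketch of why this holds (my illustration, not the speaker's code): because every update uses a fresh sample seen exactly once, the running loss evaluated before each update behaves like a test loss rather than a training loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 10_000
w_true = rng.normal(size=d)
w = np.zeros(d)
running_loss = 0.0

for t in range(1, n + 1):
    x = rng.normal(size=d)
    y = x @ w_true + 0.1 * rng.normal()   # fresh example, seen only once
    err = x @ w - y
    running_loss += 0.5 * err ** 2        # measured before the update: a test loss
    w -= (1.0 / t) * err * x              # 1/t step size for this convex problem

print(running_loss / n, np.linalg.norm(w - w_true))
```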

To Learn More about Stochastic Gradient Descent

Apologies regarding the other spotlight talks; I am sure they were as interesting as the ones above, but this is as much as I was able to follow and take notes on.
