MLBP Joining the IBM Data Science Community!

Mike Tamir, PhD

SVP/Chief ML Scientist, ML Faculty at UC Berkeley

Published Aug 1, 2019

We’re excited to announce that the Machine Learning Blueprint is joining the IBM Data Science Community! We’ve always strived to source high-quality content from across the web and to put deep thought into our curation. However, delivering this every week is not easy, so after two years of publishing, we decided to take a pause.

Through our work with the IBM Community, we identified an opportunity to join forces and grow a community of machine learning practitioners, and we were thrilled at the prospect. Their mission was clear: provide a place for data scientists to interact with other experts, share support and insights, and start a dialogue around relevant topics. This aligns with our priorities.

We encourage you to check out the IBM Data Science Community, where you'll find:

Archives of all of the MLBP past issues, fully searchable & tagged
Tutorials, courses, demos, how-to guides, videos, contests, campaigns, in-person events, webinars, podcasts, AMAs and technical articles where you can network and grow your skills in data science
Thriving discussion forums with over 1,000 posts a month and a rapidly growing population of 114,000 members across the community platform
A community leaderboard and badging program recognizing users for their engagement and contributions

To continue receiving the newsletter

Click to Join the IBM Data Science Community and Continue to Receive the Newsletter.

After today, MLBP will send out reminders of the full content available there.

It’s our sincerest intention to maintain the quality and consistency that Machine Learning Blueprint subscribers are accustomed to, and to grow this community into the premier outlet for all things machine learning. We’re looking forward to this journey, and we thank you for joining us on it.

- The Machine Learning Blueprint Editors

Spotlight Articles

An AI App that “Undressed” Women Shows How DeepFakes Harm the Most Vulnerable

Founders of the “DeepNude” app announced this month that it would officially be taken offline. The app used Generative Adversarial Networks (GANs) to conduct image translation and modification, allowing users to generate a fake image of a woman with her clothing removed from an original (clothed) image. Vice reported that the app was trained to specifically target women, evidently generating an image of a female body when provided images of men.

Machine Learning Blueprint's Take

While not the first instance of deepfakes being misused against women in intimate ways, the DeepNude app is a startling example of how widely the technology can be abused. As with other audio and visual deepfake manipulation, this represents an especially concerning example of the risks inherent in generative modeling. Practitioners have a responsibility to lead open conversations about standards as well as methods of using machine learning to fight these potential abuses (see next spotlight), as the technology will no doubt improve, making it more difficult to detect such fakes with the human eye.

[Link]

Detecting Photoshopped Images in Adobe

UC Berkeley and Adobe teamed up to create a deep-learning-based approach (ResNet architecture) for detecting images of human faces that have been altered with Photoshop's Face Aware Liquify tool. It can detect with 95% accuracy which images have been altered, and it identifies the specific areas and methods used. Humans scored 53%, slightly better than guessing. Code and paper here

Machine Learning Blueprint's Take

While Photoshop may not be a traditional machine learning tool, it certainly raises dual-use considerations, though perhaps not at scale. This matters because it is an example of a vendor releasing both a tool and the decoder ring for detecting when it has been used; this sets a healthy precedent (albeit a little late). Will we see more detection techniques for uncovering transformations made with other tools?

[Link]

Boston Dynamics Robots Learn to Fight Back

You may have noticed some of the abuse the Boston Dynamics robots have received in past training and demonstration videos, meant to foster resilience to environmental factors. Due to a change in their loss function, the robots have exhibited some strange behavior to achieve their tasks that could be chalked up to fighting back. There’s no saying it’s malicious at this point, but a robot might not have a notion of that. If you’ve read this far, spoiler alert: it’s a parody, and a clever one at that.

[Link]

To continue receiving the newsletter from IBM:

Click to Join the IBM Data Science Community and Continue to Receive the Newsletter.

Learning Machine Learning

Artist’s Tutorial to Using GANs

An end-to-end guide to creating art with a CycleGAN, covering the deep learning mechanics, tips for artistic tuning, data selection (arguably one of the most important aspects of homing in on a style), and even the computation setup and workflow.

Machine Learning Blueprint's Take

This tutorial might be a little advanced for the non-computationally inclined, so help your friends get set up. It will be a democratizing moment for this technology when more accessible toolkits become available. Adobe could integrate technology like this into its Photoshop workflows, but that could raise more dual-use concerns.

[Link]

Modern Deep Learning Techniques Applied to Natural Language Processing

A guide to all the SOTA methods, their lead-up technologies and some of the fundamentals employed for an array of NLP tasks. It covers all the papers and provides references to code libraries where available.

Techniques to Curb Overfitting in Self-Driving RC-Cars

A case study in how training data matters more than complex model architectures. These hackathon participants found that tried-and-true optimal control methods were outperforming their deep RL methods because the models learned the race background instead of how to follow the course lines! This breakdown outlines all the methods they applied and how they improved or didn’t improve race performance. In the end, a style-transfer GAN improved performance by making the course lines pop visually, allowing the model to ignore background images.

Introducing TensorFlow Privacy: Learning with Differential Privacy for Training Data

Differential privacy in machine learning means that the model does not learn or remember details about any particular data point, or user, during training; memorizing such details is something a highly parameterized neural network is otherwise capable of. Learn how to use the new features in TensorFlow Privacy out of the box to protect your users and stay ahead of the data privacy curve.
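To make the idea concrete, here is a minimal sketch of the clip-and-add-noise step commonly used in differentially private SGD. It is written in plain Python and is not TensorFlow Privacy's API; the function names, the `max_norm` and `noise_multiplier` parameters, and the toy gradients are all assumptions made for illustration.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise scale is tied to the clipping bound, so no single
    # example can stand out from the noise in the final update.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


# Hypothetical per-example gradients; the second one is an outlier
# that would otherwise dominate the update.
grads = [[0.1, -0.2], [30.0, 40.0], [0.0, 0.3]]
rng = random.Random(0)
update = dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(update)
```

Clipping bounds how much any one data point can move the model, and the noise masks whatever influence remains. A real training loop would also track the cumulative privacy budget, which this sketch leaves out.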

Targeted Dropout - Finding Efficient Subnetworks in Over-Parameterized Models

Machine Learning News

An Open Source Toolkit for Debugging and Monitoring Neural Network Training

Microsoft has open-sourced a library for real-time DNN training monitoring with visualizations in Jupyter Notebooks. By treating all objects as streams and implementing lazy logging, you can observe almost any number of variables, since they're only observed, not stored. You can transform or combine streams to make more meaningful observations, or opt to store certain ones. There are also a number of pre- and post-training helpers such as architecture visualization, layer stats, and dimensionality reduction visualizations for dataset exploration.
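As a rough illustration of the stream and lazy-logging idea, the sketch below shows one way such a design could work in plain Python. It is not the library's API: the `Stream` and `LazyLogger` classes and every method name are hypothetical, and the real implementation may differ substantially.

```python
class Stream:
    """Forwards each value to its subscribers and stores nothing itself."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def has_subscribers(self):
        return bool(self._subscribers)

    def write(self, value):
        for callback in self._subscribers:
            callback(value)

    def map(self, fn):
        """Return a new stream carrying fn(value) for each value seen here."""
        out = Stream()
        self.subscribe(lambda value: out.write(fn(value)))
        return out


class LazyLogger:
    """Computes an observation only if some stream is listening for it."""

    def __init__(self):
        self.streams = {}
        self.evaluations = 0

    def stream(self, name):
        return self.streams.setdefault(name, Stream())

    def observe(self, name, compute):
        stream = self.streams.get(name)
        if stream is None or not stream.has_subscribers():
            return  # nobody is watching, so skip the computation entirely
        self.evaluations += 1
        stream.write(compute())


logger = LazyLogger()
stored = []  # opting in to storage is an explicit choice by the observer
rounded_loss = logger.stream("loss").map(lambda v: round(v, 1))
rounded_loss.subscribe(stored.append)

for step in range(3):
    logger.observe("loss", lambda: 1.0 / (step + 1))
    logger.observe("grad_norm", lambda: sum(range(10**6)))  # never evaluated

print(stored, logger.evaluations)
```

Because unobserved variables are never computed, the training loop can expose many of them at little cost, and only the streams someone subscribes to do any work.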

[Link]

Machine Learning has Been Used to Automatically Translate Long-Lost Languages

The use of a constrained word2vec-type model is proving useful in helping linguists decode ancient languages, particularly those with limited corpora. They’re also able to use machine translation, leveraging the fact that language evolution is slow; they can use one ancient language and its structures to help decode another.

[Link]

Neural Code Searching

Using either unsupervised or supervised methods, Facebook Research introduces a way to search a corpus of code with normal search-engine-style queries. The unsupervised method extracts key tokens from method snippets and embeds them with fastText to create document vectors, so similar code snippets are mapped near each other. Queries are embedded the same way, and a FAISS similarity search finds the code snippets whose vectors have a high cosine similarity to the query. The supervised approach is implemented differently and requires a corpus of queries and accurate answers (think Stack Overflow); it unsurprisingly performs better. So far there does not seem to be a published library for this. When it comes to ML on source code in general, check out this awesome-list
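The unsupervised pipeline can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Facebook's implementation: the three-dimensional embeddings are made up (a real system would learn them with fastText), document vectors are a simple unweighted average, and a brute-force scan stands in for FAISS. All names here are hypothetical.

```python
import math

# Made-up token embeddings standing in for learned fastText vectors.
EMBEDDINGS = {
    "read": [0.9, 0.1, 0.0],
    "file": [0.8, 0.2, 0.1],
    "open": [0.7, 0.3, 0.0],
    "sort": [0.0, 0.9, 0.2],
    "list": [0.1, 0.8, 0.3],
    "http": [0.1, 0.1, 0.9],
    "request": [0.0, 0.2, 0.8],
}

# Hypothetical corpus: method name -> key tokens extracted from its body.
SNIPPETS = {
    "read_file": ["open", "file", "read"],
    "sort_items": ["sort", "list"],
    "fetch_url": ["http", "request"],
}


def embed(tokens):
    """Average the embeddings of known tokens into one document vector."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, top_k=1):
    """Embed the query like a snippet and rank snippets by cosine similarity."""
    query_vector = embed(query.lower().split())
    ranked = sorted(
        SNIPPETS,
        key=lambda name: cosine(query_vector, embed(SNIPPETS[name])),
        reverse=True,
    )
    return ranked[:top_k]


print(search("read a file"))
print(search("http request"))
```

The point of the design is that queries and code land in the same vector space, so a plain-language query can retrieve a snippet that shares no exact identifier with it. At corpus scale, the brute-force scan would be replaced by an approximate nearest-neighbor index.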

[Link]

Apple Core ML 3 Release Details

A huge update that adds countless new models, neural network layers, varying model precision, and a new model definition format using protobuf. iOS apps leveraging ML can seriously step up their game.

[Link]

TensorFlow 2.0 Beta Released

Using Deep Learning to Curb Checkout Theft

Intel & Baidu Collaborate on AI Training Processor - Nervana

Underworld of AI-gig workers

Interesting Research

Language, Trees, and Geometry in Neural Networks; A Visualization Technique to Understand BERT.

Weight Agnostic Neural Networks

Cloud-Based Image Recognition Services are Still Highly Susceptible to Adversarial Attacks Through Simple Transformations

Please pass along to your family & friends, and join the IBM Data Science Community by following the link below:

Click to Join the IBM Data Science Community and Continue to Receive the Newsletter.
