Facebook is open sourcing a tool chain of machine learning (ML) and artificial intelligence (AI) tools that powers many of its own products. One of these is Translate, an open-source project built on the company's machine translation systems.
Facebook calls the tool chain "PyTorch 1.0." It combines three pieces: PyTorch, the open-source deep learning framework Facebook released around 15 months ago; Caffe2, the deep learning framework launched two years ago; and the Open Neural Network Exchange (ONNX).
Facebook’s PyTorch 1.0 Tool Chain
At F8, Facebook's annual developer conference, Engineering Managers Necip Fazil Ayan and Ves Stoyanov and Research Scientist Juan Pino gave three presentations on neural machine translation (NMT) at Facebook.
Pino explained that two years ago, Facebook prototyped ideas in PyTorch and then manually re-implemented any successful experiments in Caffe2 for production. According to him, this process was time-consuming and did not scale.
To solve this, they developed ONNX, which Pino described as "an industry-wide effort led by Facebook, Amazon, and Microsoft."
"In our particular use-case," he explained, "we leverage ONNX to essentially export a model from one framework to the other." According to Pino, ONNX improved model deployment by acting as an intermediary that automated what had been a manual process. This sped up moving PyTorch prototypes into Caffe2 production environments.
The tool chain Facebook is open sourcing (PyTorch, ONNX, and Caffe2) is already in use across the company's products, including Facebook, Messenger, Instagram, and Workplace.
A post on Facebook's developer blog states that PyTorch 1.0 will be available in beta within the next few months and "will include a family of tools, libraries, pre-trained models, and datasets for each stage of development, enabling the community to quickly create and deploy new AI innovations at scale."
Facebook is also open sourcing its research on Multilingual Unsupervised and Supervised Embeddings (MUSE). Stoyanov added: "We are also open sourcing our datasets for multilingual understanding."
Limited training data is a major hurdle for so-called low-resource languages, meaning language pairs with very little parallel text between them. This makes it difficult to build fluent NMT systems for such pairs, because neural network engines typically need large parallel corpora to be effective.
Facebook CTO Mike Schroepfer said that the early work on MUSE, one of many areas of research, is "promising work to bring all the tools and technologies they have to 6,000 languages all around the world."
Since last year, the social media giant has faced a mountain of problems on its platform involving data privacy, hate speech, bullying, propaganda, and fake news. These problems led to CEO Mark Zuckerberg's testimony on Capitol Hill before the US Senate.