Meet MultiRay: Meta AI’s new platform for efficiently running large-scale artificial intelligence (AI) models | Tech Rasta


Today’s state-of-the-art AI systems for handling text, images, and other modalities achieve their best performance by training a very large model on massive amounts of data and then specializing that model for a single job (for example, detecting harmful speech). The result is a high-quality, high-priced specialty tool. When there are many problems to solve, there are many such models, and the cost of operating them can quickly grow out of control. As a result, huge high-end models are rarely used in production; much smaller and simpler models are usually deployed instead.

MultiRay, a new platform from Meta AI research, is designed to run state-of-the-art AI models at scale and make AI systems more efficient. With MultiRay, multiple models run on the same input and share most of the processing, so each model consumes only a small fraction of the time and resources, reducing the overall cost of these AI-based operations. Concentrating company-wide computation in a single model also makes it easier to adopt AI accelerators and to trade off strategically between computing resources and data storage. The universal models in MultiRay are trained to perform well across a wide variety of tasks and domains.

Teams across Meta can use MultiRay to develop and refine machine learning (ML) models more quickly for a variety of uses, including topic tagging of posts and detecting hate speech. This approach takes less time and effort than having multiple teams build massive end-to-end models independently.

MultiRay makes Meta’s large foundational models more accessible by offloading computation to specialized hardware such as graphics processing units (GPUs) and by keeping frequently used data in a cache, reducing the time and energy spent on recomputation. MultiRay currently powers 125 use cases across Meta, supports up to 20 million queries per second (QPS), and serves 800 billion queries per day.
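
A quick back-of-the-envelope check helps relate the two throughput figures. The sketch below uses only the numbers quoted above. It suggests that the 20 million QPS figure is best read as a peak capacity rather than a sustained rate, since the daily volume implies a lower average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

daily_queries = 800e9  # 800 billion queries per day (figure from the article)
peak_qps = 20e6        # 20 million queries per second (figure from the article)

# Average rate implied by the daily volume.
average_qps = daily_queries / SECONDS_PER_DAY

# Daily volume that would result if the peak rate were sustained all day.
daily_at_peak = peak_qps * SECONDS_PER_DAY

print(f"Average QPS implied by daily volume: {average_qps:,.0f}")
print(f"Daily queries if peak were sustained: {daily_at_peak:,.0f}")
print(f"Average load as a share of peak: {average_qps / peak_qps:.0%}")
```

The average works out to roughly 9.3 million QPS, a little under half of the quoted maximum.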

MultiRay uses large foundational models that return a point in a high-dimensional vector space representing the input. This point is called an embedding, a representation of the input that is better suited to machine learning. To simplify task-specific models, MultiRay provides embeddings of input data (such as text and images) that can be used in place of the raw input. MultiRay’s core models are trained to perform well on a variety of tasks, including similarity and classification. Because they must convey more information to serve so many tasks, the embeddings are large (several kilobytes in size).
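
To illustrate how one embedding can feed several lightweight task-specific consumers, here is a minimal sketch. The `embed` function is a hypothetical toy stand-in (it hashes character trigrams) and is not MultiRay's model or API. The two consumers, a similarity check and a small logistic-regression head, are likewise assumptions chosen because the article names similarity and classification as target tasks.

```python
import hashlib
import math

DIM = 64  # toy dimensionality; a real embedding would be far larger


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for the large shared model.

    Hashes character trigrams into a fixed-size, L2-normalized vector.
    """
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        digest = hashlib.sha256(padded[i:i + 3].encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity task: the dot product suffices because vectors are normalized."""
    return sum(x * y for x, y in zip(a, b))


def linear_score(embedding: list[float], weights: list[float], bias: float = 0.0) -> float:
    """Classification task: a tiny logistic-regression head over the embedding."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# The expensive step runs once per input; both tasks reuse its output.
post = embed("MultiRay serves embeddings to many teams")
near_duplicate = embed("MultiRay serves embeddings to many team")
unrelated = embed("zzzz qqqq xxxx")

print(cosine_similarity(post, near_duplicate))
print(cosine_similarity(post, unrelated))
print(linear_score(post, weights=[0.0] * DIM))  # untrained head
```

The point of the design is that the costly model call happens once, while each team's task-specific logic stays small enough to train and serve cheaply.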

Centralized, massive models offer the following advantages:

  1. Amortization of cost across many teams
  2. Reduced complexity in development and operations
  3. Faster path from research to production, because an improvement made in one place reaches every client

MultiRay’s external API accepts one request at a time. To handle a high volume of requests from many clients simultaneously, MultiRay uses a cross-request batching mechanism internally. The batching logic needs to be written only once and can be tuned to produce batches of the ideal size for the model and hardware. This batching is completely transparent to the clients issuing requests, even when it changes significantly to improve performance, such as using a larger batch size after migrating to the latest generation of GPU accelerator hardware.
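
The general idea of cross-request batching can be sketched as follows. This is a minimal illustration under stated assumptions, not MultiRay's implementation: the class name `CrossRequestBatcher`, the `max_batch_size` and `max_wait_s` parameters, and the `fake_model` function are all hypothetical. Clients submit one input and get back a future, while a background thread groups pending inputs into batches.

```python
import queue
import threading
import time
from concurrent.futures import Future


class CrossRequestBatcher:
    """Collects single-item requests from many callers into batches."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        self._batch_fn = batch_fn  # runs the model on a list of inputs
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue = queue.Queue()
        self.batch_sizes = []  # recorded for illustration only
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item):
        """Client-facing call: one input in, one Future out."""
        future = Future()
        self._queue.put((item, future))
        return future

    def _run(self):
        while True:
            batch = [self._queue.get()]  # block until a request arrives
            deadline = time.monotonic() + self._max_wait_s
            # Keep filling the batch until it is full or the wait expires.
            while len(batch) < self._max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            self.batch_sizes.append(len(batch))
            try:
                results = self._batch_fn([item for item, _ in batch])
                for (_, future), result in zip(batch, results):
                    future.set_result(result)
            except Exception as exc:  # propagate failures to every caller
                for _, future in batch:
                    future.set_exception(exc)


def fake_model(batch):
    # Hypothetical stand-in for one accelerator forward pass over a batch.
    return [text.upper() for text in batch]


batcher = CrossRequestBatcher(fake_model, max_batch_size=4, max_wait_s=0.05)
futures = [batcher.submit(f"request-{i}") for i in range(10)]
results = [f.result(timeout=5) for f in futures]
print(results)
print("batch sizes used:", batcher.batch_sizes)
```

Because callers only ever see `submit`, the batch size or wait time can be retuned for new hardware without any client changes, which is the transparency property described above.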

To reduce the time and energy spent on recomputation, MultiRay uses a cache. It is a multi-level cache designed to save money and time, with each successive layer offering a higher hit rate at the expense of slower access. Each MultiRay server has its own fast but limited RAM-based local cache. Those local caches are backed by a slower but larger, globally distributed cache built on flash memory.
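
The lookup order in such a multi-level cache can be sketched as below. This is an illustrative assumption, not MultiRay's code: a small LRU dictionary stands in for the per-server RAM cache, a plain dictionary stands in for the distributed flash cache, and names such as `TwoTierCache` and `compute_fn` are hypothetical.

```python
from collections import OrderedDict


class LocalLRUCache:
    """Small in-process cache standing in for a per-server RAM cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


class TwoTierCache:
    """Checks the local tier, then the global tier, then recomputes."""

    def __init__(self, compute_fn, local_capacity=2):
        self._compute_fn = compute_fn
        self._local = LocalLRUCache(local_capacity)
        self._global = {}  # stand-in for a distributed flash-based cache
        self.stats = {"local_hit": 0, "global_hit": 0, "computed": 0}

    def get(self, key):
        # Note: None is used as the miss marker, so values must not be None.
        value = self._local.get(key)
        if value is not None:
            self.stats["local_hit"] += 1
            return value
        value = self._global.get(key)
        if value is not None:
            self.stats["global_hit"] += 1
        else:
            value = self._compute_fn(key)  # the expensive model call
            self._global[key] = value
            self.stats["computed"] += 1
        self._local.put(key, value)  # promote to the fast tier
        return value


cache = TwoTierCache(compute_fn=lambda text: len(text), local_capacity=2)
for key in ["a", "bb", "a", "ccc", "bb", "a"]:
    cache.get(key)
print(cache.stats)  # {'local_hit': 1, 'global_hit': 2, 'computed': 3}
```

In this run the small local tier evicts entries quickly, yet the larger tier still prevents recomputation, which mirrors the trade-off between hit rate and access speed described above.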


Tanushree Shenwai is a Consulting Intern at MarkTechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring new advancements in technologies and their real-life application.

