Modeling long-range dependencies in sequences has led to notable architectural advancements, with state space models (SSMs) emerging as a significant alternative to Transformers. Teams from Tel Aviv and IBM question the conventional benchmarking practice for these architectures, which typically involves training models from scratch with random initialization. According to the team, this practice may overestimate the differences between architectures.

The researchers propose pretraining models using standard denoising objectives on the downstream task data itself, a method they term self-pretraining (SPT). This approach significantly narrows the performance gap between Transformers and SSMs. For example, pretrained vanilla Transformers can match the performance of advanced SSMs like S4 on benchmarks such as the Long Range Arena (LRA). SPT also improved the best reported results of SSMs on the PathX-256 task by 20 points.

Key findings from the study include:
1. Transformers vs. SSMs: Properly pretrained vanilla Transformers can achieve performance comparable to S4 on LRA tasks, challenging the notion that Transformers are less capable of modeling long-range dependencies.
2. Redundancy of Structured Parameterizations: Structured parameterizations in SSMs become mostly redundant with data-driven initialization through pretraining, suggesting that simpler models can match the performance of more complex architectures.
3. Effectiveness Across Data Scales: SPT is particularly beneficial when training data is scarce, with relative gains more pronounced on smaller datasets.
4. Adaptability of Convolution Kernels: Data-driven kernels learned via SPT adapt to specific task distributions, enhancing performance on long-sequence tasks.

The study emphasizes the importance of incorporating a pretraining stage in model evaluation to ensure accurate performance estimation and simplify architecture design. This approach not only provides a fairer comparison between different architectures but also highlights the efficiency of pretraining in leveraging task data.

Arxiv: https://lnkd.in/enaH3mhu
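To make the self-pretraining idea above more concrete, here is a minimal Python sketch of how denoising examples might be built from the downstream task's own inputs, with the labels left out. The post does not specify the exact denoising objective, so the token-masking scheme, the `<mask>` token, the 0.15 masking rate, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import random

MASK = "<mask>"  # hypothetical mask token, chosen for illustration


def make_denoising_example(tokens, mask_prob, rng):
    """Corrupt a token sequence by masking; targets hold the hidden tokens."""
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)   # the model sees the mask ...
            targets.append(tok)      # ... and must reconstruct the original
        else:
            corrupted.append(tok)
            targets.append(None)     # no loss on unmasked positions
    return corrupted, targets


def self_pretraining_corpus(task_examples, mask_prob=0.15, seed=0):
    """Build denoising examples from the task's inputs only.

    `task_examples` is a list of (tokens, label) pairs from the downstream
    task. Labels are ignored here; they are only used later, when the
    pretrained weights are fine-tuned on the task itself.
    """
    rng = random.Random(seed)
    return [
        make_denoising_example(tokens, mask_prob, rng)
        for tokens, _label in task_examples
    ]
```

A model would first be trained to predict the masked tokens in this corpus, and only then fine-tuned on the labeled task, so that the comparison between architectures starts from data-driven rather than random initialization.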
George Z. Lin’s Post
More Relevant Posts
-
Retrieval Augmented Generation Architecture

Retrieval augmented generation (RAG) is an architectural pattern that enables foundation models to produce factually correct outputs for specialized or proprietary topics that were not part of the model's training data. By augmenting users' questions and prompts with relevant data retrieved from external data sources, RAG gives the model facts and details that are new to it on which to base its response. https://lnkd.in/dhdvqF48 IBM
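The retrieve-then-augment flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: real systems typically retrieve with vector embeddings and then send the prompt to a foundation model, whereas this sketch scores documents by simple word overlap and stops at building the prompt. The function names and prompt wording are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Return the top_k documents sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_augmented_prompt(question, documents, top_k=2):
    """Prepend the retrieved passages to the user's question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The resulting prompt would then be passed to the model, which can ground its answer in the retrieved passages rather than relying only on its training data.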
-
🚀 Pioneering the Future of Data Storage: IBM Flash Systems 🚀

In today's digital age, where data is the new oil, the need for speed, efficiency, and reliability in data storage has never been more critical. IBM's Flash Systems are at the forefront of this revolution, setting new standards for innovation and performance in the industry.

🔹 Blazing Fast Speed: With ultra-low latency and high throughput, IBM Flash Systems are designed to handle the most demanding workloads. Whether it's AI, big data analytics, or high-frequency trading, these systems ensure your applications run at lightning speed.
🔹 Unmatched Reliability: Downtime is not an option. IBM Flash Systems offer enterprise-grade reliability with advanced data protection features. This ensures your critical data is always available, safeguarding your business operations.
🔹 Cost Efficiency: Think high performance comes with a high price tag? Think again. IBM's innovative architecture delivers exceptional value by optimizing performance and reducing total cost of ownership. It's a smart investment for any forward-thinking organization.
🔹 Scalability: As your business grows, so does your data. IBM Flash Systems offer seamless scalability, allowing you to expand your storage capacity without compromising on performance. This flexibility makes it easier to adapt to changing business needs.
🔹 Sustainability: In an era where sustainability matters, IBM is leading the way with energy-efficient designs that reduce power consumption and environmental impact. It's high performance with a conscience.
🔹 Innovative Technology: Powered by IBM's cutting-edge technology, including AI-driven analytics and predictive insights, Flash Systems enable smarter, data-driven decisions. It's not just about storing data; it's about leveraging it to drive innovation and growth.

In summary, IBM Flash Systems are not just about storage; they're about empowering your business to achieve more. With unparalleled speed, reliability, cost efficiency, and scalability, these systems are redefining what's possible in the world of data storage.

🌟 Unlock the full potential of your data with IBM Flash Systems. The future is now. 🌟

#DataStorage #Innovation #IBMFlashSystems #TechRevolution #EnterpriseIT #FutureOfData #Sustainability #AI #BigData
-
As enterprise data becomes ever more critical for #AI and keeps growing in complexity and volume, understanding its origin, movement, and transformations is crucial for making informed decisions. This is where Data Lineage comes in: a critical component of data governance that helps organizations track data from its source to its destination.

🤔 So, what is Data Lineage? 🤔
Data Lineage is the process of documenting the journey of data from its creation to its consumption, including all transformations, aggregations, and modifications along the way. It provides a complete and accurate picture of data's life cycle, enabling organizations to:
1️⃣ Improve data quality: By identifying data sources, processing steps, and potential errors, organizations can enhance data accuracy and reliability.
2️⃣ Enhance transparency and trust: Data Lineage provides a clear understanding of data's origin, helping to build trust among stakeholders, customers, and regulatory bodies.
3️⃣ Support regulatory compliance: By maintaining a record of data's life cycle, organizations can demonstrate compliance with regulations such as GDPR, HIPAA, and CCPA.

⚙ How does Data Lineage work? ⚙
Data Lineage solutions use metadata to create a visual representation of data's journey, providing insights into:
- Data sources and ingestion
- Data processing and transformations
- Data storage and management
- Data consumption and analytics

🚀 Benefits of Data Lineage 🚀
- Improved data governance and quality
- Increased transparency and trust
- Enhanced regulatory compliance
- Better decision-making and analytics
- Reduced data-related risks and costs

And IBM has a product for it: IBM Manta Data Lineage! Watch the video on this page to learn more: https://lnkd.in/d7Bbi-Mt

#DataLineage #DataGovernance #DataQuality #Transparency #Compliance #IBM #Manta
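As a rough illustration of the metadata behind lineage tracking, the Python sketch below records which dataset was derived from which, and by what transformation, then walks that record backwards to find where a report's data originally came from. It is a toy model under stated assumptions, not how IBM Manta Data Lineage or any other product works; the class, method, and dataset names are all hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage store: each dataset maps to the datasets it was built from."""

    def __init__(self):
        # target dataset -> list of (source dataset, transformation name)
        self._parents = defaultdict(list)

    def record(self, source, target, transformation):
        """Note that `target` was produced from `source` by `transformation`."""
        self._parents[target].append((source, transformation))

    def upstream(self, dataset):
        """Return every dataset that `dataset` depends on, directly or not."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for parent, _transformation in self._parents.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def origins(self, dataset):
        """Return the upstream datasets that have no recorded parents."""
        return sorted(d for d in self.upstream(dataset) if d not in self._parents)
```

For example, after recording `crm_export -> customers_clean -> revenue_report`, calling `origins("revenue_report")` would trace the report back to `crm_export`, which is the kind of question a lineage tool answers for auditors and data stewards.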
-
Modernize your IBM Z applications. Facilitate access to relational and traditional non-relational mainframe data and other data sources. Build AI applications directly from your IBM Z data. Learn more here: https://lnkd.in/eesGUEhA #AI #ApplicationModernization
IBM Data Virtualization Manager for z/OS
ibm.com
-
Dive into the details of IBM's Granite foundation model with our latest blog post! Learn about its training data and how it's shaping the future of technology. #IBM #RedHat
IBM's Granite foundation model: A detailed look at its training data
redhat.com
-
IBM runs one of the largest corporate networks known to mankind. 🌐 With the health of that network riding on their backs, the CIO Network Engineering team needs continuous, reliable, and transparent operational data that teams around the world can use in real time. When their existing DataOps solution wasn't cutting the mustard, they turned to our smart data pipeline platform for rescue! 🦸 “We use StreamSets because it’s the only technology that handles volume at scale.” – Stephan Barabasi, big data, cloud architect, and data scientist. What's next? Plans to scale StreamSets beyond the CIO Network Engineering team. 🚀 https://bit.ly/49j4aLa #DataPipelines #DataOps #CIO
IBM: How Self-Service Data Supports Operational Excellence
https://streamsets.com
-
Modernize your IBM Z applications. Facilitate access to relational and traditional non-relational mainframe data and other data sources. Build AI applications directly from your IBM Z data. Learn more here: https://lnkd.in/dYQXsSfT #AI #ApplicationModernization
IBM Data Virtualization Manager for z/OS
ibm.com
-
See what IBM Client Engineering can do for your business. As a recent example, a global banking client is interested in converting custom payment file formats from numerous clients into industry standard formats by leveraging generative AI instead of manual conversions. Currently, any conversion from a custom payment file format involves a laborious manual process that takes multiple weeks of code development to execute the transformation. On top of this, the growing volume of clients and custom formats makes the current manual process difficult to scale.

The IBM Client Engineering team co-created a 4-week pilot leveraging watsonx.ai LLMs to generate Python scripts that can map custom payment file formats to ISO format with minimal human intervention. The team showed that watsonx.ai was able to optimize worker productivity and labor costs, ultimately creating a scalable and more efficient file transformation process.

With the adoption of watsonx.ai, this banking client will:
- Reduce processing time by 80%
- Cut development time to ~2 days instead of 2-3 weeks
- Save over 27K hours over 3 years
- Attain ~$3.7MM in associated labor cost savings

That’s the power of IBM. What can we do for your organization? #ibm #clientengineering #innovation watsonx #ai #financialservices #showdonttell https://lnkd.in/emhvcMK4
Client Engineering | IBM
ibm.com
-
What can your organization achieve with a modern #observability solution? Data from a new Forrester Consulting study showed that a composite organization that used the IBM Instana Observability platform achieved a 219% ROI over three years. It also saw a 90% reduction in troubleshooting time, because the platform provides high-fidelity data to the right people at the right time. #Instana #IBM IBM
Average 219% ROI: The Total Economic Impact™ of IBM Instana Observability - IBM Blog
https://www.ibm.com/blog
-
🚀 Discover how to design a scalable and stable on-premise infrastructure for distributed computing, big data, artificial intelligence, and scalable software projects. Read our article on maturity layers in technology infrastructure! #ITInfrastructure #DistributedComputing #BigData #AI #ScalableSoftware #Software #OnPremise #Jofrantoba 🔍 Want to build robust and efficient systems for your business? Learn how to follow a layered approach to design an on-premise infrastructure that meets your business and IT needs! #OnPremiseInfrastructure #EnterpriseTechnology #ITEfficiency #Software #Jofrantoba 🛠️ Unlock the keys to a solid and reliable technology infrastructure in our latest article. Don't miss out on the maturity layers for a scalable and stable on-premise infrastructure in cutting-edge projects! #EnterpriseTechnology #TechnologyInfrastructure #ITEfficiency #Software #Jofrantoba https://lnkd.in/eFGBRKwU
(PDF) Maturity Layers for a Scalable and Stable On-Premise Infrastructure in Distributed Computing, Big Data, Artificial Intelligence, and Scalable Software Projects - @jofrantoba
researchgate.net