Jinyu H.
Greater Boston
859 followers
500+ connections
About
I am an infrastructure engineer with a passion for unsolved, difficult production…
Experience
Languages
- Chinese
Other similar profiles
- xin zhang (Issaquah, WA)
- Arnab Roy (Mercer Island, WA)
- Michael Zhang (San Francisco Bay Area)
- Fengnan Yue (Mountain View, CA)
- Atanu Ghosh (Santa Clara, CA)
- Aishvarya Singh (Sunnyvale, CA)
- Xuetao Yin (Greater Seattle Area)
- Yanning Liu (Mountain View, CA)
- Vivek Hamirwasia (London Area, United Kingdom)
- Ziyao Liu, Software Engineer (San Francisco Bay Area)
- Ming Liu (Kirkland, WA)
- Vignesh Narayanan (San Francisco Bay Area)
- Ding Ma (San Francisco Bay Area)
- Bhargav Krishna yb (Palo Alto, CA)
- Hong Lei, Engineering Manager at Meta (San Francisco Bay Area)
- Harsh Ranjan (Bellevue, WA)
- Lifei Chen (Mountain View, CA)
- Alec H., farming YOE (Mountain View, CA)
- Qihui Li (Seattle, WA)
- Hsuan-Yi Chu (United States)
Explore more posts
Matthew O'Keefe, Ph.D.
Although some #Kafka vendors are considering S3 Express One Zone for write caching, a model developed by my colleague Jack Vanlightly suggests this works only in a few cases. And with cloud vendors like Microsoft reducing or eliminating cross-availability-zone networking charges, traditional Kafka replicated logs still work well for many workloads. https://lnkd.in/gHrNPfSZ How do the costs of a low-latency write-ahead log (WAL) implemented on S3 Express One Zone compare with one implemented as a State-Machine-Replication (SMR) system (such as Paxos/Raft/Kafka)? I built a cost model for both to find out.
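As a rough illustration of the kind of comparison described above, here is a minimal back-of-the-envelope sketch in Python. All prices, the replication factor, and the object size are placeholder assumptions of mine, not figures from the linked model.

# Toy cost comparison: WAL on S3 Express One Zone vs. a replicated (SMR) log.
# Every price below is an illustrative placeholder, not a real cloud list price.

def s3_express_wal_cost(gb_written, put_requests):
    per_gb = 0.16        # assumed $/GB-month storage on S3 Express One Zone
    per_put = 0.0000025  # assumed $ per PUT request
    return gb_written * per_gb + put_requests * per_put

def replicated_log_cost(gb_written, replication_factor=3, cross_az_fraction=1.0):
    per_gb_disk = 0.08   # assumed $/GB-month on local or attached storage
    per_gb_xfer = 0.01   # assumed $/GB cross-AZ transfer (0 if the provider waives it)
    storage = gb_written * replication_factor * per_gb_disk
    transfer = gb_written * (replication_factor - 1) * cross_az_fraction * per_gb_xfer
    return storage + transfer

gb = 10_000                 # 10 TB of log writes per month
puts = gb * 1024 / 4        # assuming ~4 MB objects per PUT
print("S3 Express WAL:                 ", round(s3_express_wal_cost(gb, puts), 2))
print("Replicated log, cross-AZ billed:", round(replicated_log_cost(gb), 2))
print("Replicated log, cross-AZ free:  ", round(replicated_log_cost(gb, cross_az_fraction=0.0), 2))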
Thomas Cleberg
Chip Huyen on Twitter today below. I 100% agree. Metric hygiene is system hygiene. LLM-generated metrics require a higher standard of care than they're currently receiving, and much more scrutiny. Mind the compasses. 🐦 A big issue I see with AI systems is that people aren't spending enough time evaluating their evaluation pipeline. 1. Most teams use more than one metric (3-7 metrics in general) to evaluate their applications, which is a good practice. However, very few are measuring the correlation between these metrics. If two metrics are perfectly correlated, you probably don't need both of them. If two metrics strongly disagree with each other, either this reveals something important about your system, or your metrics just aren't trustworthy. 2. Many (I estimate 60-70%?) use AI to evaluate AI responses, with common criteria being conciseness, relevance, coherence, faithfulness, etc. I find AI-as-a-judge very promising, and expect to see more of this approach in the future. AI-as-a-judge scores aren't deterministic the way classification F1 scores or accuracy are. They depend on the judge's model, the judge's prompt, and the use case. Many AI judges are good, but many are bad. Yet very few are doing experiments to evaluate their AI judges. Are good responses given better scores? How reproducible are the scores -- if you ask the judge twice, do you get the same score? Is the judge's prompt optimal? Some aren't even aware of the prompts their applications are using, because they use prompts created by eval tools or by other teams. Also, a fun fact I learned from a (small) poll yesterday: some teams are spending more money on evaluating responses than on generating responses 🤯 https://lnkd.in/gYnTRZW3
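A quick way to act on the first point is to measure how strongly your metrics agree. The sketch below is a minimal example assuming you already have per-example scores for each metric; the metric names and numbers are made up.

# Pairwise correlation between evaluation metrics (illustrative scores only).
import numpy as np

scores = {
    "relevance":    [0.9, 0.7, 0.8, 0.4, 0.6],
    "coherence":    [0.8, 0.7, 0.9, 0.5, 0.6],
    "faithfulness": [0.2, 0.9, 0.4, 0.8, 0.3],
}

names = list(scores)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = np.corrcoef(scores[names[i]], scores[names[j]])[0, 1]
        print(f"{names[i]} vs {names[j]}: r = {r:.2f}")
# r near 1.0 suggests one metric may be redundant; strong disagreement means
# either the metrics capture different things or one of them isn't trustworthy.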
Niko Skrypnik
Well, that's interesting, but using the more sophisticated ChatGPT o1 model for coding actually leads to exponentially more time spent on debugging and understanding the code. Here we're facing a problem with the CoT approach - perhaps it really does score high on benchmarks, yet the problem with benchmarks is that they all assume theoretical right/wrong answers within known knowledge, and all the tests and benchmarks assume a concise answer. Software development goes far beyond that: most modern software systems, which are created for humans and humans only, include myriad aspects that must be tailored just right so that the magic happens when users use the software. Although o1 may help with concise tasks that can be formulated in a concise way, it is still far from replacing, for example, software engineers, and perhaps other jobs as well. It is, again, far from being more useful than ChatGPT 4. So unfortunately we're once again facing AI marketing hype. Yes, the new approach is interesting, but it suffers from diminishing usefulness. It generates more code and more explanations of the code, but this output is getting harder and harder to use right away on the assumption that it will just work. AI can really be integrated into processes, as a lot of ML already is, yet we need humans capable of understanding what it does, to make sure that what it does, suggests, and writes can be applied, for example, to solving a business task, and won't do more harm than good. OK, Strawberry, you've passed PhD tests I hadn't even heard of. But now you remind me of a mad professor who has to be kept under guard, so before you can be integrated we need to hire two more people to keep you in check and make sure your output is actually useful.
Mengtao Yuan
Just did a keynote at the PyTorch Conference on LLM deployment. Thanks to the PyTorch team and our partners for the great work, and to the audience for their interest! Please try out our deployment ecosystem of torchchat, torch.compile, torch.export, AOT Inductor, torchao and ExecuTorch, and check out all the amazing talks, posters and demos at the conference!
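For readers who have not tried the stack yet, the smallest possible torch.compile usage looks roughly like this; the toy model and shapes are just an illustration, not something from the talk.

# Minimal torch.compile example (toy model for illustration only).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
compiled = torch.compile(model)   # compiles kernels lazily on first call
x = torch.randn(8, 16)
print(compiled(x).shape)          # torch.Size([8, 4])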
Sushanth Reddy
🚀 Big News for Mac Users! 🚀 🔍 LM Studio (https://lmstudio.ai/) now ships an MLX (https://lnkd.in/gUeed3TP) backend! If you have an M1, M2, or M3 Apple silicon MacBook, you can run any large language model (LLM) from the Hugging Face Hub on your Mac with blazing speed. 🏎️💨 Say goodbye to lag - optimize your AI workflows and bring the power of LLMs right to your desktop. Whether you're coding, building apps, or diving into data science, this update makes it easier than ever to tap into cutting-edge machine learning. 🎉 💡 Get ready to supercharge your Mac’s performance! #LLMs #HuggingFace #MacOS #AppleSilicon #mlx
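Once a model is downloaded in LM Studio, its local server speaks an OpenAI-compatible API, so you can call it from code as well as from the UI. In the sketch below the port and the model identifier are assumptions; check the server tab in LM Studio for the values on your machine.

# Query a model served locally by LM Studio through its OpenAI-compatible endpoint.
# base_url, api_key placeholder, and model id are assumptions; adjust to your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # hypothetical model id
    messages=[{"role": "user", "content": "Say hello from my Mac."}],
)
print(resp.choices[0].message.content)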
Peng Wu
Our partners have asked for this many times before. Now, we have answered your requests. We have just shared a whole set of the Meta PyTorch team's 2024 H2 roadmaps. We defined two roadmaps for PyTorch Compiler: the PyTorch Compiler Core roadmap (for everything torch.compile()) and the PyTorch Compiler Deployment roadmap (for everything PT2 export-path). Enjoy! We also invite our partners to share their PyTorch roadmaps with the community. https://lnkd.in/eJDxemXw
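For anyone unfamiliar with the export path mentioned in the Deployment roadmap, a minimal torch.export call looks roughly like this; the toy module is my own illustration, not taken from the roadmap.

# Minimal PT2 export-path example (toy module for illustration only).
import torch
import torch.nn as nn
from torch.export import export

class Tiny(nn.Module):
    def forward(self, x):
        return torch.relu(x) * 2

ep = export(Tiny(), (torch.randn(4, 8),))
print(ep)   # ExportedProgram: captured graph plus input/output signature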
Gabriel Martín Blázquez
One of the most exciting features we released with Argilla distilabel 1.3.0 is the ability to scale and distribute synthetic data generation pipelines with Anyscale Ray. We've added a section in the docs explaining how to execute a pipeline using Ray, and also with Slurm:
Caixue Lin
I've been quite occupied this week, but I haven't forgotten to summarize the key takeaways from last week's KubeCon 24 held in Salt Lake City. The team and I enjoyed excellent opportunities to interact and exchange ideas with numerous Kubernetes innovators and open source advocates from all over the world. Notably, our team successfully presented two interesting talks on ByteDance's open source project Kubewharf, specifically focusing on the Godel scheduler and Katalyst resource management. These talks have generated considerable impact within the community. Huge thanks to Lintong Jiang, Yue 'Abby' Y., Jia Deng & @Mingmeng Luo for their great delivery of the talks. Please check out the following links for the details: - Godel scheduler (Bing Li, Yue Yin, Lintong Jiang): Talk link - https://lnkd.in/gmXsgGFY; Github: https://lnkd.in/gR-z32xk - Service profiling & Katalyst (Jia Deng, Cong Xu, Mingmeng Luo): Talk link - https://lnkd.in/gHN-Q33x; Github: https://lnkd.in/gZ4FTZ8V Thanks to the feedback from the community: https://lnkd.in/gDXqipRZ It was so rewarding to have conversations with our internal open source advocates, namely Ting Zou (Tina Tsou), He Cao, James Xiang, and Ye Xu. In particular, Ting Zou (Tina Tsou) provided useful connections for interacting with the community and rich ideas on how to improve our contribution and influence back to the community. Additionally, I had the opportunity to meet external K8s open source innovators like Yin Ding from Google, Wei Huang from Apple, Richard Sikang Bian from Ant Group, Chen Wang from IBM/Red Hat, and a dozen people from LinkedIn, CoreWeave, Datadog, and many others. I was so grateful for the wonderful time spent sharing insights on all the fascinating K8s topics. Looking ahead, we'll aim to make even greater open source contributions to give back to the remarkable K8s community. #kubecon #CloudNativeCon #CloudNative #Kubernetes #K8s
Ted Werbel
Many folks often say that LLMs are just pattern matching… and contrary to some, I would actually agree with that sentiment! However, I really don’t think that scaling LLMs alone will help us break through the next intelligence plateau. Surely, bigger models running faster and faster on smaller hardware will help.. but let’s take a moment to draw a comparison to human cognition and the brain 🧠 Pattern matching is the essence of reasoning, but take the human brain for example - even though pattern matching is inherent to everything our brain does, we still have dedicated “modules” for various functions on top of pattern matching. And this is exactly the kind of system design we’re heading towards with agentic systems and cognitive architecture! Let's compare...
> Prefrontal Cortex: Helps in analyzing trade-offs, long-term consequences and planning complex actions (LLM Compiler/LDB, Self-Discover)
> Hippocampus: Involved in forming new memories and integrating past experiences (stored as patterns) into current reasoning processes (GraphRAG + LGGMs + CLIN and other long-term memory systems for agents that use LLMs to pre-process, cluster and route to certain memories)
> Basal Ganglia: Contributes to pattern recognition and procedural learning, helping automate certain processes based on recognized patterns (automatic routing to SLMs, deterministic workflows and non-deterministic action planning)
> Parietal Cortex: Integrates sensory information from different modalities, helping with spatial reasoning and the understanding of numerical patterns (multi-modal models, tools, environmental grounding and routing to prompting techniques specialized for math or other more nuanced capabilities)
> Anterior Cingulate Cortex: Monitors conflict, makes decisions when there are competing patterns or information, and adjusts strategies (continuously learning models and systems inspired by CLIN and MedAgent-Zero + guardrails)
> Default Mode Network: Active when the brain is at rest but also plays a role in reflective thought, daydreaming and consolidation of patterns into larger conceptual frameworks (layered graph-of-thought consolidation in clusters / abstractions of thought with MCTS)
I'm no neuroscientist, so take what I say with a grain of salt... but I do think of LLM exploration of latent space and pattern matching as an essential substrate. However, it's the interplay between pattern matching and other cognitive processes that leads to authentic reasoning ✨ In 1-2 years' time, with enough speed + compute, these post-training optimization techniques with environmentally grounded agentic systems will be what brings about authentic, human-like reasoning. And once these reasoning traces and agent trajectories get baked into the underlying models, this will only get better and better…
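A toy sketch of the "dedicated modules on top of pattern matching" idea, with a trivial keyword router standing in for the basal-ganglia-style routing described above. The module names and routing rule are invented purely for illustration.

# Toy "cognitive architecture": a router dispatching to specialized modules.
# Modules and the routing heuristic are illustrative stand-ins, not a real system.

def planner(task):        # "prefrontal cortex": multi-step planning
    return f"plan for: {task}"

def memory_lookup(task):  # "hippocampus": retrieve related past experience
    return f"recalled notes about: {task}"

def calculator(task):     # "parietal cortex": numeric reasoning
    return f"computed result for: {task}"

ROUTES = {"plan": planner, "remember": memory_lookup, "compute": calculator}

def route(task):
    for keyword, module in ROUTES.items():
        if keyword in task.lower():
            return module(task)
    return planner(task)  # default module

print(route("Compute the quarterly totals"))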
Gayathri G
🚨 Big News from Meta Connect! 🚨 Meta just dropped its Llama 3.2 models on Watsonx, including the first multimodal 11B and 90B Vision models! Key Highlights: - Image-in, text-out support for tasks like document understanding and image captioning. - High-res image support (up to 1120x1120) for classification, OCR, and more. - Lightweight Llama 3.2 models (1B & 3B) run on most devices with low latency and enhanced privacy. - New multimodal safety model: Llama Guard. Meta continues leading with efficient, open-source AI models! Groq now supports Llama 3 models for even faster AI performance. #AI #Meta #Llama32 #TechInnovation #MultimodalAI #Watsonx #genai
Parag Paul
#LLM sizes will be a significant consideration in the next generation of products we will be witnessing in the coming years. Router models cannot remain the go-to solution in the long run. As a founder in this domain, one needs to think beyond constrained boxes and democratize these powerful systems as much as possible for small and medium businesses. We are planning to set things straight in the coming months. #Angel #angelfunding #VC #investments #VCfunding #AI #AIStartup #Quantization #LLMs #LLM
Chi Wang
🚀 AG2: AutoGen New Chapter Announcement 🚀 When Prof. Qingyun Wu from Penn State University and I started AutoGen, we set our mission to enable next-gen AI agents to solve complex real problems, beginning with building a foundational OSS framework for agentic AI programming. The achievements of AutoGen since then have been nothing short of extraordinary, with all the support from this amazing community. But this is just the first step. There is a long road ahead to reach the North Star. With more members from diverse organizations in both industry and academia (including Google, Meta, NVidia, Berkeley, MIT, UW, Cambridge, Princeton, etc.) joining the mission, we believe it’s time for a bold new chapter – AutoGen is becoming AG2 (https://lnkd.in/gwPRRgTk), a new home to foster open collaboration in accomplishing our mission. This isn’t just a rebrand; it’s a reimagination. AG2 represents our commitment to push boundaries, drive innovation, and focus even more sharply on what our community needs to thrive. The new structure will amplify our collective impact and open new avenues for growth. → NEW REPO: https://lnkd.in/gwPRRgTk (please give it a star ⭐ ) → CURRENT PACKAGES: `ag2`, `autogen` and `pyautogen` (they're identical, just pick your favorite alias) → CURRENT VERSION: v0.3.2 → OFFICIAL DOCS: https://lnkd.in/gg3tUAzi → SAME DISCORD: https://lnkd.in/gxD6c-Uk What this means for users: → If you're using the autogen or pyautogen packages → You're good to keep using them → These packages are now maintained at AG2 (https://lnkd.in/gwPRRgTk) → No breaking changes planned for v0.4 → For support/issues going forward, use GitHub (https://lnkd.in/gwPRRgTk) & Discord (https://lnkd.in/gxD6c-Uk) Full announcement on Discord: https://lnkd.in/gRR7Qviv
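For anyone who has not tried the framework, a minimal two-agent conversation looks roughly like this; the model name and config values are examples of my own, not recommendations from the announcement.

# Minimal AutoGen/AG2 two-agent chat (sketch; model name and key are placeholders).
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user = UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)

user.initiate_chat(assistant, message="Summarize what AG2 is in one sentence.")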
Jeremy Owen
The more research I read and the more I interact with LLMs, the more I wonder if some issues with combinatorial logic could be addressed by creating cyclic decoder-only models with a gating function for output. This approach might also enable a dramatic reduction in overall parameter count requirements for comparable performance. My understanding is that as model depth increases, higher-level abstractions form in the upper layers while some lower-level details also need to be propagated for use in the output token distribution. It seems larger models are effectively "unrolled" and need a fairly wide top to retain concept density as depth increases. By cycling a smaller graph, concepts and relations can intermingle more easily. If more layers of the smaller model are combined as input to generate the token distribution, they can specialize, reducing the need to propagate as much information up the graph for similar quality. At least, that's my hypothesis. The cycling could have either a learned gating function and/or a set number of cycles. I saw one research paper that seemed to touch on some of these concepts, but I can't find it now. If anyone knows of other existing papers related to this idea, please let me know. If I've learned one thing, it's that there are almost no original ideas in model architecture. :)
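As a toy sketch of the hypothesis (my own illustration, not taken from any paper): one shared transformer-style block is applied repeatedly, a learned gate decides when to stop cycling, and several cycles' hidden states are combined before producing the output, rather than propagating everything up to a wide top layer.

# Toy "cyclic decoder" block: one shared layer cycled with a learned halting gate.
# Purely illustrative of the idea above; not a tuned or complete architecture.
import torch
import torch.nn as nn

class CyclicBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, max_cycles=6):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(d_model, 1)   # halting score per cycle
        self.max_cycles = max_cycles

    def forward(self, x):
        states = []
        for _ in range(self.max_cycles):
            x = self.block(x)
            states.append(x)
            halt = torch.sigmoid(self.gate(x).mean())
            if halt > 0.5:                  # learned stop condition
                break
        # combine several cycles' states instead of using only the last one
        return torch.stack(states).mean(dim=0)

out = CyclicBlock()(torch.randn(2, 10, 64))
print(out.shape)   # torch.Size([2, 10, 64])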
Ganesh Venkatesh
Come visit the Cerebras booth at #NeurIPS 2024! Along with all the work on the training front, for the first time we will also be talking about some of the ML research we are doing for LLM inference. If you want to know how we are using synthetic data to make our inference efficient and blazing fast, then check out the poster by Vithu Thangarasa.
Kunal Verma
Structured output generation from LLMs? LLMs often struggle with generating structured outputs consistently, impacting reasoning and classification differently. A research paper, “Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models”, reveals key trade-offs and strategies for balancing flexibility and structure for better performance. Results, TL;DR: Reasoning tasks: use NL-to-F and maintain a balance between reasoning and structured output. In simple words, get the response in one call and then format the response in a second call. Classification tasks: use strict mode; performance thrives under strict constraints. Detailed analysis: https://lnkd.in/g-72KeEr
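A minimal sketch of the two-call pattern described above, using an OpenAI-style client: the first call reasons freely in natural language, the second call only reformats the answer. The model name and prompts are placeholders.

# Two-call pattern: reason freely first, then convert the answer to strict JSON.
# Model name and prompts are placeholders for illustration.
from openai import OpenAI

client = OpenAI()
question = "A train travels 120 km in 1.5 hours. What is its average speed?"

# Call 1: unconstrained natural-language reasoning.
reasoning = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": question}],
).choices[0].message.content

# Call 2: formatting only, with the reasoning text as input.
formatted = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": 'Extract the final answer from the text below as JSON '
                   f'{{"answer_km_per_h": <number>}}.\n\n{reasoning}',
    }],
).choices[0].message.content

print(formatted)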
🤖 Jacky Liang
🔥 𝐈 𝐚𝐦 𝐬𝐞𝐞𝐤𝐢𝐧𝐠 𝐝𝐞𝐯 𝐫𝐞𝐥 𝐩𝐨𝐬𝐢𝐭𝐢𝐨𝐧𝐬: 𝐡𝐢𝐫𝐞 𝐦𝐞 𝐭𝐨 𝐡𝐞𝐥𝐩 𝐲𝐨𝐮𝐫 𝐝𝐞𝐯𝐞𝐥𝐨𝐩𝐞𝐫 𝐜𝐮𝐬𝐭𝐨𝐦𝐞𝐫𝐬 𝐬𝐮𝐜𝐜𝐞𝐞𝐝! My name is Jacky, and I am 𝐚𝐜𝐭𝐢𝐯𝐞𝐥𝐲 seeking developer relations roles. My last post garnered some awesome convos with folks, so I wanted to put the feelers out again. 📮 𝐋𝐞𝐭'𝐬 𝐜𝐡𝐚𝐭: https://lnkd.in/eBHf5_DT Most recently, I worked for Pinecone as a senior developer advocate under the incredible Bear Douglas and Grigoriy (Greg) Kogan, where I learned from the best in the industry how to build and run a world-class developer advocacy team. I built and launched the viral Pinecone Shop The Look (https://lnkd.in/dxs6zJ5t) sample app that even got Mark Zuckerberg himself responding, netting over 300,000 impressions! I am ready for hire - I have my resume, work samples, and professional references ready. My core tenets as a dev advocate are: 1. Build 2. Communicate 3. Achieve company goals 4. Mentor 👉 𝘐 𝘸𝘰𝘶𝘭𝘥 𝘥𝘦𝘦𝘱𝘭𝘺 𝘢𝘱𝘱𝘳𝘦𝘤𝘪𝘢𝘵𝘦 𝘪𝘵 𝘪𝘧 𝘺𝘰𝘶 𝘤𝘰𝘶𝘭𝘥 𝘭𝘪𝘬𝘦, 𝘤𝘰𝘮𝘮𝘦𝘯𝘵, 𝘢𝘯𝘥 𝘴𝘩𝘢𝘳𝘦 𝘵𝘩𝘪𝘴 𝘱𝘰𝘴𝘵 𝘵𝘰 𝘩𝘦𝘭𝘱 𝘪𝘵 𝘳𝘦𝘢𝘤𝘩 𝘮𝘰𝘳𝘦 𝘢𝘶𝘥𝘪𝘦𝘯𝘤𝘦𝘴. 𝘛𝘩𝘦 𝘫𝘰𝘣 𝘮𝘢𝘳𝘬𝘦𝘵 𝘪𝘴 𝘪𝘯𝘤𝘳𝘦𝘥𝘪𝘣𝘭𝘺 𝘵𝘰𝘶𝘨𝘩 𝘳𝘪𝘨𝘩𝘵 𝘯𝘰𝘸, 𝘢𝘯𝘥 𝘢𝘯𝘺 𝘭𝘪𝘵𝘵𝘭𝘦 𝘣𝘪𝘵 𝘩𝘦𝘭𝘱𝘴 𝘢 𝘭𝘰𝘵, 𝘵𝘳𝘶𝘭𝘺 🙏
Zhoutong Fu
While recent open-source models have gained attention, their accompanying technical reports often remain undervalued. Unlike the high-level and somewhat opaque GPT model technical reports, those from Gemma and Llama are detailed and methodologically robust. These reports provide comprehensive insights into data handling, model architecture, and training processes, effectively serving as runbooks for any company looking to develop a foundational model from scratch. Their methodologies, which include data enhancement, privacy and safety controls, and preference optimization, have broader applications beyond just LLM training. These valuable insights deserve more recognition in the industry and attention from practitioners for their applications. It’s frustrating to see these “hidden gems” overlooked, especially when current costly proposals still reflect outdated approaches from 2022 or earlier, despite the availability of these excellent references.
Onkar Bhardwaj
Our research got adopted in open source! ❤️ Large Language Models (LLMs) are fickle - they are pretty sensitive to how they are prompted. This makes benchmarking them hard. Benchmarks usually use a small set of specific prompt formats. This small set makes the results less reliable in real life and harder to reproduce. But how do you make them reliable? Evaluate an LLM across a large number of prompt formats? The number of ways a model can be prompted for a task is huge, and this makes benchmarking impractical and expensive... and this is what we address in our research! We have a new method called PromptEval. Instead of evaluating an LLM with just one prompt format, PromptEval estimates how well the model performs across many different prompt formats. It does this in a way that is efficient and practical, so that your benchmarking budget does not blow up! And this just got adopted in the open-source PromptBench library. Paper: https://lnkd.in/edMyvztT Our library: https://lnkd.in/e8iydgsJ PromptBench library: https://lnkd.in/eNKPf5vr Example notebook from PromptBench: https://lnkd.in/e4WqqKBt Shout out to Felipe Maia Polo, Ronald Xu, Lucas Weber, Mírian Silva, Leshem Choshen, Állysson Oliveira, Yuekai Sun, Mikhail Yurochkin, and thanks to Kate Soule for supporting us throughout. #ibm #ibmresearch
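The linked PromptBench integration implements the actual PromptEval estimator; purely as a conceptual sketch of the problem it tackles, the snippet below scores a model on a random subset of (prompt format, example) pairs instead of the full cross product. The formats and the run_model stub are placeholders, not the real method.

# Conceptual sketch only (not the PromptEval method): estimate performance across
# many prompt formats by sampling (format, example) pairs rather than the full grid.
import random

formats = [
    "Q: {q}\nA:",
    "Question: {q}\nAnswer:",
    "{q}\nRespond with the answer only:",
]
examples = [("2+2?", "4"), ("Capital of France?", "Paris")]

def run_model(prompt):   # placeholder: call your LLM here
    return "4" if "2+2" in prompt else "Paris"

budget = 4               # far smaller than len(formats) * len(examples) in practice
pairs = random.sample([(f, ex) for f in formats for ex in examples], budget)
accuracy = sum(run_model(f.format(q=q)) == a for f, (q, a) in pairs) / budget
print(f"estimated accuracy across prompt formats: {accuracy:.2f}")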