About
Enthusiastic data scientist eager to solve real-world problems and mine insights…
Activity
-
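Video is a done deal now... Google is the GOAT, wow. Google DeepMind has unveiled 'Veo 2', its latest video-generation AI, to take on OpenAI's Sora. Comparing the examples (images 1 and 2), you can see quality and physics rendering that far surpass Sora. Veo 2 features - Up to…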
Liked by Junwoo Yun
-
Ilya Sutskever gave a talk at NeurIPS about the post pretraining world - here's my talk on his talk - Ilya is implying we need to find something else…
Liked by Junwoo Yun
-
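I have a cute developer's occupational hazard: whenever an app I use suffers an outage, my first thought is "Oh, what on earth brought it down? Can't wait for the postmortem" 😅 On the 11th, a massive service outage in which nearly every service OpenAI provides went down for a full 4 hours…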
Liked by Junwoo Yun
Experience
Education
-
The Chinese University of Hong Kong
-
• Advanced Standing student: 24 of 123 total units exempted
• Honors, 2018-2019 second semester
Licenses & Certifications
Publications
-
Multi Datasource LTV User Representation (MDLUR)
KDD 2023: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
In this paper, we propose a novel user representation methodology called Multi Datasource LTV User Representation (MDLUR). Our model aims to establish a universal user embedding for downstream tasks, specifically lifetime value (LTV) prediction on specific days after installation. MDLUR uses a combination of various data sources, including user information, portrait, and behavior data from the first n days after installation of the social casino game "Club Vegas Slots" developed by Bagelcode. The model overcomes a limitation of conventional approaches, which struggle to effectively utilize various data sources or to accurately capture interactions in sparse datasets. MDLUR adopts unique model architectures tailored to each data source; coupled with robust dimensionality reduction techniques, this lets the model effectively integrate insights from various data sources. Comprehensive experiments on real-world industrial data demonstrate the superiority of the proposed method over SOTA baselines including Two-Stage XGBoost, WhalesDector, MSDMT, and BST. Beyond outperforming these models, MDLUR has been efficiently deployed and tested in a live environment using MLOps, demonstrating its maintainability. The representation could be applied to a wide range of downstream tasks, including conversion, churn, and retention prediction, as well as user segmentation and item recommendation.
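As a rough illustration of the multi-source idea in the MDLUR abstract, the sketch below encodes each data source separately, concatenates the per-source outputs into one user embedding, applies a linear dimensionality reduction, and feeds the result to a linear LTV head. This is not the paper's architecture: MDLUR uses architectures tailored to each source, whereas here every encoder is a hypothetical linear layer with ReLU, the source keys ("info", "portrait", "behavior") are assumed names, and all weights are toy values.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def encode_user(sources: Dict[str, Vector],
                encoders: Dict[str, Matrix],
                reducer: Matrix) -> Vector:
    """Hypothetical late fusion: one encoder per data source, concatenate, then reduce."""
    parts: Vector = []
    for name in sorted(encoders):  # fixed order keeps the embedding layout stable
        parts.extend(relu(matvec(encoders[name], sources[name])))
    return matvec(reducer, parts)  # linear dimensionality reduction


def predict_ltv(embedding: Vector, head: Vector, bias: float = 0.0) -> float:
    """Downstream task: a linear regression head on the shared user embedding."""
    return sum(h * e for h, e in zip(head, embedding)) + bias
```

Because the embedding is computed once and the head is separate, the same `encode_user` output could in principle feed other downstream heads (churn, retention), which is the reuse the abstract describes.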
Projects
-
Citadel APAC Datathon
-
- Analyzed EU football data and proposed a betting strategy
- Handled null values and aggregated data
- Selected features using the chi-squared test, first-rank test, and VIF test
- Built and tuned linear, bagged-tree, and boosting models for classification and regression with sklearn pipelines
- Implemented a betting strategy evaluated by Sharpe ratio and compared it with random guessing
- Resulted in a betting strategy with consistently positive returns
-
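A minimal sketch of the Sharpe-ratio comparison described in the Citadel APAC Datathon entry, assuming unit stakes at decimal odds and a three-way (home/draw/away) random-guess baseline. The function names and the per-bet return definition are illustrative assumptions, not the original datathon code.

```python
import random
import statistics
from typing import List


def sharpe_ratio(returns: List[float], risk_free: float = 0.0) -> float:
    """Mean excess return per unit of volatility (sample standard deviation)."""
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    return statistics.mean(excess) / sd


def bet_returns(picks: List[int], outcomes: List[int], odds: List[float]) -> List[float]:
    """Return on a unit stake per bet: odds - 1 if the pick wins, otherwise -1 (assumed decimal odds)."""
    return [o - 1.0 if p == y else -1.0 for p, y, o in zip(picks, outcomes, odds)]


def random_picks(n: int, n_classes: int = 3, seed: int = 0) -> List[int]:
    """Random-guess baseline over n_classes outcomes (e.g. home/draw/away)."""
    rng = random.Random(seed)
    return [rng.randrange(n_classes) for _ in range(n)]
```

A model's picks and `random_picks(len(outcomes))` can both be passed through `bet_returns` and `sharpe_ratio` on the same matches, so the strategy is judged on risk-adjusted return rather than raw accuracy.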
Army Cafeteria App Based on an AI Recommendation Algorithm
-
• Analyzed the underlying issues in army food service and proposed an AI recommendation-based solution
• Applied AHPSorting with a hybrid model combining content-based and collaborative filtering
• Organized the system design, MLOps, and solutions for cold-start and personalization
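The hybrid recommender in the Army Cafeteria App entry can be sketched as a blend of a content-based score (cosine similarity between a user profile and menu-item features) and an item-based collaborative-filtering score. The blending rule, which leans on content for users with few ratings as a simple cold-start fallback, and all function names are assumptions of this sketch; the AHPSorting step is not shown.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def content_score(user_profile: Vector, item_features: Vector) -> float:
    """Content-based part: how well an item's features match the user's profile."""
    return cosine(user_profile, item_features)


def cf_score(user_ratings: Dict[str, float], item: str,
             item_vectors: Dict[str, Vector]) -> float:
    """Item-based CF: similarity-weighted average of the user's ratings on other items.
    item_vectors holds each item's column of ratings across all users."""
    num = den = 0.0
    for rated, rating in user_ratings.items():
        if rated == item:
            continue
        sim = cosine(item_vectors[rated], item_vectors[item])
        num += sim * rating
        den += abs(sim)
    return num / den if den else 0.0


def hybrid_score(content: float, collaborative: float, n_ratings: int, k: int = 5) -> float:
    """Hypothetical blend: rely on content at cold start, shift toward CF as ratings accumulate."""
    alpha = n_ratings / (n_ratings + k)
    return alpha * collaborative + (1 - alpha) * content
```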
-
AI Mentoring programme
-
• Analyzed the need for educational personalization and proposed an AI that traces each individual's skill level
• Designed a Deep Knowledge Tracing model and organized the CI/CD pipeline
• Collaborated with a Master of Education student to incorporate domain-specific knowledge
-
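Deep Knowledge Tracing, named in the AI Mentoring programme entry, typically feeds a recurrent network one-hot vectors over 2 × (number of skills) slots, encoding which skill was attempted and whether the answer was correct, and trains it to predict correctness on the next attempt. The sketch below shows only that input/target encoding under those assumptions; the LSTM itself is omitted and the helper names are hypothetical.

```python
from typing import List, Tuple


def encode_interaction(skill_id: int, correct: bool, num_skills: int) -> List[int]:
    """One-hot over 2 * num_skills slots: slot skill_id for a wrong answer,
    slot skill_id + num_skills for a correct one."""
    if not 0 <= skill_id < num_skills:
        raise ValueError("skill_id out of range")
    x = [0] * (2 * num_skills)
    x[skill_id + num_skills * int(correct)] = 1
    return x


def build_sequence(history: List[Tuple[int, bool]], num_skills: int):
    """Inputs x_1..x_{T-1} and next-step targets (skill_id, correctness) for the RNN."""
    inputs = [encode_interaction(s, c, num_skills) for s, c in history[:-1]]
    targets = [(s, int(c)) for s, c in history[1:]]
    return inputs, targets
```

At each step the model outputs a probability per skill, and the loss is taken only on the skill actually attempted next, which is why the targets keep the skill id alongside the label.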
COVID Alarm Application
-
• Analyzed the causes of contagion and programmed a preventive solution
• Preprocessed and cleaned data from the data-gathering project below
• Implemented the service with Flutter, Firebase, and Pandas given the limited computing power available during service
-
COVID Data mining and distribution
-
• Mined, wrangled, and cleaned tracing data from the state and authorities
• Published the organized data on Kaggle to encourage exploration and collaboration on COVID-19 solutions
Honors & Awards
-
NeurIPS 2022 - Ariel ML Data Challenge (25th / 231)
NeurIPS 2022
Performed planetary characterization with an autoencoder DNN
-
Gas Demand Estimation Competition (final round, 4th / 488)
Korea Gas Corporation
• Estimated gas usage with custom external data
• Used pseudo-labeling and time-series distribution data
• Focused mainly on feature engineering (ADF test, VIF, correlation)
-
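The VIF screening mentioned in the Gas Demand Estimation Competition entry can be computed without fitting one regression per feature: VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal entry of the inverse of the features' correlation matrix. A stdlib-only sketch with hypothetical helper names (in practice a library routine would usually be used):

```python
import math
from typing import List

Matrix = List[List[float]]


def pearson(a: List[float], b: List[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns: List[List[float]]) -> List[float]:
    """VIF for each feature column: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inv = invert(corr)
    return [inv[j][j] for j in range(len(columns))]
```

A common rule of thumb drops or merges features whose VIF exceeds roughly 5 to 10; the threshold used in the competition is not stated.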
AI-Based Meeting Transcript Summarization Competition (35th / 734)
LG
- Summarized the given meeting transcripts
- Built NLP models based on BERT, RoBERTa, and T5 with the Hugging Face library
- Used special tokens, a manually assembled encoder-decoder, Longformer, and BERTSum
- Built a two-step model: a topic classification layer followed by an encoder-decoder summarization layer
-
System Quality Anomaly Competition (13th / 1365)
LG AI Research
• Analyzed 24M unidentified system log records (24M train data, 18M test data)
• Predicted user inconvenience caused by changes in system quality
• Preprocessed data by handling NaNs, correcting skewness with the Box-Cox transformation, and applying SMOTE oversampling and undersampling
• Reduced dimensionality with PCA and t-SNE, combined with an LSTM, to separate anomalies from the data
• Selected features via correlation thresholds, feature importance, permutation importance, and adversarial importance
• Extracted features via label, one-hot, cyclical, binary, and frequency encoding
• Built an OOF meta-model over LSTM, CatBoost, LightGBM, and sklearn tree-based models
-
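Two of the encodings listed in the System Quality Anomaly Competition entry, cyclical and frequency encoding, are short enough to sketch. Cyclical encoding maps a periodic value such as hour of day onto the unit circle so that 23:00 and 00:00 end up close together; frequency encoding replaces a category with its share of the training rows. Fitting frequencies on training data only and mapping unseen categories to 0.0 are assumptions of this sketch, not details from the competition.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cyclical_encode(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic feature (e.g. hour of day with period=24) onto the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def fit_frequency(train: List[str]) -> Dict[str, float]:
    """Relative frequency of each category, learned from training rows only."""
    counts = Counter(train)
    return {cat: c / len(train) for cat, c in counts.items()}


def apply_frequency(values: List[str], mapping: Dict[str, float],
                    unseen: float = 0.0) -> List[float]:
    """Replace categories with their training frequency; unseen ones get `unseen`."""
    return [mapping.get(v, unseen) for v in values]
```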
2020 NH Investment Big Data Competition (final round / 638)
-
• Analyzed patterns in investment-related fake news
• Built a fast NLP model based on ELECTRA
• Evaluated different NLP models: BiLSTM, BERT, GPT-2, XLA, and Cross-Encoder
-
Author Classification Competition (6th / 693)
Dacon
• Analyzed each author's literary style and classified texts by author
• Designed an ensemble model with Naive Bayes, Sent2Vec, TfidfVectorizer, CountVectorizer, and boosted tree models
Code: https://dacon.io/codeshare/1894
-
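The Naive Bayes component of the Author Classification Competition ensemble could look roughly like this multinomial Naive Bayes over word counts with Laplace smoothing. It is a standalone sketch with a naive whitespace tokenizer and a hypothetical class name, not the linked competition code, and it omits the Sent2Vec, TF-IDF, and boosting parts of the ensemble.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: List[str], labels: List[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(docs)
        return self

    def log_posteriors(self, doc: str) -> Dict[str, float]:
        """Unnormalized log P(author) + sum of log P(word | author)."""
        words = [w for w in doc.lower().split() if w in self.vocab]  # ignore unseen words
        v = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n_docs / self.total_docs)
            for w in words:
                score += math.log((counts[w] + self.alpha) / (total + self.alpha * v))
            scores[label] = score
        return scores

    def predict(self, doc: str) -> str:
        scores = self.log_posteriors(doc)
        return max(scores, key=scores.get)
```

In an ensemble, the per-author log posteriors (after normalization) are more useful than the hard `predict` label, since they can be averaged or stacked with the other models' probabilities.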
Psychological Prediction Competition (12th / 1282)
Dacon
• Built a model predicting whether test participants voted in national elections
• Used Machiavellianism psychological test responses as input to a five-layer Leaky ReLU neural network
-
ROK 3rd Startup Competition (3rd place)
Republic of Korea Army
• Designed a singing-voice synthesizer based on Google's WaveNet and PixelCNN
• Performed market analysis and technology trend research
-
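The WaveNet-based synthesizer in the ROK Startup Competition entry rests on dilated causal convolutions: each output sample depends only on current and past inputs, and stacking layers with growing dilation widens the receptive field exponentially. A single-channel, stdlib-only sketch of that building block, with hypothetical helper names; gated activations, residual connections, and the PixelCNN part are not shown.

```python
from typing import List


def dilated_causal_conv(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """y[t] = sum_k weights[k] * x[t - k * dilation], treating x[i] as 0 for i < 0,
    so each output depends only on current and past samples."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Past samples (including the current one) visible after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With kernel size 2 and dilations 1, 2, 4, 8, four layers already see 16 samples, whereas four undilated layers would see only 5.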
Kaggle 2nd KAKR ML Month Competition (7th / 414)
Google Korea / Kaggle
• Conducted EDA with visualization, and performed correlation-based feature selection and extraction
• Ensembled various sklearn-based tree and boosting models
More activity by Junwoo
-
👀 Set a *single weight* in a 13B LLama2 model to 0 and the network then outputs gibberish. We call these weights "super weights" and it turns out…
Liked by Junwoo Yun
-
🎄🎁 10 Xmas Reads for Aspiring ML Engineers! 🎁🎄 As the year comes to a close, what better way to recharge and prepare for 2025 than diving into…
Liked by Junwoo Yun
-
If I wanted to break into Machine Learning in 2025, these are 3 types of projects I would have in my portfolio: Forget the cookie-cutter approach of…
Liked by Junwoo Yun
-
In the BERT era, rerankers used the encoder-only BERT as the base model: the query and document were taken as input, and BERT's CLS vector was passed through an MLP to produce a relevance score for reranking. However, large language…
Liked by Junwoo Yun
-
I am thrilled to announce that our paper received the Outstanding Paper Award (3 out of about 100 papers) at the NeurIPS 2024 Open World Agent…
Liked by Junwoo Yun
-
AI Engineering: Then vs. Now The evolution of AI engineering is perfectly summed up in one meme: 🕹️ Back Then: Weeks of manually tuning models…
Liked by Junwoo Yun
-
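Toss Securities Q3 earnings summary ✔ Q3 revenue of KRW 119.9 billion and net income of KRW 32.4 billion, up 117.9% and 833% year over year, respectively ✔ Q3 operating profit of KRW 29.6 billion, more than 8x year over year. Year-to-date cumulative operating profit of KRW 60.2 billion…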
Liked by Junwoo Yun
-
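Hello! I'm writing my first post almost two years after joining LinkedIn! I'm posting because there's something I'd like to share. The recommendation study group I'm part of at Pseudo Lab (가짜연구소) has renewed and deployed its movie recommendation chatbot. Below is our study group's movie…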
Liked by Junwoo Yun
-
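I'm happy to be introducing a project I worked on at my company, for the first time in a while 😊 **DNA 1.0 8B Instruct**, which we released, is a state-of-the-art LLM specialized for Korean, with strengths in natural conversation and understanding complex instructions. It handles English very well in addition to Korean…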
Liked by Junwoo Yun
-
Don't hate the player, hate the game 🎯 I’ve seen so many AI companies whose founders have zero understanding of basic principles of AI/ML, yet they…
Liked by Junwoo Yun
-
Quantizing a model to 4bits will sometimes break models entirely! Unsloth AI now has a dynamic 4bit quant format which chooses some parameters to be…
Liked by Junwoo Yun
-
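Hello, this is Hanul Lee (이한울). The AI Solutions Department's second service, which includes a model I built called "Shinhan Securities Relatedness Analysis AI" (신한투자증권 연관도 분석 AI), has been deployed. It was a simple service, but through company information (natural language), revenue structure (tree graph), themes (...), and more, "companies a person would consider similar"…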
Liked by Junwoo Yun
-
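✨Welcome 2025 Toss Bank Lounge✨ Toss Bank has renovated its existing in-person support center and newly opened the 'Toss Bank Lounge'! The 'Toss Bank Lounge' assists with tasks that are hard to handle remotely for technical reasons, or senior customers aged 65 and over and those with limited access to digital environments…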
Liked by Junwoo Yun