Junwoo Yun

Seoul, South Korea
874 followers · 500+ connections

About

Enthusiastic Data Scientist eager to solve real-world problems and mine insights…

Experience

Education

Licenses & Certifications

Publications

  • Multi Datasource LTV User Representation (MDLUR)

    KDD 2023 : Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

    In this paper, we propose a novel user representation methodology called Multi Datasource LTV User Representation (MDLUR). Our model aims to establish a universal user embedding for downstream tasks, specifically lifetime value (LTV) prediction on specific days after installation. MDLUR uses a combination of various data sources, including user information, portrait, and behavior data from the first n days after installation of the social casino game "Club Vegas Slots" developed by Bagelcode. This model overcomes the limitations of conventional approaches, which struggle to effectively utilize various data sources or to accurately capture interactions in sparse datasets. MDLUR adopts unique model architectures tailored to each data source. Coupled with robust dimensionality reduction techniques, the model effectively integrates insights from the various data sources. Comprehensive experiments on real-world industrial data demonstrate the superiority of the proposed methods compared to SOTA baselines including Two-Stage XGBoost, WhalesDetector, MSDMT, and BST. Not only did it outperform these models, but it has also been efficiently deployed and tested in a live environment using MLOps, demonstrating its maintainability. The representation may potentially be applied to a wide range of downstream tasks, including conversion, churn, and retention prediction, as well as user segmentation and item recommendation.
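
    A minimal, heavily hedged sketch of the multi-source embedding idea the abstract describes: one encoder per data source (user information, portrait, behavior), fused into a single user vector feeding an LTV head. The layer sizes, the GRU behavior encoder, and the concatenation-based fusion are illustrative assumptions, not the published MDLUR architecture.

```python
import torch
import torch.nn as nn

class MultiSourceUserEncoder(nn.Module):
    """Hypothetical stand-in for a multi-datasource user embedding model."""
    def __init__(self, info_dim: int, portrait_dim: int, behavior_dim: int, embed_dim: int = 64):
        super().__init__()
        # Source-specific encoders (stand-ins for architectures tailored to each source).
        self.info_enc = nn.Sequential(nn.Linear(info_dim, 64), nn.ReLU())
        self.portrait_enc = nn.Sequential(nn.Linear(portrait_dim, 64), nn.ReLU())
        self.behavior_enc = nn.GRU(behavior_dim, 64, batch_first=True)
        # Fusion layer reduces the concatenated sources to one user embedding.
        self.fuse = nn.Linear(3 * 64, embed_dim)
        self.ltv_head = nn.Linear(embed_dim, 1)  # downstream LTV regression head

    def forward(self, info, portrait, behavior_seq):
        _, h = self.behavior_enc(behavior_seq)   # final hidden state summarizes first-n-days behavior
        fused = torch.cat([self.info_enc(info), self.portrait_enc(portrait), h[-1]], dim=-1)
        user_embedding = self.fuse(fused)
        return self.ltv_head(user_embedding), user_embedding
```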


Projects

  • Citadel APAC Datathon

    -

    - Analyzed and suggested a betting strategy using EU football data
    - Null preprocessing and data aggregation
    - Feature selection using the chi-squared test, first rank, and VIF test (see the sketch below)
    - Modeled and tuned linear, tree-bagging, and boosting models for classification and regression with an sklearn pipeline
    - Implemented a betting strategy using the Sharpe ratio and compared it with a random guess
    - Resulted in a consistently positive betting strategy
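
    A hedged sketch of the feature-selection step mentioned above, combining chi-squared p-values with an iterative VIF filter. The function name, thresholds, and the use of scikit-learn/statsmodels here are assumptions for illustration, not the datathon code.

```python
import pandas as pd
from sklearn.feature_selection import chi2
from statsmodels.stats.outliers_influence import variance_inflation_factor

def select_features(X: pd.DataFrame, y, chi2_alpha: float = 0.05, vif_cap: float = 10.0):
    # Keep features whose chi-squared p-value suggests dependence on the target
    # (chi2 requires non-negative feature values, e.g. counts or frequencies).
    _, p_values = chi2(X, y)
    kept = [col for col, p in zip(X.columns, p_values) if p < chi2_alpha]

    # Iteratively drop the feature with the worst multicollinearity (highest VIF).
    while len(kept) > 1:
        vifs = [variance_inflation_factor(X[kept].values, i) for i in range(len(kept))]
        worst = max(range(len(kept)), key=lambda i: vifs[i])
        if vifs[worst] < vif_cap:
            break
        kept.pop(worst)
    return kept
```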

  • AI recommendation algorithm based Army Cafeteria App

    -

    • Analyzed issues underlying army food service and suggested an AI recommendation-based solution
    • Applied AHP sorting with a hybrid model combining content-based filtering and collaborative filtering (see the sketch below)
    • Organized the system design, MLOps, and solutions to cold-start and personalization problems
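
    A minimal sketch of the hybrid (content-based plus collaborative) scoring idea above. The rating matrix, item feature matrix, and the simple weighted blend are hypothetical stand-ins; the actual project layered AHP sorting on top of such a hybrid model.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def hybrid_scores(user_ratings: np.ndarray, item_features: np.ndarray, alpha: float = 0.5):
    """user_ratings: (n_users, n_items), 0 = unrated; item_features: (n_items, n_features)."""
    # Content-based signal: propagate each user's ratings through item-item similarity.
    item_sim = cosine_similarity(item_features)
    content = user_ratings @ item_sim
    # Collaborative signal: borrow ratings from similar users.
    user_sim = cosine_similarity(user_ratings)
    collab = user_sim @ user_ratings
    # alpha trades off the two signals; recommend the highest-scoring unseen items.
    return alpha * content + (1 - alpha) * collab
```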

  • AI Mentoring programme

    -

    • Analyzed the need for educational personalization and suggested AI-based tracing of each individual's skill level
    • Designed a Deep Knowledge Tracing model (see the sketch below) and organized the CI/CD pipeline
    • Collaborated with a Master of Education student to incorporate domain-specific knowledge
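
    A hypothetical Deep Knowledge Tracing sketch in PyTorch: an LSTM reads a student's (skill, correctness) interaction history and outputs per-skill probabilities of answering correctly next. The hidden size and one-hot input encoding are assumptions, not the project's implementation.

```python
import torch
import torch.nn as nn

class DKT(nn.Module):
    def __init__(self, n_skills: int, hidden: int = 64):
        super().__init__()
        # Input is a one-hot over 2 * n_skills: (skill, correct) and (skill, incorrect).
        self.lstm = nn.LSTM(input_size=2 * n_skills, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_skills)     # per-skill mastery logits

    def forward(self, interactions: torch.Tensor) -> torch.Tensor:
        # interactions: (batch, seq_len, 2 * n_skills) one-hot encoded history
        h, _ = self.lstm(interactions)
        return torch.sigmoid(self.out(h))          # (batch, seq_len, n_skills) probabilities
```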

    Other contributors
    • Minseok Choi
    • Chan Jeong
    • Jaehyeong Ju
  • COVID Alarm Application

    -

    • Analyzed the drivers of contagion and programmed a preventive solution
    • Conducted data preprocessing and cleaning based on the data-gathering project below
    • Implemented the service with Flutter, Firebase, and pandas due to the limited computing power available during service

    Other contributors
    • Tae hoon Lee
  • COVID Data mining and distribution

    -

    • Conducted data mining, wrangling, and cleaning on contact-tracing data from state authorities
    • Distributed the organized data on Kaggle to promote exploration and collaboration on COVID-19 solutions


Honors & Awards

  • NeurIPS2022 - Ariel ML Data Challenge (25th/231)

    NeurIPS2022

    Performed planetary characterization with an autoencoder DNN (see the sketch below)
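
    A hedged sketch of an autoencoder DNN of the kind mentioned above, compressing an input spectrum to a small latent code and reconstructing it. The input size, layer widths, and latent dimension are illustrative assumptions, not the challenge solution.

```python
import torch.nn as nn

class SpectrumAutoEncoder(nn.Module):
    def __init__(self, n_bins: int = 55, latent: int = 8):   # n_bins is an assumed spectrum length
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_bins, 64), nn.ReLU(),
            nn.Linear(64, latent),       # compressed latent characterization
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, 64), nn.ReLU(),
            nn.Linear(64, n_bins),       # reconstruct the input spectrum
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z
```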

  • Gas Demand Estimation Competition (final round, 4th/488)

    Korea Gas Corporation

    • Estimated gas usage with custom external data
    • Utilized pseudo-labeling (see the sketch below) and time-series distribution data
    • Mainly conducted feature engineering (ADF, VIF, correlation)
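
    A hedged sketch of the pseudo-labeling idea above: fit on labeled demand data, label the unlabeled rows with the model's own predictions, then refit on the union. The gradient-boosting model and variable names are stand-ins, not the competition pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def pseudo_label_fit(X_train, y_train, X_unlabeled):
    base = GradientBoostingRegressor(random_state=0)
    base.fit(X_train, y_train)
    # Use the base model's predictions as soft targets for the unlabeled rows.
    pseudo_y = base.predict(X_unlabeled)
    X_all = np.vstack([X_train, X_unlabeled])
    y_all = np.concatenate([y_train, pseudo_y])
    final = GradientBoostingRegressor(random_state=0)
    final.fit(X_all, y_all)
    return final
```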

  • AI Based Meeting Transcript Summarization Competition (35/734)

    LG

    - Summarized given meeting transcripts
    - Constructed NLP models based on BERT, RoBERTa, and T5 with the Hugging Face library
    - Utilized special tokens, a manual encoder-decoder, Longformer, and BERTSum
    - Conducted two-step modeling with a topic classification layer followed by an encoder-decoder summarization layer (see the sketch below)
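
    A minimal sketch of seq2seq summarization with the Hugging Face library, roughly the second step of the two-step pipeline above. The `t5-small` checkpoint, prompt prefix, and generation settings are illustrative assumptions, not the competition configuration.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "t5-small"   # assumption: any seq2seq summarization checkpoint could be used here
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

def summarize(transcript: str, max_new_tokens: int = 64) -> str:
    # T5 expects a task prefix; encoder-decoder generation produces the summary.
    inputs = tokenizer("summarize: " + transcript, return_tensors="pt",
                       truncation=True, max_length=512)
    summary_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```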

  • System Quality Anomaly Competition (13th/1365)

    LG AI Research

    • Analyzed 24M unidentified system log records (24M train data, 18M test data)
    • Predicted user inconvenience from system quality changes
    • Conducted data preprocessing via NaN handling, skewness correction with the Box-Cox transformation, and SMOTE oversampling and undersampling
    • Conducted dimensionality reduction with PCA and t-SNE alongside an LSTM to separate anomalies from the data
    • Conducted feature selection via correlation threshold, feature importance, permutation importance, and adversarial importance
    • Conducted feature extraction via label, one-hot, cyclical, binary, and frequency encoding
    • Conducted OOF meta-modeling with LSTM, CatBoost, LightGBM, and sklearn tree-based models (see the sketch below)
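
    A hedged sketch of out-of-fold (OOF) meta-modeling: each base model's out-of-fold predictions become features for a meta model. The two sklearn base learners and the logistic-regression meta model are stand-ins for the LSTM/CatBoost/LightGBM stack above.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

def oof_stack(X, y, base_models, n_splits: int = 5):
    """Assumes numpy arrays and a binary target; returns the meta model and OOF features."""
    oof = np.zeros((len(X), len(base_models)))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for j, model in enumerate(base_models):
        for train_idx, valid_idx in kf.split(X):
            model.fit(X[train_idx], y[train_idx])
            # Every sample is predicted by a model that never saw it during fitting.
            oof[valid_idx, j] = model.predict_proba(X[valid_idx])[:, 1]
    meta = LogisticRegression().fit(oof, y)   # meta model combines the base predictions
    return meta, oof

# Usage with two stand-in base learners:
# meta, oof = oof_stack(X, y, [RandomForestClassifier(), GradientBoostingClassifier()])
```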

  • 2020 NH Investment Big Data Competition (final round/638)

    -

    • Analyzed patterns of investment-related fake news
    • Constructed a rapid NLP algorithm based on ELECTRA
    • Evaluated different NLP models: BiLSTM, BERT, GPT-2, XLA, and Cross Encoder

  • Author Classification Competition (6th/693)

    Dacon

    • Analyzed the literary style of each author and performed classification
    • Designed an ensemble model with Naive Bayes, Sent2Vec, TfidfVectorizer, CountVectorizer, and boosting tree models (see the sketch below)

    Code: https://dacon.io/codeshare/1894
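
    A minimal sketch of one ensemble member above: TF-IDF features feeding a Naive Bayes author classifier in scikit-learn. The n-gram range and smoothing value are illustrative, and `texts`/`authors` are assumed variables.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),   # word and bigram features
    MultinomialNB(alpha=0.1),                        # smoothing value is illustrative
)
# texts: list of passages, authors: matching author labels (assumed variables)
# clf.fit(texts, authors)
# predicted = clf.predict(["an unseen passage ..."])
```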

  • Psychological Prediction Competition (12th/1282)

    Dacon

    • Built a prediction model for whether test participants voted in national elections
    • Leveraged the Machiavellianism psychological test with a LeakyReLU-based five-layer neural network (see the sketch below)
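
    A hypothetical sketch of the LeakyReLU five-layer network mentioned above. The input width (number of survey features), layer sizes, and the sigmoid "voted" output are assumptions, not the competition model.

```python
import torch.nn as nn

n_features = 77   # assumed number of psychological-test input features

model = nn.Sequential(
    nn.Linear(n_features, 256), nn.LeakyReLU(),
    nn.Linear(256, 128), nn.LeakyReLU(),
    nn.Linear(128, 64), nn.LeakyReLU(),
    nn.Linear(64, 32), nn.LeakyReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),    # probability that the participant voted
)
```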

  • ROK 3rd Startup Competition (3rd place)

    Republic of Korea Army

    • Designed a singing-synthesis system based on Google WaveNet and PixelCNN
    • Performed market analysis and technology trend research

  • Kaggle 2nd KAKR ML month competition (7th/414)

    Google Korea / Kaggle

    • Conducted EDA with visualization, and feature selection and extraction using correlation
    • Ensembled different sklearn-based tree models and boosting models
