Setu Kumar Basak
Raleigh, North Carolina, United States
1K followers
500+ connections
About
I am a Ph.D. student at North Carolina State University, where I work in the Realsearch…
Activity
Tips to solve any DSA question by understanding patterns If the input array is sorted then - Binary search - Two pointers If asked for all…
Liked by Setu Kumar Basak
I am happy to share that our paper "Leveraging Large Language Models to Detect npm Malicious Packages" has been accepted in the ICSE 2025 Research…
Liked by Setu Kumar Basak
It is my immense pleasure to share that my oral talk received 3rd place in the Graduate Student Award Competition arranged by the AIChE Forest…
Liked by Setu Kumar Basak
Experience
Education
Core Computer Science Courses:
Computer Basics and Programming, Object-Oriented Programming, Software Development with Java, Internet Programming, Microprocessor and Assembly Languages, Software Engineering and Information Systems, Data Structures and Algorithms, Algorithm Analysis and Design, Theory of Computation, Computer Architecture, Operating Systems, Database Systems, Compiler Design, Computer Networks, Artificial Intelligence, Computer Graphics, Fault-Tolerant Computing, etc.
Licenses & Certifications
Volunteer Experience
Student Motivator
National High School Programming Contest (NHSPC)
- Present 9 years 8 months
Education
It was an event to promote competitive programming at the high school level.
Publications
AssetHarvester: A Static Analysis Tool for Detecting Secret-Asset Pairs in Software Artifacts
Research track of 47th International Conference on Software Engineering (ICSE 2025)
GitGuardian monitored secrets exposure in public GitHub repositories and reported that developers leaked over 12 million secrets (database and other credentials) in 2023, a 113% surge from 2021. Despite the availability of secret detection tools, developers ignore the tools' reported warnings because of false positives (25%-99%). However, each secret protects assets of different values accessible through asset identifiers (a DNS name and a public or private IP address). The asset information for a secret can help developers filter false positives and prioritize secret removal from the source code. However, existing secret detection tools do not provide asset information, making it difficult for developers to filter secrets by looking only at the secret value or to find the assets manually for each reported secret. The goal of our study is to aid software practitioners in prioritizing secret removal by providing information on the assets protected by the secrets through our novel static analysis tool. We present AssetHarvester, a static analysis tool to detect secret-asset pairs in a repository. Since the location of the asset can be distant from where the secret is defined, we investigated secret-asset co-location patterns and found four patterns. To identify the secret-asset pairs of the four patterns, we utilized three approaches (pattern matching, data flow analysis, and fast-approximation heuristics). We curated a benchmark of 1,791 secret-asset pairs of four database types extracted from 188 public GitHub repositories to evaluate the performance of AssetHarvester. AssetHarvester achieves a precision of 97%, recall of 90%, and F1-score of 94% in detecting secret-asset pairs. Our findings indicate that the data flow analysis employed in AssetHarvester detects secret-asset pairs with 0% false positives and helps improve the recall of secret detection tools.
A Comparative Study of Software Secrets Reporting by Secret Detection Tools
International Symposium on Empirical Software Engineering and Measurement (ESEM 2023)
According to GitGuardian’s monitoring of public GitHub repositories, secrets sprawl continued to accelerate in 2022, growing 67% compared to 2021 and exposing over 10 million secrets (API keys and other credentials). Though many open-source and proprietary secret detection tools are available, these tools output many false positives, making it difficult for developers to take action and for teams to choose one tool out of many. To our knowledge, secret detection tools have not yet been compared and evaluated. The goal of our study is to aid developers in choosing a secret detection tool to reduce the exposure of secrets through an empirical investigation of existing secret detection tools. We present an evaluation of five open-source and four proprietary tools against a benchmark dataset. The top three tools based on precision are GitHub Secret Scanner (75%), Gitleaks (46%), and Commercial X (25%), and based on recall are Gitleaks (88%), SpectralOps (67%), and TruffleHog (52%). Our manual analysis of reported secrets reveals that false positives are due to generic regular expressions and ineffective entropy calculation. In contrast, false negatives are due to faulty regular expressions, skipping specific file types, and insufficient rulesets. We recommend that developers choose tools based on the secret types present in their projects to avoid missing secrets. In addition, we recommend that tool vendors update detection rules periodically and correctly employ secret verification mechanisms by collaborating with API vendors to improve accuracy.
SecretBench: A Dataset of Software Secrets
20th International Conference on Mining Software Repositories (MSR 2023)
According to GitGuardian's monitoring of public GitHub repositories, the exposure of secrets (API keys and other credentials) increased two-fold in 2021 compared to 2020, totaling more than six million secrets. However, no benchmark dataset is publicly available for researchers and tool developers to evaluate secret detection tools, which produce many false positive warnings. The goal of our paper is to aid researchers and tool developers in evaluating and improving secret detection tools by curating a benchmark dataset of secrets through a systematic collection of secrets from open-source repositories. We present a labeled dataset of source code containing 97,479 secrets (of which 15,084 are true secrets) of various secret types extracted from 818 public GitHub repositories. The dataset covers 49 programming languages and 311 file types.
What Challenges Do Developers Face About Checked-in Secrets in Software Artifacts?
International Conference on Software Engineering (ICSE) 2023
Throughout 2021, GitGuardian's monitoring of public GitHub repositories revealed a two-fold increase in the number of secrets (database credentials, API keys, and other credentials) exposed compared to 2020, accumulating more than six million secrets. To our knowledge, the challenges developers face in avoiding checked-in secrets have not yet been characterized. The goal of our paper is to aid researchers and tool developers in understanding and prioritizing opportunities for future research and tool automation for mitigating checked-in secrets through an empirical investigation of challenges and solutions related to checked-in secrets. We extract 779 questions related to checked-in secrets on Stack Exchange and apply qualitative analysis to determine the challenges and the solutions others have proposed for each challenge. We identify 27 challenges and 13 solutions. The four most common challenges, in ranked order, are: (i) store/version of secrets during deployment; (ii) store/version of secrets in source code; (iii) ignore/hide of secrets in source code; and (iv) sanitize VCS history. The three most common solutions, in ranked order, are: (i) move secrets out of source code/version control and use template config file; (ii) secret management in deployment; and (iii) use local environment variables. Our findings indicate that the same solution has been mentioned to mitigate multiple challenges. However, our findings also identify an increasing trend in questions lacking accepted solutions, substantiating the need for future research and tool automation on managing secrets.
What are the Practices for Secret Management in Software Artifacts?
IEEE Secure Development Conference (SecDev) 2022
Throughout 2021, GitGuardian's monitoring of public GitHub repositories revealed a two-fold increase in the number of secrets (database credentials, API keys, and other credentials) exposed compared to 2020, accumulating more than six million secrets. A systematic derivation of practices for managing secrets can help practitioners in secure development. The goal of our paper is to aid practitioners in avoiding the exposure of secrets by identifying secret management practices in software artifacts through a systematic derivation of practices disseminated in Internet artifacts. We conduct a grey literature review of Internet artifacts, such as blog articles and question and answer posts. We identify 24 practices grouped into six categories comprising developer and organizational practices. Our findings indicate that using local environment variables and external secret management services are the most recommended practices to move secrets out of source code and to securely store secrets. We also observe that using version control system scanning tools and employing short-lived secrets are the most recommended practices to avoid accidentally committing secrets and to limit secret exposure, respectively.
Projects
Denticon
Planet DDS is the established leader in cloud-based dental software. The company’s Denticon practice management software is a powerful, flexible tool trusted by thousands of dental professionals across the country. It is built from the ground up for enterprise groups, yet intuitive enough for solo practices.
HouseLens
HouseLens is the nation's leading provider of visual marketing services for real estate, with a nationwide footprint to serve everyone from individual agents to the largest listing portals. Our product mix is constantly evolving to keep our customers at the forefront of marketing technology. Offerings include professional photography, walk-through video, interactive 3D models, VR, drones, floor plans, and more.
NOMA
New Opportunity Model Analysis (NOMA) is a project of Foodbuy.
Foodbuy, LLC is the foodservice industry’s leading procurement services organization focused on lowering purchasing and product costs for both our parent company, Compass Group, as well as for our clients and members.
Foodbuy negotiates and contracts for more than $20bn of food, beverages, and services that our clients need, utilizing more than 600 leading manufacturers and distributors across the U.S. Ultimately, sourcing is at the heart of what we do on behalf of Compass Group and some of the most recognized organizations within the restaurant, healthcare, hospitality, leisure, and entertainment industries.
Languages
English
Professional working proficiency
Bangla
Native or bilingual proficiency
Organizations
SGIPC
Assistant Contest Manager
SGIPC (Special Group of Interest in Programming Contest), or "zip'c", is a group of programmers at KUET that mainly focuses on programming contests. Beyond programming contests, the group covers many other aspects of programming. It regularly arranges programming contests, discussion sessions, and workshops at KUET.
More activity by Setu Kumar
I am glad to share that our paper, "AssetHarvester: A Static Analysis Tool for Detecting Secret-Asset Pairs in Software Artifacts" has been accepted…
Posted by Setu Kumar Basak
🚨 Exciting New Internship Job Openings 🚨 I have handpicked 20 roles for you - don’t miss out on these amazing opportunities. 🎓 Degree Level:…
Liked by Setu Kumar Basak
Not technical enough -- that's what they said. I remember walking away from those interview rounds feeling pretty crushed. Not technical? How…
Liked by Setu Kumar Basak
https://lnkd.in/e_pjTNx2 Hiring one Postdoctoral Research Associate for my lab. Ideal candidate should have hands on experience on dynamics…
Liked by Setu Kumar Basak