Setu Kumar Basak

Raleigh, North Carolina, United States

About

I am a Ph.D. student at North Carolina State University, where I work in the Realsearch…

Experience

  • North Carolina State University

    Raleigh, North Carolina, United States

  • -

    Durham, North Carolina, United States

  • -

    Raleigh, North Carolina, United States

  • -

    Dhaka, Bangladesh

  • -

    Gulshan, Dhaka

  • -

    Dhaka

Education

  • North Carolina State University
  • -

    Core Computer Science Courses:
    Computer Basics and Programming, Object Oriented Programming, Software Development with Java, Internet Programming, Microprocessor and Assembly Languages, Software Engineering and Information Systems, Data Structure and Algorithms, Algorithm Analysis and Design, Theory of Computation, Computer Architecture, Operating Systems, Database Systems, Compiler Design, Computer Networks, Artificial Intelligence, Computer Graphics, Fault Tolerant Computing, etc.

Licenses & Certifications

Volunteer Experience

  • Student Motivator

    National High School Programming Contest-NHSPC

    - Present 9 years 8 months

    Education

    It was an event to promote contest programming at the high school level.

Publications

  • AssetHarvester: A Static Analysis Tool for Detecting Secret-Asset Pairs in Software Artifacts

    Research track of 47th International Conference on Software Engineering (ICSE 2025)

    GitGuardian monitored secrets exposure in public GitHub repositories and reported that developers leaked over 12 million secrets (database and other credentials) in 2023, indicating a 113% surge from 2021. Despite the availability of secret detection tools, developers ignore the tools' reported warnings because of false positives (25%-99%). However, each secret protects assets of different values accessible through asset identifiers (a DNS name and a public or private IP address). The asset information for a secret can aid developers in filtering false positives and prioritizing secret removal from the source code. However, existing secret detection tools do not provide the asset information, thus presenting difficulty to developers in filtering secrets only by looking at the secret value or finding the assets manually for each reported secret. The goal of our study is to aid software practitioners in prioritizing secrets removal by providing the assets information protected by the secrets through our novel static analysis tool. We present AssetHarvester, a static analysis tool to detect secret-asset pairs in a repository. Since the location of the asset can be distant from where the secret is defined, we investigated secret-asset co-location patterns and found four patterns. To identify the secret-asset pairs of the four patterns, we utilized three approaches (pattern matching, data flow analysis, and fast-approximation heuristics). We curated a benchmark of 1,791 secret-asset pairs of four database types extracted from 188 public GitHub repositories to evaluate the performance of AssetHarvester. AssetHarvester demonstrates a precision of 97%, a recall of 90%, and an F1-score of 94% in detecting secret-asset pairs. Our findings indicate that data flow analysis employed in AssetHarvester detects secret-asset pairs with 0% false positives and aids in improving the recall of secret detection tools.

  • A Comparative Study of Software Secrets Reporting by Secret Detection Tools

    International Symposium on Empirical Software Engineering and Measurement (ESEM 2023)

    According to GitGuardian’s monitoring of public GitHub repositories, secrets sprawl continued accelerating in 2022 by 67% compared to 2021, exposing over 10 million secrets (API keys and other credentials). Though many open-source and proprietary secret detection tools are available, these tools output many false positives, making it difficult for developers to take action and teams to choose one tool out of many. To our knowledge, the secret detection tools are not yet compared and evaluated. The goal of our study is to aid developers in choosing a secret detection tool to reduce the exposure of secrets through an empirical investigation of existing secret detection tools. We present an evaluation of five open-source and four proprietary tools against a benchmark dataset. The top three tools based on precision are: GitHub Secret Scanner (75%), Gitleaks (46%), and Commercial X (25%), and based on recall are: Gitleaks (88%), SpectralOps (67%) and TruffleHog (52%). Our manual analysis of reported secrets reveals that false positives are due to employing generic regular expressions and ineffective entropy calculation. In contrast, false negatives are due to faulty regular expressions, skipping specific file types, and insufficient rulesets. We recommend developers choose tools based on secret types present in their projects to prevent missing secrets. In addition, we recommend tool vendors update detection rules periodically and correctly employ secret verification mechanisms by collaborating with API vendors to improve accuracy.

  • SecretBench: A Dataset of Software Secrets

    20th International Conference on Mining Software Repositories (MSR 2023)

    According to GitGuardian's monitoring of public GitHub repositories, the exposure of secrets (API keys and other credentials) increased two-fold in 2021 compared to 2020, totaling more than six million secrets. However, no benchmark dataset is publicly available for researchers and tool developers to evaluate secret detection tools that produce many false positive warnings. The goal of our paper is to aid researchers and tool developers in evaluating and improving secret detection tools by curating a benchmark dataset of secrets through a systematic collection of secrets from open-source repositories. We present a labeled dataset of source codes containing 97,479 secrets (of which 15,084 are true secrets) of various secret types extracted from 818 public GitHub repositories. The dataset covers 49 programming languages and 311 file types.

  • What Challenges Do Developers Face About Checked-in Secrets in Software Artifacts?

    International Conference on Software Engineering (ICSE) 2023

    Throughout 2021, GitGuardian's monitoring of public GitHub repositories revealed a two-fold increase in the number of secrets (database credentials, API keys, and other credentials) exposed compared to 2020, accumulating more than six million secrets. To our knowledge, the challenges developers face to avoid checked-in secrets are not yet characterized. The goal of our paper is to aid researchers and tool developers in understanding and prioritizing opportunities for future research and tool automation for mitigating checked-in secrets through an empirical investigation of challenges and solutions related to checked-in secrets. We extract 779 questions related to checked-in secrets on Stack Exchange and apply qualitative analysis to determine the challenges and the solutions posed by others for each of the challenges. We identify 27 challenges and 13 solutions. The four most common challenges, in ranked order, are: (i) store/version of secrets during deployment; (ii) store/version of secrets in source code; (iii) ignore/hide of secrets in source code; and (iv) sanitize VCS history. The three most common solutions, in ranked order, are: (i) move secrets out of source code/version control and use template config file; (ii) secret management in deployment; and (iii) use local environment variables. Our findings indicate that the same solution has been mentioned to mitigate multiple challenges. However, our findings also identify an increasing trend in questions lacking accepted solutions substantiating the need for future research and tool automation on managing secrets.

  • What are the Practices for Secret Management in Software Artifacts?

    IEEE Secure Development Conference (SecDev) 2022

    Throughout 2021, GitGuardian's monitoring of public GitHub repositories revealed a two-fold increase in the number of secrets (database credentials, API keys, and other credentials) exposed compared to 2020, accumulating more than six million secrets. A systematic derivation of practices for managing secrets can help practitioners in secure development. The goal of our paper is to aid practitioners in avoiding the exposure of secrets by identifying secret management practices in software artifacts through a systematic derivation of practices disseminated in Internet artifacts. We conduct a grey literature review of Internet artifacts, such as blog articles and question and answer posts. We identify 24 practices grouped in six categories comprised of developer and organizational practices. Our findings indicate that using local environment variables and external secret management services are the most recommended practices to move secrets out of source code and to securely store secrets. We also observe that using version control system scanning tools and employing short-lived secrets are the most recommended practices to avoid accidentally committing secrets and limit secret exposure, respectively.
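
The ESEM 2023 abstract above attributes part of the false positives to ineffective entropy calculation. As a rough illustration of what an entropy check on a candidate string looks like, here is a minimal Python sketch. It is not the method of any tool named above; the function names and the 3.5 threshold are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum -p * log2(p) over the observed character frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical cut-off, assumed for illustration only.
ENTROPY_THRESHOLD = 3.5


def looks_random(candidate: str, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Flag a candidate string whose per-character entropy meets the threshold."""
    return shannon_entropy(candidate) >= threshold
```

With this sketch, a low-entropy password such as "password" is not flagged, while any sufficiently varied string is, whether or not it is a secret. That is one plausible way a threshold-only check could produce both missed secrets and false alarms.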


Projects

  • Denticon

    Planet DDS is the established leader in cloud-based dental software. The company’s Denticon practice management software is a powerful, flexible tool trusted by thousands of dental professionals across the country. Built from the ground up for enterprise groups, yet intuitive enough for solo practices.

  • HouseLens

    HouseLens is the nation's leading provider of visual marketing services for real estate, with a nationwide footprint to serve everyone from individual agents to the largest listing portals. Our product mix is constantly evolving to keep our customers at the forefront of marketing technology. Offerings include professional photography, walk-through video, interactive 3D models, VR, drones, floor plans, and more.

  • NOMA

    New Opportunity Model Analysis (NOMA) is a project of Foodbuy.

    Foodbuy, LLC is the foodservice industry’s leading procurement services organization focused on lowering purchasing and product costs for both our parent company, Compass Group, as well as for our clients and members.

    Foodbuy negotiates and contracts for more than $20bn of food, beverages, and services that our clients need, utilizing more than 600 leading manufacturers and distributors across the U.S. Ultimately, sourcing is at the heart of what we do on behalf of Compass Group and some of the most recognized organizations within the restaurant, healthcare, hospitality, leisure, and entertainment industries.

Languages

  • English

    Professional working proficiency

  • Bangla

    Native or bilingual proficiency

Organizations

  • SGIPC

    Assistant Contest Manager

    SGIPC (Special Group of Interest in Programming Contest) (zip'c) is a group of programmers at KUET that mainly focuses on programming contests. Beyond programming contests, the group covers many other aspects of programming. It regularly organizes programming contests at KUET, as well as discussion sessions and workshops.
