[OSS101] Task 6: Open source repository collaboration network and npm artifact library dependency network mapping dataset #62

@bifenglin

Objective

This dataset maps packages in the npm registry to their corresponding open source repositories. Registry metadata is often incomplete or outdated, because packages are published by individual contributors and repositories get renamed. The dataset is intended to support accurate prediction and mapping between the two networks despite these gaps.

Contents

The two networks cannot be fully mapped onto each other, but subsets of them do correspond. The mapping can be built from the repo_url field in the npm package info.
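As a minimal sketch of the repo_url-based mapping described above, the snippet below normalizes the URL spellings commonly seen in npm metadata into an "owner/repo" key and joins packages against a set of known repositories. The function names, the record keys `name` and `repo_url`, and the restriction to GitHub URLs are assumptions made for illustration; the actual dataset layout may differ.

```python
import re
from typing import Dict, Iterable, Optional, Set

# Captures "owner" and "repo" from spellings such as
# "git+https://github.com/owner/repo.git" or "git@github.com:owner/repo.git".
_GITHUB_RE = re.compile(
    r"github\.com[/:]([^/\s]+)/([^/\s#]+?)(?:\.git)?(?:[/#].*)?$",
    re.IGNORECASE,
)


def repo_key(repo_url: Optional[str]) -> Optional[str]:
    """Return a lowercase 'owner/repo' key, or None if missing or not a GitHub URL."""
    if not repo_url:
        return None
    match = _GITHUB_RE.search(repo_url.strip())
    if match is None:
        return None
    owner, repo = match.groups()
    return f"{owner}/{repo}".lower()


def map_packages_to_repos(packages: Iterable[dict], known_repos: Set[str]) -> Dict[str, str]:
    """Map package name -> 'owner/repo' for packages whose repo_url hits a known repo."""
    mapping = {}
    for pkg in packages:
        key = repo_key(pkg.get("repo_url"))  # 'repo_url' key is an assumed field name
        if key in known_repos:
            mapping[pkg["name"]] = key
    return mapping
```

Packages whose repo_url is empty, malformed, or points at a renamed repository simply drop out of the mapping, which is why only subsets of the two networks line up.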

Open Source Repository Collaboration Network:

Nodes: Represent individual developers or teams.
Edges: Represent collaborative relationships, including contributions like commits, reviews, and discussions.
Attributes: Include metrics such as number of contributions, nature of the contributions (code, documentation, etc.), and duration of collaboration.
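One possible way to derive such a network is sketched below: two developers are linked when they were both active in the same repository, and each developer keeps a per-kind contribution count. The `(developer, repo, kind)` event tuple and the "shared repositories" edge weight are assumptions chosen for illustration, not the dataset's defined schema.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_collaboration_edges(events):
    """Build weighted developer-developer edges from (developer, repo, kind) events.

    Returns (edge_weights, contributions):
      edge_weights[(a, b)] = number of repositories in which both a and b were active
      contributions[dev][kind] = number of events of that kind by dev
    """
    developers_by_repo = defaultdict(set)
    contributions = defaultdict(Counter)
    for developer, repo, kind in events:
        developers_by_repo[repo].add(developer)
        contributions[developer][kind] += 1

    edge_weights = Counter()
    for developers in developers_by_repo.values():
        # Sort so that each undirected pair has a single canonical key.
        for a, b in combinations(sorted(developers), 2):
            edge_weights[(a, b)] += 1
    return edge_weights, contributions
```

Other edge definitions (for example, linking a reviewer only to the author of the reviewed change) fit the same structure with a different grouping key.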

npm Artifact Library Dependency Network:

Nodes: Represent individual npm packages.
Edges: Represent dependency links, where one package is dependent on another.
Attributes: Include version numbers, update frequency, popularity metrics (downloads), and the package description.
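The node and edge definition above can be sketched as a pair of adjacency maps, one for "depends on" and one for the reverse "is depended on by" direction. The record shape (a dict with `name` and a `dependencies` mapping of package name to version range, as in package.json) is an assumption about how the data is loaded.

```python
def build_dependency_graph(packages):
    """Build forward and reverse adjacency maps from package records.

    packages: iterable of dicts like {"name": "a", "dependencies": {"b": "^1.0.0"}}
    Returns (depends_on, dependents), both dicts of package name -> set of names.
    """
    depends_on = {}
    dependents = {}
    for pkg in packages:
        name = pkg["name"]
        depends_on.setdefault(name, set())
        dependents.setdefault(name, set())
        for dep in (pkg.get("dependencies") or {}):
            # Register the dependency as a node even if it has no record of its own.
            depends_on.setdefault(dep, set())
            dependents.setdefault(dep, set())
            depends_on[name].add(dep)
            dependents[dep].add(name)
    return depends_on, dependents
```

Keeping the reverse map makes it cheap to ask which packages are affected when a given package changes.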

Data Collection Methods:

Data for the collaboration network is collected from the public APIs of popular source code hosting platforms such as GitHub, GitLab, and Bitbucket. You can also download the sample dataset provided by OpenDigger directly; comparing one year of behavioral data is recommended. https://github.com/X-lab2017/open-digger/blob/master/sample_data/README.md

Data for the npm artifact library dependency network is extracted from the npm registry's public API, focusing on package.json files to map dependencies. You can also crawl npmjs.com. The global npm libraries and their dependencies are provided here:

npm dependencies: npm_dependencies.zip (7.15 MB)
npm packages: npm_packages.zip (69.28 MB)
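For anyone collecting the data directly rather than using the archives, the following is a rough sketch of reading one package document from the public registry at registry.npmjs.org and reducing it to the fields needed for the dependency network. The helper names and the output keys (`repo_url`, `dependencies`) are assumptions for illustration, and only the latest tagged version is inspected.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

REGISTRY = "https://registry.npmjs.org"


def fetch_packument(name, timeout=30):
    """Download the registry document for one package (performs a network call)."""
    # Scoped names such as "@scope/pkg" need the slash percent-encoded.
    url = f"{REGISTRY}/{quote(name, safe='@')}"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_packument(doc):
    """Extract name, latest version, repository URL, and dependencies of the latest version."""
    latest = doc.get("dist-tags", {}).get("latest")
    manifest = doc.get("versions", {}).get(latest, {})
    repository = doc.get("repository") or manifest.get("repository")
    if isinstance(repository, dict):
        repo_url = repository.get("url")
    else:
        repo_url = repository  # may be a shorthand string or None
    return {
        "name": doc.get("name"),
        "version": latest,
        "repo_url": repo_url,
        "dependencies": dict(manifest.get("dependencies") or {}),
    }
```

A typical call would be `summarize_packument(fetch_packument("express"))`. When crawling many packages, add rate limiting and error handling, which this sketch omits.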

Potential Use Cases:

  • Compute metrics for the two networks: degree, clustering coefficient, average path length, diameter, centrality, density, modularity, connected components, etc.
  • Visualize the mapping between the two networks.
  • Studying the resilience of software ecosystems by examining dependency chains and their impacts on software reliability.
  • Evaluating trends in software development practices over time.
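A few of the metrics listed above (degree, density, connected components) can be computed with the standard library alone, as in the sketch below. It treats the network as an undirected simple graph given as a list of edge pairs, which is an assumption; for the remaining metrics and for large graphs, a dedicated graph library is likely the more practical choice.

```python
from collections import deque


def to_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set())
        adjacency.setdefault(v, set())
        if u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def degrees(adjacency):
    """Degree of each node: the number of distinct neighbours."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def density(adjacency):
    """Fraction of possible undirected edges that are present: 2m / (n * (n - 1))."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    m = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    return 2 * m / (n * (n - 1))


def connected_components(adjacency):
    """Return a list of node sets, one per connected component, found by BFS."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components
```

For the directed dependency network, the edges can be passed in as (package, dependency) pairs to get the undirected view, or the metrics can be adapted to in-degree and out-degree separately.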

Format

The dataset is provided in formats suitable for machine learning and network analysis, such as CSV for tabular data and JSON for structured metadata.

Output results

  • A complete dataset containing open-source repository collaboration networks and npm artifact library dependency networks.

  • Usage instructions for the dataset, detailing data items, sources, collection, and processing methods.

  • Data analysis report summarizing key findings and insights.
