GitHub to Use User Data for AI Training by Default
hubie writes:
GitHub will now use developer data to train its AI models by default:
GitHub has confirmed it will begin using developer interaction data to train its artificial intelligence models, marking a significant shift in how user data is handled across its platform.
The move, set to take effect on April 24, introduces an opt-out system, meaning most users will be automatically enrolled unless they explicitly disable the setting.
The Microsoft-owned platform said it will start collecting and using interaction data from its AI coding assistant, GitHub Copilot, to improve model performance.
The interaction data includes:
- Code snippets entered by users
- Prompts and inputs
- AI-generated outputs and edits
- Context such as file structure and repository data
- User feedback such as ratings and interactions
GitHub says this data will help build "more intelligent, context-aware" coding tools and improve accuracy across different programming languages and workflows.
[...] Users who do not want their data used for training must manually disable the setting in their account preferences.
However, enterprise-focused tiers, including Copilot Business and Enterprise, are excluded from the change, reflecting stricter data governance expectations in corporate environments.
GitHub says real-world developer interactions are essential to improving AI systems.