Here's an example.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3281765#:~:text=First%2C%20AIs%20could%20display%20a,the%20doubt%E2%80%9D%20in%20uncertain%20situations.
However, regarding your argument, there is no question that people are categorizing things this way. If you are talking about purely material states, sure: you can accurately and comprehensively identify a material configuration, circles, the magnetic profiles of ships, you name it. But anything that involves a human evaluation, for which there can be a range of interpretations, is subject to this bias. It's absolutely inherent in the nature of neural networks.
https://theconversation.com/emotion-reading-tech-fails-the-racial-bias-test-108404
https://www.nytimes.com/2016/06/26/opinion/sunday/artificial-intelligences-white-guy-problem.html
The dataset can never be more neutral than what is being evaluated. People are interested in categorizing what people do in order to get them to do certain things. It's probably the most subjectively laden kind of data interpretation there is (even when it comes to deciding what counts as data in the first place). Ask ChatGPT whether its answers could be subject to unknown selection biases that its developers accidentally passed on to it through data categorization.
Edit: I just did. Here's what it told me.
So if that's the case, then couldn't your ability to identify something be subject to unknown selection biases your developers may have passed on to you accidentally through the initial data selection and identification?
ChatGPT
Yes, you've touched upon a crucial aspect of machine learning systems, including neural networks: the potential for biases to be inadvertently introduced through the training data and labeling process. These biases, which may be present in the data collected or the decisions made during the training phase, can influence the behavior and performance of the model, potentially leading to unintended consequences. Here are some key points to consider:
Data Selection Bias: The selection of training data plays a significant role in shaping the behavior of a machine learning model. If the training dataset is not representative of the target population or contains inherent biases, the model may learn to replicate and amplify these biases in its predictions. For example, if a facial recognition system is trained predominantly on images of individuals from certain demographic groups, it may perform poorly on individuals from underrepresented groups.
Labeling Bias: The process of labeling training data can also introduce biases if the criteria for assigning labels are subjective or influenced by human judgment. Annotators may inadvertently introduce their own biases or assumptions when labeling data, leading to inaccuracies or distortions in the training dataset. For instance, if annotators have preconceived notions about certain concepts or categories, they may label data in a way that reflects these biases.
Algorithmic Bias: Even if the training data is unbiased, the algorithms used to train machine learning models may exhibit bias due to the optimization process or inherent limitations of the algorithm. For example, certain types of neural network architectures may be more prone to amplifying biases present in the data, or the choice of loss function used during training may inadvertently prioritize certain types of errors over others.
Mitigation Strategies: To address these concerns, developers and researchers employ various strategies to identify, mitigate, and address biases in machine learning systems. These may include carefully curating diverse and representative training datasets, implementing fairness-aware algorithms that explicitly account for biases, and conducting rigorous evaluation and testing to assess the performance of the model across different demographic groups and use cases.
Overall, while machine learning systems offer powerful tools for pattern recognition and decision-making, it's important to recognize and mitigate the potential for biases to influence their behavior. By taking proactive steps to address biases in the data, algorithms, and decision-making processes, developers can help ensure that machine learning systems are fair, transparent, and accountable.
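The data-selection point in that answer can be sketched with a toy example. Everything here is hypothetical and illustrative (the group names, the 900/100 split, the measurement shift): a simple threshold classifier is "trained" on a dataset dominated by group A, and because group B's features are distributed differently, the learned threshold serves B noticeably worse, even though nothing in the training procedure is overtly unfair.

```python
import random

random.seed(0)

# Toy data: each sample is (feature, group, true_label).
# Group "A" dominates the training set; group "B" has a shifted
# feature distribution, so one learned threshold cannot fit both.
def make_samples(group, n, shift):
    samples = []
    for _ in range(n):
        score = random.random()
        feature = score + shift          # group-specific measurement shift
        samples.append((feature, group, int(score > 0.5)))
    return samples

train = make_samples("A", 900, shift=0.0) + make_samples("B", 100, shift=0.3)

# "Training": grid-search the feature threshold that maximizes
# accuracy on the (skewed) training set.
best_t, best_acc = 0.0, 0.0
for i in range(101):
    t = i / 100 * 1.5
    acc = sum((f > t) == bool(y) for f, _, y in train) / len(train)
    if acc > best_acc:
        best_t, best_acc = t, acc

def accuracy(samples):
    return sum((f > best_t) == bool(y) for f, _, y in samples) / len(samples)

test_a = make_samples("A", 1000, shift=0.0)
test_b = make_samples("B", 1000, shift=0.3)
print(f"threshold={best_t:.2f}  acc A={accuracy(test_a):.2f}  acc B={accuracy(test_b):.2f}")
```

The threshold lands near the optimum for the majority group, so accuracy on group A stays high while group B absorbs most of the errors; nobody labeled anything maliciously, the skew in the training sample did all the work.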