For artificial intelligence systems, knowledge is power. The ability of AI to exhibit intelligent behavior across different real-world domains depends fundamentally on having access to large volumes of high-quality knowledge. Self-driving cars rely on extensive knowledge about roads, signs, pedestrians, and more to navigate safely. Medical AI systems need expansive knowledge of symptoms, diseases, treatments, and patient data to advise doctors or even diagnose conditions. Even fundamental technologies like speech recognition and language translation depend on comprehensive grammar, vocabulary, and language use knowledge.
But where does all this knowledge come from? Enter knowledge bases - structured repositories of facts about the world that fuel everything from commonsense reasoning to expert systems. Constructing knowledge bases has long been a significant bottleneck in applied AI, and traditional approaches have fallen short. In their paper "Building Large Knowledge Bases by Mass Collaboration," Matthew Richardson and Pedro Domingos propose a radical new paradigm - crowdsourcing knowledge acquisition through mass collaboration.
Combining Human and Machine Intelligence
The system relies on an intimate combination of human knowledge authoring and machine learning. Humans provide simplified qualitative rules while the system estimates probabilities, resolves inconsistencies, and gauges quality. This plays to the strengths of each.
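A minimal sketch of this division of labor (the names and defaults are hypothetical, not taken from the paper): a contributor supplies only the qualitative structure of a rule, while the numeric weight belongs to the system and is learned from feedback.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """A qualitative if-then rule authored by a human contributor."""
    antecedents: frozenset  # conditions that must hold for the rule to fire
    consequent: str         # fact the rule asserts when it fires
    author: str             # contributor id, kept for credit assignment
    weight: float = 0.5     # system-estimated trust; starts uninformative

# The human supplies structure; the machine owns the numbers.
rule = Rule(frozenset({"printer_offline", "cable_loose"}),
            "check_cable", author="alice")
```

The key design point is that authors never specify probabilities directly; the weight field is filled in and revised by the learning machinery described below.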
Continuous Feedback Loops
A vital aspect of the system is continuous feedback loops between users and the knowledge base. Whenever a user submits a query, they can later report whether the answer turned out to be correct.
This real-world feedback acts as a "reality check" that constantly tunes the knowledge base to improve relevance and quality. For example, if a diagnostic rule leads to an incorrect fault prediction, this negative outcome updates the system to trust that rule less.
User feedback is also aggregated over many queries to learn the weights of different rules using machine learning techniques such as expectation maximization, allowing high-quality knowledge to be discerned automatically.
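One simple way aggregated feedback could drive weight learning; this incremental update is a hedged stand-in for the paper's EM-based scheme, and the learning rate is illustrative.

```python
def update_weight(weight, fired_correctly, lr=0.1):
    """Nudge a rule's trust toward 1 when it contributed to a
    confirmed answer, and toward 0 when it was refuted."""
    target = 1.0 if fired_correctly else 0.0
    return weight + lr * (target - weight)

w = 0.5  # uninformative prior trust
for outcome in [True, True, False, True]:  # feedback over many queries
    w = update_weight(w, outcome)
# mostly-positive feedback leaves the rule somewhat more trusted
```

Because every query outcome touches only the rules that fired, trust estimates stay cheap to maintain even as the knowledge base grows.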
By perpetually incorporating new feedback, the system can rapidly adapt its knowledge in response to actual usage. This prevents the drift into irrelevant tangents that often occurs in knowledge bases developed in isolation. The hands-on guidance of users steers the system in valuable directions.
Addressing Key Challenges
The proposed architecture tackles several challenges that could hinder the success of collaborative knowledge bases:
- Ensuring content quality is addressed through statistical machine learning on user feedback.
- Handling conflicting rules is enabled by representing knowledge as probabilistic logic.
- Keeping knowledge relevant is achieved by allowing contributors to enter practical domain-specific knowledge.
- Incentivizing participation happens through a credit assignment system that rewards helpful contributions.
- Scaling to large volumes of knowledge is accomplished by compiling only the rules relevant to each query.
By recognizing and solving these potential pitfalls, the system design provides a robust mass collaborative knowledge engineering framework.
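The conflict-handling point above can be sketched concretely: rather than rejecting a contradictory pair of rules, a probabilistic representation lets each rule vote with its learned weight. The log-linear combination below is illustrative, not the paper's exact semantics.

```python
import math

def prob_of(votes):
    """Turn weighted votes for (+) and against (-) a fact into a
    probability via a sigmoid, so contradiction degrades confidence
    instead of breaking inference outright."""
    return 1.0 / (1.0 + math.exp(-sum(votes)))

# Two contributors disagree about "printer_is_faulty";
# the supporting rule has earned more trust than the opposing one.
p = prob_of([+2.0, -0.5])
```

With hard logic, one contradictory contribution could poison every inference it touches; with soft combination, it merely lowers confidence in proportion to its weight.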
A decentralized approach to contribution allows each person to add knowledge independently without centralized control or coordination. This supports natural scalability since contributors can plug in new rules modularly.
The modularity of rule contributions also enables combining knowledge across different topics and domains. Separate fragmentary rules authored by thousands of people can be chained together to infer new knowledge spanning multiple areas of expertise.
This freeform participation style allows any willing contributor to expand the knowledge base in whatever direction they choose. By aggregating many modular contributions, the system can automatically construct rich knowledge graphs that connect concepts in ways no individual could envisage.
A vital feature of the architecture is that interactions between contributors and users of the knowledge base are many-to-many rather than one-to-one. This enables emergent knowledge that no single contributor possessed originally. For example, a user's query may leverage rules authored by multiple contributors to infer an answer that none knew alone.
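A toy sketch of that emergent chaining (the rule format and fact names are hypothetical): two contributors each author one fragment, and only their combination answers the query.

```python
def forward_chain(facts, rules):
    """Repeatedly fire rules whose antecedents all hold, adding
    consequents as new facts until nothing more can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts

rules = [
    ({"paper_jam"}, "printer_error"),   # authored by one contributor
    ({"printer_error"}, "check_tray"),  # authored by another
]
derived = forward_chain({"paper_jam"}, rules)
# "check_tray" is inferred, though neither author stated it directly
```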
Likewise, the feedback from the query's outcome propagates back to update the weights of all the rules contributing to the answer. Over many queries, each rule accumulates credit based on its involvement in successful inferences across the whole knowledge base. This indirect, distributed interaction between contributors via the evolving knowledge base allows for integrating knowledge in ways not anticipated by any individual contributor.
The many-to-many nature of these interactions facilitates the development of knowledge that is more than the sum of its parts. The system can infer new insights that bootstrap its learning in complex domains by connecting fragments of knowledge from an extensive, decentralized network of contributors.
User-Developers Drive Relevance
A key motivation strategy is allowing contributors to add knowledge that helps solve their real-world problems and interests. This aligns with the open-source principle that a user-developer perspective leads to practical utility.
For example, someone struggling to troubleshoot printer issues can contribute diagnostic rules to the knowledge base that capture their hard-won experience. When they or others later query the system about similar printer problems, these rules will prove helpful in providing solutions. This creates a self-reinforcing cycle between contribution and benefit that keeps knowledge focused on valuable domains.
Empowering contributors to scratch their itches in this manner significantly enhances the real-world relevance of the evolving knowledge base. By seeding it with knowledge geared toward specific needs, the system is guided along productive directions rather than accumulating abstract facts.
Credit Assignment Fuels Participation
To incentivize quality contributions, the system provides feedback to contributors on the utility of their knowledge. When rules successfully contribute to answering user queries, credit is propagated back to the relevant rules and their authors.
This credit assignment can be used to rank contributors and reward the most helpful ones, fulfilling people's desire for recognition. Negative credit is assigned when rules lead to incorrect inferences, creating motivation to enter only high-quality knowledge.
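A minimal sketch of how such credit might accumulate per author; the even split and unit magnitudes are assumptions for illustration, not the paper's exact scheme.

```python
from collections import defaultdict

def assign_credit(ledger, authors_involved, answer_correct):
    """Split one unit of positive or negative credit evenly among
    the authors whose rules participated in an answer."""
    share = (1.0 if answer_correct else -1.0) / len(authors_involved)
    for author in authors_involved:
        ledger[author] += share

ledger = defaultdict(float)
assign_credit(ledger, ["alice", "bob"], answer_correct=True)
assign_credit(ledger, ["alice"], answer_correct=False)
ranking = sorted(ledger, key=ledger.get, reverse=True)
# bob ends up ranked above alice
```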
By quantifying the impact of each contribution, the system offers meaningful feedback that can sustain engagement. Seeing their knowledge used successfully provides contributors satisfaction and a sense of accomplishment, inspiring further participation.
Local Compilation for Scalability
A critical technical innovation enabling scalability is that query processing compiles only a subset of relevant knowledge into a small Bayesian network tailored to that query. The network size depends on the applicable knowledge rather than the complete knowledge base size.
This localization makes inference tractable even for extensive knowledge bases. Only rules related to the particular query are activated, rather than the entire knowledge base. For example, diagnosing a printer problem may involve only a few dozen candidate causes and manifestations, not the entirety of human knowledge.
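A sketch of that localization step: chaining backward from the query goal selects the relevant rule subset, which the paper's system would then compile into a query-specific Bayesian network (omitted here; the selection logic and rule names are illustrative).

```python
def relevant_rules(query, rules):
    """Chain backward from the query goal, collecting only rules
    that can bear on it; the rest of the knowledge base is ignored."""
    needed, frontier, selected = set(), [query], []
    while frontier:
        goal = frontier.pop()
        if goal in needed:
            continue
        needed.add(goal)
        for antecedents, consequent in rules:
            if consequent == goal:
                selected.append((antecedents, consequent))
                frontier.extend(antecedents)
    return selected

rules = [
    ({"toner_low"}, "faint_print"),
    ({"faint_print"}, "replace_toner"),
    ({"fever"}, "see_doctor"),          # unrelated domain, never touched
]
subset = relevant_rules("replace_toner", rules)
# only the two printer rules are selected
```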
Intelligent pre-processing to extract relevant knowledge also mirrors how human experts focus on pertinent facts when solving problems in specialized domains. The system learns to mimic this domain-specific perspective for robust and efficient reasoning.
Synthetic and Real-World Evaluation
Experiments on synthetic knowledge bases, and on printer troubleshooting using rules contributed by real volunteers, demonstrate the advantages of the architecture.
Enabling Web-Scale AI
This pioneering paper laid the groundwork for a monumental advance in artificial intelligence - constructing the massive knowledge bases needed for versatile real-world applications by harnessing the collective intelligence of millions of people. The collaborative knowledge engineering paradigm introduced here foreshadowed the rise of crowdsourcing platforms that have made mass participation in complex projects feasible, and its participatory structure crystallized principles for effectively coordinating and combining decentralized contributions from large networks of non-experts.
Equally important, the hybrid human-machine approach provides a template for combining the strengths of both: humans handle intuitive rule authoring, while algorithms handle inference, disambiguation, and quality control. This division of labor enables symbiotic amplification of capacities. By recognizing and addressing challenges like relevance, motivation, and scalability, this work created solutions that make crowdsourced knowledge bases viable. The proposed methods for ensuring quality, consistency, and scalability continue to guide collaborative knowledge systems today.
The vision of web-scale knowledge engineering is coming to fruition through projects like Cyc, Wikidata, DBpedia, and more. However, the journey is just beginning - fully realizing the paradigm's potential could make AI more capable and widely beneficial, and the insights from this paper chart the way forward.
10 Interesting Facts in This Paper
- Combining human knowledge authoring with machine learning techniques like probabilistic inference and expectation maximization.
- Using continuous feedback loops from users querying the knowledge base to improve relevance and quality.
- Employing probabilistic logic to handle inconsistent rules from different contributors.
- Achieving scalability by compiling only relevant subsets of knowledge into Bayesian networks for each query.
- Incentivizing participation through credit assignment based on the utility of contributions.
- Assuming contributors understand their own topics well enough to author rules over shared general concepts.
- Driving practical utility by allowing user-developers to contribute knowledge for solving their problems.
- Supporting modular, decentralized contributions without centralized control.
- Facilitating emergent knowledge through many-to-many interactions between contributors and users.
- Validating the approach through experiments on synthetic data and printer troubleshooting knowledge from real volunteers.
This paper proposes an architecture for constructing large AI knowledge bases via mass collaboration over the web. The system combines decentralized contributions of logical rules from many volunteers with machine learning techniques. Continuous feedback loops ensure the evolving knowledge stays relevant to real-world needs. Key ideas include:
- Complementing human qualitative knowledge with machine probability estimation and quality learning
- Using real-world feedback loops to validate and improve the knowledge constantly
- Employing probabilistic logic to resolve conflicting rules from diverse sources
- Compiling only relevant knowledge to answer each query for scalability
- Incentivizing participation through a credit system that rewards helpful contributions
- Allowing user-developers to contribute valuable knowledge for their problems
- Facilitating emergent knowledge by combining modular rules in novel ways
- Addressing critical challenges like quality, relevance, motivation, and scalability
Experiments demonstrate the viability of aggregating knowledge from distributed non-expert contributors to produce an intelligent system greater than the sum of its parts. The proposed architecture provides a foundation for collectively engineering the massive knowledge bases needed for practical AI.