Knowledge Graph


(Kousik) #1

Hi there,

Has anyone done research on building knowledge graphs from the ground up? @jeremy could you shed some light, or point to resources or people in this field?
Thanks in advance!!


(Sam) #2

Yes, there are many resources available on this topic. Depending upon your interests, problem, goal, or idea, I would recommend different resources. Can you provide more information about your specific interest?

In its simplest definition, a knowledge graph is a representation of concepts and the relationships between them. Concepts are usually implemented as objects/classes/individuals and appear as Nodes in the graph. Relationships between the concepts are usually implemented as Object_Properties and appear as Edges in the graph.
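To make the nodes-and-edges idea concrete, here is a minimal sketch in plain Python. The names (Alice, Acme, works_for, and so on) are made up for illustration, and real knowledge graphs are normally stored with RDF/OWL tooling rather than a dictionary.

```python
from collections import defaultdict

# Each fact is a (subject, relationship, object) triple.
# All names below are hypothetical examples.
triples = [
    ("Alice", "works_for", "Acme"),
    ("Acme", "located_in", "Berlin"),
    ("Alice", "knows", "Bob"),
]


def build_graph(facts):
    """Index triples as node -> list of (relationship, neighbour) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in facts:
        graph[subject].append((relation, obj))
    return graph


graph = build_graph(triples)

# Outgoing edges of the "Alice" node.
for relation, neighbour in graph["Alice"]:
    print(f"Alice --{relation}--> {neighbour}")
```

Subjects and objects play the role of Nodes, and the relationship names play the role of Edges.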

For example, the “Semantic Web” is an attempt to apply a structure for storing and representing the knowledge that you would typically manipulate on web pages. If your interest is primarily in web-based solutions, you might find the book Programming the Semantic Web helpful.

If your interest goes deeper than the web, you might be interested in The Description Logic Handbook: Theory, Implementation, and Applications. Description Logics are the basic formalism used in many of today’s ‘Knowledge Graphs.’ You can also search for resources on “Ontology,” a term people often use synonymously with Knowledge Graph.

Computer scientists call this area of research “Knowledge Management” or “Knowledge Base.”


(Kousik) #3

Hey @Sam1,

So the problem I'm looking to solve is basically to take a set of documents/emails and build a knowledge graph from scratch -> a totally unsupervised approach to making sense of the data and having an efficient reasoning machine/search system.
Thanks a lot for the help, really appreciate it! Sorry about the late reply!!


(Sam) #4

Dear @codeck,
Thank you for the response. I gather that you mean to build a knowledge graph from “a set of documents/emails,” is that correct?

Here are a few resources you might be interested in on the topic of “Ontology Engineering” (which covers various approaches for constructing a knowledge graph and then using it in applications). For the purposes of this post, you can think of a Knowledge Graph as a specialized type of Ontology (that is, there are usually implied restrictions when using the term “knowledge graph” to make it more amenable to existing methods for deploying internet applications and automated reasoning).

Probably the closest match I can think of to your interest is confusingly called “bootstrapping.” This is an overloaded term and is not the same as bootstrap sampling in statistics. So far, the success stories in automatically building knowledge graphs from scratch (aka “bootstrapping knowledge-graph/ontology construction”) always involve a certain amount of existing structure in the underlying data source, such as you would find in the index of a textbook or a website with a menu. Wikipedia, for example, is already highly (but not entirely) curated. It has a hierarchy of categories and sub-categories.

sample of links:

Of course, if you are doing anything with Wikipedia, I would recommend using DBpedia, which is essentially a project to create a knowledge graph of the data that is on Wikipedia.

This in-person course may also be of high interest to you. This is a great team and they create software (Protégé) for the community to use to create and edit ontologies. The course has a fee, but the software does not. Here is their next offering:

Dear all,
We are excited to announce the next Protégé Short Course, to be held at Stanford University, California, between October 22–24, 2018!

The Protégé Short Course is a 3-day intensive training, which provides an in-depth introduction to ontology engineering in the Web Ontology Language (OWL). We cover best practices in ontology building and the latest Semantic Web technologies, including OWL 2, RDF and SPARQL. We also cover topics such as collaborative development, and data access and import from different data sources.

Read more about it at:
http://protege.stanford.edu/shortcourse/201810/

If you have any questions about the Protégé Short Course, please email us at: protege-shortcourse@lists.stanford.edu

Please feel free to forward this announcement to anyone who might be interested in the course. Thank you!

We look forward to seeing you at the course!

Best regards,
The Protégé Team

Hope this helps,
Sam


(Kousik) #5

Hey Sam,
Thank you so much for the amazing information that you've shared with me, will read it for sure by today and hope I can get back to you with any questions I might have!!

Thanks
Ck


(Sam) #6

Anytime! I also encourage you to take your time. It is a big topic that has been applied in many ways in different fields. The medical field is by far the most advanced in its uses.


(Sam) #7

Suggested implementation approach that might help you make incremental progress on your goal:

  1. Using a sample, extract the ‘concepts’ (akin to classes in programming) you are interested in from your set of documents/emails. Do this either manually or with automated assistance. For example, you could try Named Entity Recognition (NER), among many other techniques. Here is an interesting post on using NER

  2. Organize the ‘concepts’ (classes) into some kind of hierarchy. You may look for ‘parent-child’ relationships in your data and for a predefined hierarchy for certain classes. WordNet synonym sets may help reduce your concept space if you are working with natural language communications. This is also a good place to ‘borrow or inherit’ someone else’s knowledge graph; for example, many web applications inherit their data model from http://schema.org/

  3. Build other kinds of relationships between your ‘concepts’ (classes). Without a pre-existing relationship model, you may find it difficult to construct this in an entirely unsupervised manner. I’m suggesting you might first try to convert this part of the problem into a supervised one.

  4. Use this sample as the beginning ‘data model’ and iterate on it (repeat steps 2 and 3) using a combination of automation and manual work (according to your time and amount of data).

  5. When you stop finding new concepts and new relationships that should be added, you have a data model that covers your dataset. Now you can make a pass through the whole dataset and create ‘individuals’ (objects) of each class. If you set it up right, your individuals will have the correct relationships between each other, and thus you will have a knowledge graph for your input dataset.

  6. Use your knowledge graph in your desired task :slight_smile:
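A toy sketch of the steps above, in plain Python, may help show how the pieces fit together. Everything in it is an assumption made for illustration: the sample documents, the hand-written LEXICON (standing in for a real NER step), the HIERARCHY, and the keyword-based relationship rules. It is not a recommended extraction method, only the shape of the pipeline.

```python
import re

# Hypothetical sample documents standing in for the emails.
documents = [
    "Alice Smith emailed Bob Jones about the Berlin office.",
    "Bob Jones will visit the Berlin office in May.",
]

# Step 1 stand-in: a hand-written lexicon mapping mentions to classes.
# A real system would use NER or similar instead.
LEXICON = {
    "Alice Smith": "Person",
    "Bob Jones": "Person",
    "Berlin office": "Office",
}

# Step 2: a hand-made class hierarchy (child -> parent).
HIERARCHY = {"Person": "Agent", "Office": "Place"}

# Step 3: hand-labelled relationship cues (keyword -> relationship name).
RELATION_KEYWORDS = {"emailed": "emailed", "visit": "visits"}


def find_mentions(text):
    """Return sorted (position, name, class) for lexicon entries in text."""
    mentions = []
    for name, cls in LEXICON.items():
        for match in re.finditer(re.escape(name), text):
            mentions.append((match.start(), name, cls))
    return sorted(mentions)


def build_knowledge_graph(docs):
    """Step 5: create individuals and link them using the data model."""
    triples = set()
    for child, parent in HIERARCHY.items():
        triples.add((child, "subclass_of", parent))
    for text in docs:
        mentions = find_mentions(text)
        for _, name, cls in mentions:
            triples.add((name, "is_a", cls))
        # Look for a relationship cue between each pair of adjacent mentions.
        for (pos_a, name_a, _), (pos_b, name_b, _) in zip(mentions, mentions[1:]):
            between = text[pos_a + len(name_a):pos_b]
            for keyword, relation in RELATION_KEYWORDS.items():
                if keyword in between:
                    triples.add((name_a, relation, name_b))
    return triples


knowledge_graph = build_knowledge_graph(documents)
for triple in sorted(knowledge_graph):
    print(triple)
```

Iterating (step 4) would mean growing LEXICON, HIERARCHY, and RELATION_KEYWORDS as you review more of the sample, then rerunning the final pass over the whole dataset.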