Entity Resolution and Knowledge Representation


Matheus Schmitz

LinkedIn

GitHub Portfolio

1. Load Crawled Data

IMDB

TMD

2. Blocking for Efficient Candidate Evaluation

Generate Blocks
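As a rough illustration of this step, the sketch below groups records by a blocking key and only pairs up records that share a key. The key used here (first alphanumeric character of the lowercased title), the field names `id` and `name`, and the sample records are all assumptions for illustration; the actual blocking scheme used with RLTK may differ.

```python
from collections import defaultdict
from itertools import product


def block_key(record):
    """Hypothetical blocking key: first alphanumeric character of the lowercased title."""
    for ch in record["name"].strip().lower():
        if ch.isalnum():
            return ch
    return ""


def generate_blocks(records):
    """Map each blocking key to the list of record ids that share it."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec["id"])
    return blocks


def candidate_pairs(blocks_a, blocks_b):
    """Pair up ids across the two datasets, but only within matching blocks."""
    pairs = set()
    for key in blocks_a.keys() & blocks_b.keys():
        pairs.update(product(blocks_a[key], blocks_b[key]))
    return pairs


# Made-up sample records, not taken from the crawled data.
imdb = [
    {"id": "imdb-1", "name": "Alien"},
    {"id": "imdb-2", "name": "Blade Runner"},
]
tmd = [
    {"id": "tmd-1", "name": "Aliens"},
    {"id": "tmd-2", "name": "Brazil"},
    {"id": "tmd-3", "name": "Casablanca"},
]

pairs = candidate_pairs(generate_blocks(imdb), generate_blocks(tmd))
```

With these records only 2 of the 6 possible cross-dataset pairs survive blocking, which is the saving the later metrics quantify.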

Ground Truth

Create RLTK from Labeled Samples

Reduction Ratio

Pair Completeness
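The two blocking metrics can be sketched with their usual definitions: reduction ratio is one minus the fraction of all possible cross-dataset pairs that remain as candidates, and pair completeness is the fraction of ground-truth matches that survive blocking. The function names and the toy ids below are assumptions, not the notebook's own code.

```python
def reduction_ratio(n_candidates, n_left, n_right):
    """1 - (candidate pairs / all possible pairs across the two datasets)."""
    total_pairs = n_left * n_right
    if total_pairs == 0:
        raise ValueError("both datasets must be non-empty")
    return 1.0 - n_candidates / total_pairs


def pair_completeness(candidates, true_matches):
    """Fraction of ground-truth matches that are still present after blocking."""
    true_matches = set(true_matches)
    if not true_matches:
        raise ValueError("ground truth must contain at least one match")
    return len(set(candidates) & true_matches) / len(true_matches)


# Toy example: 3 x 4 = 12 possible pairs, 3 kept by blocking.
candidates = {("a1", "b1"), ("a2", "b2"), ("a2", "b3")}
truth = {("a1", "b1"), ("a3", "b4")}

rr = reduction_ratio(len(candidates), n_left=3, n_right=4)  # 1 - 3/12
pc = pair_completeness(candidates, truth)                   # 1 of 2 matches kept
```

A good blocking scheme pushes both numbers toward 1: few candidates to compare, and almost no true matches lost.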

3. Create Similarity Model & Predict Entity Matches

Sequence similarity: https://docs.python.org/2/library/difflib.html#sequencematcher-objects
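A minimal sketch of a title similarity built on `difflib.SequenceMatcher`, whose `ratio()` returns a value in [0, 1]. The normalization (lowercasing and stripping whitespace) and the function name `title_similarity` are assumptions; the actual model may combine several fields.

```python
from difflib import SequenceMatcher


def title_similarity(a, b):
    """Similarity in [0, 1] between two titles after light normalization."""
    a_norm = a.strip().lower()
    b_norm = b.strip().lower()
    # The first argument is the "junk" predicate; None disables junk filtering.
    return SequenceMatcher(None, a_norm, b_norm).ratio()


same = title_similarity("Alien", "alien ")
close = title_similarity("Alien", "Aliens")
far = title_similarity("Alien", "Brazil")
```

`ratio()` is computed as 2 * M / T, where M is the number of matched characters and T is the combined length of both strings, so near-duplicates such as "Alien" and "Aliens" score high while unrelated titles score low.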

Evaluate Predictions & Find Best Threshold
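One common way to pick a threshold is to sweep candidate values and keep the one with the highest F1 on the labeled pairs. The sketch below assumes F1 is the selection metric and uses made-up scores and labels; the notebook may optimize a different metric.

```python
def f1_at_threshold(scored, threshold):
    """F1 of the rule `score >= threshold` on (score, is_match) pairs."""
    tp = sum(1 for s, y in scored if s >= threshold and y)
    fp = sum(1 for s, y in scored if s >= threshold and not y)
    fn = sum(1 for s, y in scored if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scored, thresholds):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(thresholds, key=lambda t: f1_at_threshold(scored, t))


# Made-up (similarity score, ground-truth label) pairs.
scored = [
    (0.95, True), (0.90, True), (0.70, False),
    (0.65, True), (0.40, False), (0.20, False),
]
thresholds = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

best = best_threshold(scored, thresholds)
best_f1 = f1_at_threshold(scored, best)
```

Several thresholds can tie on F1; `max` keeps the first, so in practice it may be worth breaking ties toward higher precision or higher recall depending on the cost of a wrong merge.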

4. Define Ontology

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix my_ns: <http://inf558.org/myfakenamespace#> .


#### Movie Class ####
my_ns:Movie a schema:Class ;
    rdfs:subClassOf schema:Movie ;
    schema:name schema:Text ; # movie name
    schema:datePublished xsd:gYear ; # year of release
    schema:director schema:Person ; # movie directors
    schema:author schema:Person ; # movie writers
    schema:actor schema:Person . # movie actors

5. Join Datasets & Merge Linked Entities
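A minimal sketch of one possible merge policy, assuming each dataset is a dict from record id to a dict of fields and that the predicted matches are (IMDB id, TMD id) pairs. Here non-empty IMDB values take precedence over TMD values, and unmatched records from either side are kept as-is; both choices, the `sources` field, and the sample records are illustrative assumptions.

```python
def merge_linked(imdb, tmd, matches):
    """Merge matched records and carry over unmatched ones from both datasets."""
    merged = []
    used_imdb, used_tmd = set(), set()
    for imdb_id, tmd_id in matches:
        record = dict(tmd[tmd_id])
        # Overwrite with IMDB values, but never with an empty value.
        record.update({k: v for k, v in imdb[imdb_id].items() if v not in (None, "")})
        record["sources"] = [imdb_id, tmd_id]
        merged.append(record)
        used_imdb.add(imdb_id)
        used_tmd.add(tmd_id)
    # Records without a match are kept so that no entity is lost in the join.
    merged += [dict(rec, sources=[rid]) for rid, rec in imdb.items() if rid not in used_imdb]
    merged += [dict(rec, sources=[rid]) for rid, rec in tmd.items() if rid not in used_tmd]
    return merged


# Sample records written by hand for illustration.
imdb = {
    "imdb-1": {"name": "Alien", "year": "1979", "director": ""},
    "imdb-2": {"name": "Blade Runner", "year": "1982"},
}
tmd = {
    "tmd-1": {"name": "Alien", "year": "1979", "director": "Ridley Scott"},
}

result = merge_linked(imdb, tmd, matches=[("imdb-1", "tmd-1")])
```

Keeping the source ids on every merged record makes it possible to trace each entity in the final graph back to the crawled rows it came from.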

6. Visualize Resulting Ontology

entity_visualization.png