ldt.relations package

Submodules

ldt.relations.antonymy_by_derivation module

Dictionaries sometimes lack antonymy relations for derivational pairs in which one of the words has a negative suffix or prefix (e.g. regular ~ irregular). LDT attempts to establish the relation through lists of language-specific suffixes and prefixes that negate the meaning of the stem.

It is also possible to detect pairs of words with complementary affixes, such as -ful and -less in careful : careless.

Example

>>> test_dict = ldt.relations.antonymy_by_derivation.DerivationalAntonymy(language="English")
>>> test_dict.detect_anonymy("regular", "irregular")
True
>>> test_dict.detect_anonymy("pre-war", "post-war")
True
>>> test_dict.detect_anonymy("regular", "cat")
False

class ldt.relations.antonymy_by_derivation.DerivationalAntonymy(language)

Bases: object

Dictionary of language-specific derivational patterns that could be used to detect derivational antonymy.

The main method for detecting derivational antonyms.

Parameters:
  • target, neighbor – the words to check for this relationship
Returns:

True if derivational antonymy was detected.

Return type:

(bool)
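
The affix-list idea described for this module can be illustrated with a minimal, self-contained sketch. It does not use LDT itself: the affix lists and all function names below are assumptions made up for illustration, and the actual LDT resources and matching logic may differ.

```python
# Illustrative affix lists; these are assumptions, not LDT's actual resources.
NEGATIVE_PREFIXES = ("un", "in", "im", "il", "ir", "non", "dis")
COMPLEMENTARY_PREFIXES = (("pre", "post"), ("over", "under"))
COMPLEMENTARY_SUFFIXES = (("ful", "less"),)


def is_negated_form(stem, derived):
    """True if `derived` is `stem` with one of the negative prefixes attached."""
    return bool(stem) and any(derived == prefix + stem for prefix in NEGATIVE_PREFIXES)


def have_complementary_affixes(word1, word2):
    """True if both words share a non-empty remainder and differ only in a complementary affix pair."""
    for first, second in COMPLEMENTARY_PREFIXES:
        if word1.startswith(first) and word2.startswith(second):
            rest1, rest2 = word1[len(first):], word2[len(second):]
            if rest1 and rest1 == rest2:
                return True
    for first, second in COMPLEMENTARY_SUFFIXES:
        if word1.endswith(first) and word2.endswith(second):
            rest1, rest2 = word1[:-len(first)], word2[:-len(second)]
            if rest1 and rest1 == rest2:
                return True
    return False


def looks_like_derivational_antonyms(word1, word2):
    """Check both orders, since either word may carry the negating affix."""
    for first, second in ((word1, word2), (word2, word1)):
        if is_negated_form(first, second) or have_complementary_affixes(first, second):
            return True
    return False
```

With these toy lists, the pairs regular : irregular, pre-war : post-war and careful : careless are accepted, while regular : cat is rejected. Pure string matching of this kind will also accept accidental look-alikes, which is one reason to treat it as a heuristic.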

ldt.relations.distribution module

This module provides functionality for retrieval of distributional information.

class ldt.relations.distribution.DistributionDict(language='English', corpus='Wiki201308', frequencies=True, gdeps=False, cooccurrence=False, cooccurrence_freq=False, wordlist=None)

Bases: object

The class provides a single interface for retrieving distributional information in ldt: cooccurrence in the specified corpus, cooccurrence in Google dependency ngrams, and frequency in the specified corpus.

Parameters:
  • language (str) – the language of the resource
  • corpus (str) – the corpus, for which the distributional information is to be retrieved
  • gdeps (bool) – whether to use the Google dependency resource (memory-intensive)
  • cooccurrence (bool) – whether to use cooccurrence information (memory-intensive)
  • cooccurrence_freq (bool) – if True, cooccurrence counts are returned rather than booleans (even more memory-intensive)
  • wordlist (list of str) – if a wordlist is provided, the resources with distributional data will be filtered down to the words in the wordlist, significantly decreasing the memory usage
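
The effect of the wordlist and cooccurrence_freq options can be pictured with plain dictionaries. The sketch below is only an illustration under assumed data structures (a word-to-count dict and a nested word-to-neighbor-to-count dict); the function names are hypothetical and the way LDT actually stores its resources is not shown here.

```python
def filter_frequencies(frequencies, wordlist):
    """Keep only the frequency entries for words in the wordlist."""
    keep = set(wordlist)
    return {word: count for word, count in frequencies.items() if word in keep}


def filter_cooccurrence(cooccurrence, wordlist):
    """Keep only the cooccurrence entries where both words are in the wordlist."""
    keep = set(wordlist)
    return {
        word: {other: count for other, count in neighbors.items() if other in keep}
        for word, neighbors in cooccurrence.items()
        if word in keep
    }


def cooccur(cooccurrence, word1, word2, counts=False):
    """Return a boolean by default, or the raw count when counts=True."""
    count = cooccurrence.get(word1, {}).get(word2, 0)
    return count if counts else count > 0


# Toy data, invented for this example.
toy_frequencies = {"cat": 120, "dog": 95, "zebra": 3}
toy_cooccurrence = {
    "cat": {"dog": 14, "zebra": 1},
    "dog": {"cat": 14},
    "zebra": {"cat": 1},
}

small_frequencies = filter_frequencies(toy_frequencies, ["cat", "dog"])
small_cooccurrence = filter_cooccurrence(toy_cooccurrence, ["cat", "dog"])
```

Filtering before lookup means that only the entries for the words of interest stay in memory, which is the saving the wordlist parameter describes.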
analyze(target, neighbor)

Helper method for retrieving distributional data, if the corpus was specified in the config file.

Parameters:
  • target (ldt Word object) – the target word
  • neighbor (ldt Word object) – the neighbor word
  • res (dict) – dictionary with already-discovered relations
Returns:

dictionary with already-discovered relations, updated with distributional data.

Return type:

(dict)
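
As a rough picture of what "updating the dictionary of already-discovered relations" could look like, the sketch below adds frequency fields to an existing result. It is an assumption-laden illustration: the words are plain strings rather than ldt Word objects, the function name is hypothetical, and the frequency table is a toy dict. The key names follow the sample output shown for ldt.relations.pair.

```python
def add_distributional_data(res, target, neighbor, frequencies):
    """Return a copy of `res` extended with frequency fields for both words."""
    updated = dict(res)  # copy, so the caller's dictionary is left untouched
    updated["TargetFrequency"] = frequencies.get(target, 0)
    updated["NeighborFrequency"] = frequencies.get(neighbor, 0)
    return updated


# Toy frequency table, invented for this example.
toy_frequencies = {"black": 1500, "white": 1700}
already_found = {"Antonyms": True, "SharedPOS": True}
enriched = add_distributional_data(already_found, "black", "white", toy_frequencies)
```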

cooccur_in_corpus(word1, word2)

Wrapper method for retrieving cooccurrence information.

Parameters:word1, word2 (str) – the words to look up.
Returns:True if the two words do cooccur.
Return type:(bool)
cooccur_in_gdeps(word1, word2)

Wrapper method for retrieving cooccurrence information from the Google dependency ngram resource.

Parameters:word1, word2 (str) – the words to look up.
Returns:True if the two words do cooccur in Google dependency ngrams.
Return type:(bool)
cooccurrence = None

ResourceDict – cooccurrence resource

frequencies = None

ResourceDict – frequency dictionary

frequency_in_corpus(word)

Wrapper method for retrieving word frequency.

Parameters:word (str) – the word to look up.
Returns:the frequency of the word in the corpus.
Return type:(int)
gdeps = None

ResourceDict – Google dependency resource

language = None

str – the language of the resource

ldt.relations.pair module

This module provides functionality for detecting relations in a pair of words.

Examples

>>> relation_analyzer = ldt.relations.RelationsInPair()
>>> relation_analyzer.analyze("black", "white")
{'Hyponyms': True,
 'SharedMorphForm': True,
 'SharedPOS': True,
 'Synonyms': True,
 'Antonyms': True,
 'ShortestPath': 0.058823529411764705,
 'Associations': True,
 'TargetFrequency': 491760,
 'NeighborFrequency': 509267}

class ldt.relations.pair.RelationsInPair(language='English', lowercasing=True, derivation_dict=None, normalizer=None, lex_dict=None, ontodict=None, association_dict=None)

Bases: ldt.dicts.dictionary.Dictionary

This class implements an analyzer for all possible relation types in a word pair.

Parameters:
  • language (str) – the language of the dictionaries
  • lowercasing (bool) – whether output from all resources should be lowercased
  • derivation_dict (ldt dictionary object) – see DerivationAnalyzer
  • normalizer (ldt dictionary object) – see ldt.dicts.normalize.Normalization
  • lex_dict (ldt dictionary object) – see ldt.dicts.semantics.metadictionary.MetaDictionary
  • ontodict (ldt dictionary object) – see ldt.relations.ontology_path.ontodict.OntoDict
  • association_dict (ldt dictionary object) – see ldt.dicts.resources.AssociationDictionary

Note: If no wordlist is provided, cooccurrence and Google dependency information will be disabled. The wordlist is supplied automatically during large annotation experiments on the basis of extracted target:neighbor pairs. You can also provide your own.

analyze(target, neighbor, silent=True, debugging=False)

Catch-all wrapper for _analyze() that ensures that large-scale annotation continues even if something breaks on a particular pair. The offending data will be logged in experiment metadata.
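
The catch-all pattern described here can be sketched independently of LDT. In the example below, analyze_safely, fragile_analyzer and the failed_pairs list are all hypothetical names invented for illustration; how LDT actually records failures in experiment metadata is not shown.

```python
def analyze_safely(analyze_pair, target, neighbor, failed_pairs, silent=True):
    """Run `analyze_pair` on one pair; record the failure instead of raising."""
    try:
        return analyze_pair(target, neighbor)
    except Exception as error:  # deliberately broad: one bad pair must not stop the run
        failed_pairs.append(
            {"target": target, "neighbor": neighbor, "error": repr(error)}
        )
        if not silent:
            print(f"Skipping {target}:{neighbor} ({error!r})")
        return None


def fragile_analyzer(target, neighbor):
    """Stand-in analyzer that fails on empty input."""
    if not neighbor:
        raise ValueError("empty neighbor")
    return {"SharedPOS": True}


failed = []
pairs = [("black", "white"), ("black", "")]
results = [analyze_safely(fragile_analyzer, t, n, failed) for t, n in pairs]
```

After the loop, results holds one dictionary and one None, and failed holds a single record describing the pair that broke, so a long annotation run can be audited afterwards.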

is_a_word(word)

Stub for the compulsory method for all subclasses that determines the existence of an entry.

Parameters:word (str) – the word to be looked up
Returns:whether the target word has an entry in the resource
Return type:(bool)

Helper function for identifying matches in lexicographic relations.

All relations except hyponymy and hypernymy are treated as symmetrical; hyponymy and hypernymy are identified in the target:neighbor direction.

Parameters:
  • target – the ldt word object for the target word.
  • neighbor – the ldt word object for the neighbor word.
Returns:

the lexicographic relations that hold between the two words.

Return type:

(list of str)
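
The distinction between symmetrical and directional relations can be made concrete with a small sketch. Everything here is an assumption for illustration: words are represented as plain dicts mapping relation names to sets of related words rather than ldt word objects, the relation lists are invented, and "target:neighbor direction" is read as "the neighbor is listed under that relation in the target's entry".

```python
# Illustrative relation inventories; not LDT's actual lists.
SYMMETRIC_RELATIONS = ("Synonyms", "Antonyms")
DIRECTIONAL_RELATIONS = ("Hyponyms", "Hypernyms")


def matching_relations(target, neighbor):
    """List the relations that link the two word records."""
    found = []
    for relation in SYMMETRIC_RELATIONS:
        # Symmetrical: a listing in either entry is enough.
        in_target = neighbor["word"] in target.get(relation, set())
        in_neighbor = target["word"] in neighbor.get(relation, set())
        if in_target or in_neighbor:
            found.append(relation)
    for relation in DIRECTIONAL_RELATIONS:
        # Directional: only the target's own entry is consulted.
        if neighbor["word"] in target.get(relation, set()):
            found.append(relation)
    return found


# Toy entries, invented for this example.
animal = {"word": "animal", "Hyponyms": {"cat"}}
cat = {"word": "cat", "Hypernyms": {"animal"}}
black = {"word": "black"}
white = {"word": "white", "Antonyms": {"black"}}
```

With these toy entries, animal : cat is reported as a hyponymy match and cat : animal as a hypernymy match, while black : white counts as antonymy even though only one of the two entries lists the other.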

ldt.relations.pair.get_candidate_words(dictionary)

ldt.relations.word module

This module provides an alternative, word-based interface for assembling all the information from all the ldt resources.

Example

>>> word = ldt.relations.word.Word("fishy")
>>> word.pp_info()
======MORPHOLOGICAL INFO======
POS :  ['adjective']
IsLemma :  True
Lemmas :  ['fishy']
======DERIVATIONAL INFO======
Stems :  ['fish']
Suffixes :  ['-y']
Prefixes :  []
OtherDerivation :  []
RelatedWords :  ['cold fish', 'fishline', 'fishwoman', 'fish
pondfishpond', 'fishery', 'fishmoth', 'unfishy', 'fishmonger',
'fish sauce', 'starfish', 'fish feed', 'jellyfish', 'shellfish',
'fishgig', 'fish tankfishtank', 'fish bowlfishbowl', 'surgeonfish',
'fishnetfishnet stockings', 'fishpox', 'theres plenty more fish in the
sea', 'have other fish to fry', 'fishcake', 'fish hookfishhook',
'fish slice', 'fishling', 'drink like a fish', 'tuna fish', 'fishpound',
'lumpfish', 'queer fish', 'overfish', 'like shooting fish in a barrel',
'fish and chips', 'swim like a fish', 'fish pastefishpaste', 'fishless',
'fishtail', 'fish food', 'fishable', 'fishbrain', 'fishmeal',
'goatfish', 'dragonfish', 'goldfish', 'fishing', 'fisher',
'unfishiness', 'fishly', 'fish finger', 'fish-eating grin', 'fish out',
'give a man a fish and you feed him for a day teach a man to fish and
you feed him for a lifetime', 'silverfish', 'fish', 'big fish in a small
pond', 'fish ladder', 'fishskin', 'fish out of water', 'fish tape',
'fishkill', 'fishroom', 'fishworm', 'neither fish nor fowl', 'fishful',
'fishway', 'fishy', 'sailfish', 'fishwife', 'fishlike', 'fishskin
disease', 'fisherman', 'fish supper', 'fishkind', 'bony fish']
======SEMANTIC INFO======
Synonyms :  ['fishlike', 'fishly', 'fishy', 'fishy wishy', 'funny',
'ichthyic', 'piscine', 'shady', 'suspect', 'suspicious']
======EXTRA WORD CLASSES======
ProperNouns :  False
Noise :  False
Numbers :  False
URLs :  False
Hashtags :  False
Filenames :  False
ForeignWords :  False
Misspellings :  False

class ldt.relations.word.Word(original_spelling, derivation_dict=None, normalizer=None, lex_dict=None)

Bases: object

Class that binds together all linguistic information about a word from across the ldt.dicts modules. It simply provides an alternative interface for settings where all the different types of information are queried across the vocabulary. If only a few resources are needed, it is more efficient to use the necessary dicts modules directly.

analyze
pp_info()

Pretty printing all the attributes of the word object.

Module contents