ldt.dicts package

Subpackages

Submodules

ldt.dicts.dictionary module

Dictionary class

This module creates the base ldt dictionary class that is inherited by classes for individual resources. It stores the global variables that are either set by the user or read from the config.py file in the user’s home directory.

Basic functionality required in any subclass:

  • checking whether the queried word has an entry in the resource;
  • retrieving the list of words with the specified relation;
  • retrieving a dictionary with the specified relations as keys and lists of related words as values.
class ldt.dicts.dictionary.Dictionary(language='English', lowercasing=True)[source]

Bases: object

The base LDT dictionary class.

It stores the global variables that are used by all LDT dictionary classes. These variables can be set individually at any point. If none are provided, the default values from the ldt config file are used.

Note

Any future resources extending LDT should inherit from this class.

is_a_word(word)[source]

Stub for the compulsory method for all subclasses that determines the existence of an entry.

Parameters:word (str) – the word to be looked up
Returns:whether the target word has an entry in the resource
Return type:(bool)
language

Gets or sets the language of the current dictionary. Depending on the dictionary, this may involve additional processing.
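
The subclassing contract described above can be sketched as follows. This is only an illustration and does not import ldt: BaseDictionary is a stand-in for ldt.dicts.dictionary.Dictionary so that the snippet runs on its own, and ToyDictionary, its entries argument and the _normalize helper are hypothetical names.

```python
class BaseDictionary:
    """Stand-in for ldt.dicts.dictionary.Dictionary (assumption, not the real class)."""

    def __init__(self, language="English", lowercasing=True):
        self._language = language
        self.lowercasing = lowercasing

    @property
    def language(self):
        # A real resource might do extra processing when the language changes.
        return self._language

    @language.setter
    def language(self, value):
        self._language = value

    def is_a_word(self, word):
        # Stub: every subclass is expected to override this method.
        raise NotImplementedError("subclasses must implement is_a_word")


class ToyDictionary(BaseDictionary):
    """Hypothetical subclass backed by an in-memory collection of entries."""

    def __init__(self, entries, **kwargs):
        super().__init__(**kwargs)
        self.entries = {self._normalize(entry) for entry in entries}

    def _normalize(self, word):
        return word.lower() if self.lowercasing else word

    def is_a_word(self, word):
        return self._normalize(word) in self.entries


toy = ToyDictionary(["cat", "Dog"])
print(toy.is_a_word("DOG"), toy.is_a_word("bird"))
```

With lowercasing left at its default, lookups in this sketch ignore case; passing lowercasing=False would make them case-sensitive.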

ldt.dicts.metadictionary module

ldt.dicts.resources module

Extra resources classes

This module implements the base dictionary classes for looking up names, numbers, stopwords and any other word categories that may be useful and can be defined as a lookup in a simple resource file. Each of these dictionaries only needs to implement an is_a_word method.
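
As a rough sketch of what a lookup in a simple resource file can look like, the snippet below reads a word list into a set. The one-word-per-line file format, the load_wordlist helper and the temporary names.vocab file are all assumptions made for illustration; they are not taken from the ldt source.

```python
import os
import tempfile


def load_wordlist(path, lowercasing=True):
    """Read one word per line into a set, skipping blank lines."""
    words = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word = line.strip()
            if not word:
                continue
            words.add(word.lower() if lowercasing else word)
    return words


# Build a throwaway resource file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "names.vocab")
    with open(vocab_path, "w", encoding="utf-8") as handle:
        handle.write("Alice\nBob\n\n")
    names = load_wordlist(vocab_path)

print("alice" in names)
```

Keeping the entries in a set makes each membership test a constant-time operation, which is all an is_a_word method needs for this kind of resource.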

class ldt.dicts.resources.AssociationDictionary(language='English', lowercasing=True, path=None, resource='associations')[source]

Bases: ldt.dicts.resources.ResourceDict

A class for language-specific name resources.

class ldt.dicts.resources.FileDictionary(language='en', lowercasing=False, path='helpers/generic_files/file_extensions.vocab', resource='file')[source]

Bases: ldt.dicts.resources.ResourceDict

A class for language-specific name resources.

Determining if two words are related: a helper method for resources with lists of related words per word entry.

Note

The relations are assumed to be bidirectional. That does not really apply to associations, but in the ldt use case (evaluation of target:neighbor word pairs) it is hard to justify taking only one direction into account and, if so, which direction that should be.

Parameters:
  • word1, word2 – the words to check
  • freq (bool) – True if the entries are frequency dictionaries (currently this is only the case for corpus cooccurrences)
Returns:True if the words are found to be related, or the cooccurrence frequency in the case of a cooccurrence dictionary resource
Return type:(bool, int)

is_a_word(word)[source]

Stub for the compulsory method for all subclasses that determines the existence of an entry.

Parameters:word (str) – the word to be looked up
Returns:whether the target word has an entry in the resource
Return type:(bool)
class ldt.dicts.resources.NameDictionary(language='English', lowercasing=True, path=None, resource='names')[source]

Bases: ldt.dicts.resources.ResourceDict

A class for language-specific name resources.

Determining if two words are related: a helper method for resources with lists of related words per word entry.

Note

The relations are assumed to be bidirectional. That does not really apply to associations, but in the ldt use case (evaluation of target:neighbor word pairs) it is hard to justify taking only one direction into account and, if so, which direction that should be.

Parameters:
  • word1, word2 – the words to check
  • freq (bool) – True if the entries are frequency dictionaries (currently this is only the case for corpus cooccurrences)
Returns:True if the words are found to be related, or the cooccurrence frequency in the case of a cooccurrence dictionary resource
Return type:(bool, int)

class ldt.dicts.resources.NumberDictionary(language='English', lowercasing=True, resource='numbers')[source]

Bases: ldt.dicts.resources.ResourceDict

A class for language-specific name resources.

Determining if two words are related: a helper method for resources with lists of related words per word entry.

Note

The relations are assumed to be bidirectional. That does not really apply to associations, but in the ldt use case (evaluation of target:neighbor word pairs) it is hard to justify taking only one direction into account and, if so, which direction that should be.

Parameters:
  • word1, word2 – the words to check
  • freq (bool) – True if the entries are frequency dictionaries (currently this is only the case for corpus cooccurrences)
Returns:True if the words are found to be related, or the cooccurrence frequency in the case of a cooccurrence dictionary resource
Return type:(bool, int)

is_a_word[source]

Returns True if the word is an ordinal or cardinal numeral, or if it contains an Arabic number.

Parameters:word (str) – a potential number
Returns:True if the word is or contains a number.
Return type:(bool)
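
The check described above might be sketched as below. This is not the ldt implementation: the tiny CARDINALS and ORDINALS sets stand in for whatever language-specific numeral lists the real resource loads, and looks_like_number is a hypothetical name.

```python
import re

# Tiny illustrative numeral lists (assumption; a real resource would be larger).
CARDINALS = {"zero", "one", "two", "three", "ten", "hundred"}
ORDINALS = {"first", "second", "third", "tenth", "hundredth"}

# Matches any ASCII digit anywhere in the string.
HAS_DIGIT = re.compile(r"[0-9]")


def looks_like_number(word):
    """True if the word is a listed numeral or contains an Arabic digit."""
    lowered = word.lower()
    if lowered in CARDINALS or lowered in ORDINALS:
        return True
    return bool(HAS_DIGIT.search(word))


print(looks_like_number("Third"), looks_like_number("mp3"), looks_like_number("cat"))
```

Note that a "contains a digit" rule also accepts mixed tokens such as mp3, which matches the "is or contains a number" wording of the return description.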
class ldt.dicts.resources.ResourceDict(path=None, resource='names', language='English', lowercasing=True, corpus='Wiki201308', freq=False, wordlist=None)[source]

Bases: ldt.dicts.dictionary.Dictionary

A class for simple resources like vocabulary lists, which only require simple lookup.

Determining if two words are related – a helper method for resources with lists of related words per word entry.

Note

The relations are assumed to be bidirectional. That does not really apply to associations, but in the ldt use case (evaluation of target:neighbor word pairs) it is hard to justify taking only one direction into account and, if so, which direction that should be.

Parameters:
  • word1, word2 – the words to check
  • freq (bool) – True if the entries are frequency dictionaries (currently this is only the case for corpus cooccurrences)
Returns:True if the words are found to be related, or the cooccurrence frequency in the case of a cooccurrence dictionary resource
Return type:(bool, int)
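
The bidirectional check described in this note can be illustrated with a small sketch. The function name related_either_way, the entries argument and the shape of the sample data (lists of related words per entry, or per-entry frequency dictionaries when freq is True) are assumptions for illustration, not the ldt method itself.

```python
def related_either_way(entries, word1, word2, freq=False):
    """Check both directions: word1 -> word2 and word2 -> word1.

    With freq=False, entries maps a word to a collection of related words
    and the result is a bool. With freq=True, entries maps a word to a
    {neighbor: count} dictionary and the result is the count (0 if absent).
    """
    if freq:
        count = entries.get(word1, {}).get(word2, 0)
        if not count:
            count = entries.get(word2, {}).get(word1, 0)
        return count
    return word2 in entries.get(word1, ()) or word1 in entries.get(word2, ())


# Hypothetical sample data.
associations = {"cat": ["dog", "mouse"]}
cooccurrences = {"cat": {"dog": 7}}

print(related_either_way(associations, "dog", "cat"))
print(related_either_way(cooccurrences, "dog", "cat", freq=True))
```

Because both directions are tried, the order of the two arguments does not matter, which is the behaviour the note argues for in the target:neighbor evaluation setting.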

is_a_word[source]
path = None

the path from which the resource is loaded

class ldt.dicts.resources.WebDictionary(language='en', lowercasing=False, path='helpers/generic_files/web_domains.vocab', resource='domain')[source]

Bases: ldt.dicts.resources.ResourceDict

A class for language-specific name resources.

Determining if two words are related: a helper method for resources with lists of related words per word entry.

Note

The relations are assumed to be bidirectional. That does not really apply to associations, but in the ldt use case (evaluation of target:neighbor word pairs) it is hard to justify taking only one direction into account and, if so, which direction that should be.

Parameters:
  • word1, word2 – the words to check
  • freq (bool) – True if the entries are frequency dictionaries (currently this is only the case for corpus cooccurrences)
Returns:True if the words are found to be related, or the cooccurrence frequency in the case of a cooccurrence dictionary resource
Return type:(bool, int)

is_a_word(word)[source]

Stub for the compulsory method for all subclasses that determines the existence of an entry.

Parameters:word (str) – the word to be looked up
Returns:whether the target word has an entry in the resource
Return type:(bool)