zesch/linguistic-features-in-text

Linguistic Features in Text (LiFT)

LiFT is a library for extracting linguistic features from textual data.

LiFT is currently maintained by:

First steps

See: First Steps with LiFT

Philosophy

We rely on a UIMA CAS representation model based on the DKPro Core type system and preprocessing components. This makes LiFT multilingual: it can work with any language that DKPro Core supports, although not every structure may be available for every language.

LiFT distinguishes between linguistic structures (lemmas, POS tags, syllables, spelling errors, etc.) and features that are computed from these structures. Structures are represented in the document model and can be visualized. Features are numeric values that describe properties of the document. For example, SpellingErrorRatio may have a value of 0.06, meaning that 6% of all tokens in the text contain a spelling error.
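The split between structures and features can be illustrated with a small, language-neutral sketch. It is written in plain Python and is not LiFT's API: the `Annotation` class, the `ratio_feature` helper, and the type labels "Token" and "SpellingError" are hypothetical names chosen for this example. Structures are modeled as typed character spans, and a feature is a single number derived from them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A structure: a typed span over the document text (hypothetical model)."""
    begin: int
    end: int
    type: str  # e.g. "Token" or "SpellingError" (labels assumed for this sketch)


def ratio_feature(annotations, numerator_type, denominator_type="Token"):
    """A feature: count of one structure type relative to another."""
    numerator = sum(1 for a in annotations if a.type == numerator_type)
    denominator = sum(1 for a in annotations if a.type == denominator_type)
    # Avoid division by zero for documents without any tokens.
    return numerator / denominator if denominator else 0.0


# 50 tokens, 3 of which are also marked as spelling errors.
tokens = [Annotation(i * 5, i * 5 + 4, "Token") for i in range(50)]
errors = [Annotation(t.begin, t.end, "SpellingError") for t in tokens[:3]]

spelling_error_ratio = ratio_feature(tokens + errors, "SpellingError")
print(f"SpellingErrorRatio = {spelling_error_ratio:.2f}")
```

With 3 flagged tokens out of 50, the ratio comes out at 0.06, matching the example value in the text.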

The project is under heavy development, but we are working towards a stable release.

We plan to implement the following types of structures:

  • casing
  • lemmas
  • quotations
  • POS tags
  • phrases
  • spelling errors
  • stems
  • syllables
  • tokens
  • T-units
  • voice

We also support various meta-features of linguistic complexity:

  • readability measures
  • type-token ratio (TTR)
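As a rough sketch of what such meta-features compute, the snippet below shows the type-token ratio and one well-known readability formula, Flesch Reading Ease. It is plain Python and not LiFT's API. The function names are hypothetical, lowercasing tokens before counting types is an assumption, and the source does not state which readability measures LiFT implements, so Flesch Reading Ease is used only as a familiar example.

```python
def type_token_ratio(tokens):
    """Number of distinct tokens (types) divided by the total number of tokens."""
    lowered = [t.lower() for t in tokens]  # case folding is an assumption
    return len(set(lowered)) / len(lowered) if lowered else 0.0


def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Flesch Reading Ease: higher scores indicate text that is easier to read."""
    return (206.835
            - 1.015 * (n_words / n_sentences)
            - 84.6 * (n_syllables / n_words))


sample = "the cat sat on the mat".split()
ttr = type_token_ratio(sample)  # 5 types over 6 tokens
print(f"TTR = {ttr:.3f}")

# Counts are supplied directly here; in practice they would come from
# token, sentence, and syllable structures.
score = flesch_reading_ease(n_words=100, n_sentences=5, n_syllables=150)
print(f"Flesch Reading Ease = {score:.3f}")
```

Both values depend only on counts of underlying structures (tokens, sentences, syllables), which is why they sit on top of the structure layer rather than beside it.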
