python - Bigrams with NLTK: problems with script
I am trying to "calculate" bigrams in a corpus with NLTK. However, there still seem to be bugs in my script. I can't figure out what I am doing wrong, and I hope someone is able to give me at least a clue. Please keep in mind that I am new to this. Thanks!
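    import collections
    import nltk
    from nltk.collocations import BigramCollocationFinder

    tekst.collocations()  # tekst: an nltk.Text object created earlier (not shown)

    bgm = nltk.collocations.BigramAssocMeasures()
    # mijn_corpus is the corpus; from_words expects a sequence of tokens
    finder = BigramCollocationFinder.from_words(mijn_corpus)
    finder.apply_freq_filter(3)  # filter out the ones that appear only 1 or 2 times
    top10 = finder.nbest(bgm.pmi, 10)  # keep the result; nbest does not print anything

    scored_bgm = finder.score_ngrams(bgm.likelihood_ratio)

    # group the bigrams by their first word
    prefix_keys = collections.defaultdict(list)
    for key, scores in scored_bgm:  # was "scored", which is never defined
        prefix_keys[key[0]].append((key[1], scores))

    # sort each group so the strongest association comes first
    for key in prefix_keys:
        prefix_keys[key].sort(key=lambda x: -x[1])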
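The grouping and sorting step at the end of the script can be tried on its own, without NLTK. What follows is only a minimal sketch: it assumes the scored list holds ((word1, word2), score) pairs, which is the shape `score_ngrams` returns. The bigrams and scores are made up for illustration, and `group_by_first_word` is a hypothetical helper name, not part of NLTK.

```python
import collections

# Hypothetical stand-in for finder.score_ngrams(...): (bigram, score) pairs.
# These words and scores are invented for illustration only.
scored_bgm = [
    (("new", "york"), 12.5),
    (("new", "car"), 3.1),
    (("york", "city"), 9.8),
    (("new", "jersey"), 7.4),
]


def group_by_first_word(scored):
    """Map each first word to its (second word, score) pairs, best score first."""
    prefix_keys = collections.defaultdict(list)
    for (first, second), score in scored:
        prefix_keys[first].append((second, score))
    for first in prefix_keys:
        # Negating the score sorts in descending order.
        prefix_keys[first].sort(key=lambda pair: -pair[1])
    return prefix_keys


prefix_keys = group_by_first_word(scored_bgm)
print(prefix_keys["new"])
```

If this part behaves as expected on toy data, the remaining problems are more likely in how the finder is built or in a mistyped variable name than in the grouping logic.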