This repository was archived by the owner on Feb 22, 2021. It is now read-only.

Issue creating corpus #32

@RishabGargeya

Getting this error:

[info] Assembly up to date: /home/rg203/work/scripts/wiki2vec/target/scala-2.10/wiki2vec-assembly-1.0.jar
[success] Total time: 2 s, completed Jan 5, 2017 7:29:26 AM
Creating Readable Wiki..
Exception in thread "main" java.io.IOException: Stream is not in the BZip2 format
	at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.init(BZip2CompressorInputStream.java:255)
	at org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.<init>(BZip2CompressorInputStream.java:138)
	at org.idio.wikipedia.dumps.ReadableWiki.getWikipediaStream(ReadableWiki.scala:19)
	at org.idio.wikipedia.dumps.ReadableWiki.createReadableWiki(ReadableWiki.scala:31)
	at org.idio.wikipedia.dumps.CreateReadableWiki$.main(ReadableWiki.scala:55)
	at org.idio.wikipedia.dumps.CreateReadableWiki.main(ReadableWiki.scala)
Creating Word2vec Corpus
/home/rg203/work/scripts/wiki2vec/working/spark-1.2.0-bin-hadoop2.4/bin/spark-class: line 113: [: : integer expression expected
/home/rg203/work/scripts/wiki2vec/working/spark-1.2.0-bin-hadoop2.4/bin/spark-class: line 187: /usr/lib/jvm/java-8-oracle/jre/bin/java/bin/java: Not a directory
/home/rg203/work/scripts/wiki2vec/working/spark-1.2.0-bin-hadoop2.4/bin/spark-class: line 187: exec: /usr/lib/jvm/java-8-oracle/jre/bin/java/bin/java: cannot execute: Not a directory
Joining corpus..
cat: 'part*': No such file or directory
 ^___^ corpus : /home/rg203/work/scripts/wiki2vec/spanish_output//eswiki.corpus

Any ideas? Thanks for the help!
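
One possible first check, given the "Stream is not in the BZip2 format" line in the log: a bzip2 file starts with the bytes `BZh` followed by a block-size digit from 1 to 9, so a file that fails this test is likely not a bzip2 archive at all (for example a truncated or already-decompressed dump). The sketch below is only a diagnostic aid, not part of wiki2vec; the function name `looks_like_bzip2` and the path you pass to it are hypothetical.

```python
def looks_like_bzip2(path):
    """Return True if the file at `path` begins with a bzip2 header.

    Hypothetical helper for illustration: it only inspects the first
    four bytes ('BZh' plus a level digit 1-9) and does not validate
    the rest of the archive.
    """
    with open(path, "rb") as f:
        header = f.read(4)
    return (
        len(header) == 4
        and header[:3] == b"BZh"
        and header[3:4] in b"123456789"
    )
```

Calling `looks_like_bzip2` on the dump file passed to the corpus script would show whether the input is really bzip2-compressed. Separately, the path `/usr/lib/jvm/java-8-oracle/jre/bin/java/bin/java` in the log suggests that `JAVA_HOME` may be pointing at the `java` executable rather than at the Java installation directory, though that is an inference from the log alone.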
