Commit 994ef36

sboesch committed
Do not kill sparkcontext: also updated README.md
1 parent cdef532 commit 994ef36

2 files changed: +60 -1 lines changed

README.md

Lines changed: 60 additions & 0 deletions
@@ -6,3 +6,63 @@ This package contains a set of distributed text modeling algorithms implemented
 - **Gibbs sampling LDA**: the implementation is adapted from Spark PRs(#1405 and #4807) and JIRA SPARK-5556 (https://github.com/witgo/spark/tree/lda_Gibbs, https://github.com/EntilZha/spark/tree/LDA-Refactor, https://github.com/witgo/zen/tree/lda_opt/ml, etc.), with several extensions (e.g., support for MLlib interface, predict and in-place state update) added
 
 - **Online HDP (hierarchical Dirichlet process)**: implemented based on the paper "Online Variational Inference for the Hierarchical Dirichlet Process" (Chong Wang, John Paisley and David M. Blei)
+
+- **Notes from Stephen Boesch December 2017**
+
+
+This repo lacked working code for the HDP, so I added an ```OnlineHDPExample``` program. In addition, the dependencies were updated to Spark 2.2, Scala 2.11 and the latest Breeze (linear algebra library).
+
+To run the example:
+
+```mvn exec:java -Dexec.mainClass="org.apache.spark.mllib.topicModeling.OnlineHDPExample" -Dexec.args="--master local --stopwordFile src/main/resources/stopwords.txt --maxDocs 100 --maxIterations 2 /git/topmetrics/data/mininews"```
+
+Note: *maven* is unable to stop the job properly, so a spurious error message is generated at the end, something like:
+
+
+```
+Results
+LDAMetrics(OnlineHDP,274,-2147.483648,2147.483647,List()),LDAMetrics(OnlineHDP,274,-2147.483648,2147.483647,List())
+17/12/21 23:52:28 INFO AbstractConnector: Stopped Spark@5b3c8e38{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
+17/12/21 23:52:28 WARN FileSystem: exception in the cleaner thread but it will continue to run
+java.lang.InterruptedException
+at java.lang.Object.wait(Native Method)
+at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
+at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
+at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:2989)
+at java.lang.Thread.run(Thread.java:748)
+[WARNING] thread Thread[org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner,5,org.apache.spark.mllib.topicModeling.OnlineHDPExample] was interrupted but is still alive after waiting at least 12891msecs
+[WARNING] thread Thread[org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner,5,org.apache.spark.mllib.topicModeling.OnlineHDPExample] will linger despite being asked to die via interruption
+[WARNING] NOTE: 1 thread(s) did not finish despite being asked to via interruption. This is not a problem with exec:java, it is a problem with the running code. Although not serious, it should be remedied.
+[WARNING] Couldn't destroy threadgroup org.codehaus.mojo.exec.ExecJavaMojo$IsolatedThreadGroup[name=org.apache.spark.mllib.topicModeling.OnlineHDPExample,maxpri=10]
+java.lang.IllegalThreadStateException
+at java.lang.ThreadGroup.destroy(ThreadGroup.java:778)
+at org.codehaus.mojo.exec.ExecJavaMojo.execute(ExecJavaMojo.java:321)
+at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
+at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
+at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
+at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
+at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
+at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
+at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
+at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
+at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
+at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
+at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
+at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
+at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
+at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
+at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+at java.lang.reflect.Method.invoke(Method.java:498)
+at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
+at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
+at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
+at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
+[INFO] ------------------------------------------------------------------------
+[INFO] BUILD SUCCESS
+[INFO] ------------------------------------------------------------------------
+```
+
+You can safely ignore that ```ThreadGroup.destroy``` error.
+
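
The warnings in the log come from `ExecJavaMojo` trying to interrupt the lingering `StatisticsDataReferenceCleaner` thread and then destroy its thread group. For readers who would rather not see them, a possible workaround is sketched here. It is not part of this commit, and it assumes the installed exec-maven-plugin version supports the `exec.cleanupDaemonThreads` user property; all other arguments are copied unchanged from the README command.

```shell
# Same invocation as in the README, plus one extra flag.
# Assumption: exec:java honours -Dexec.cleanupDaemonThreads=false, which asks the
# plugin to skip its daemon-thread cleanup (and the thread-group destroy) on exit.
mvn exec:java \
  -Dexec.mainClass="org.apache.spark.mllib.topicModeling.OnlineHDPExample" \
  -Dexec.cleanupDaemonThreads=false \
  -Dexec.args="--master local --stopwordFile src/main/resources/stopwords.txt --maxDocs 100 --maxIterations 2 /git/topmetrics/data/mininews"
```

If the flag behaves as assumed, the `[WARNING]` lines and the `IllegalThreadStateException` trace should no longer be printed; the job output itself is unaffected either way.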

src/main/scala/org/apache/spark/mllib/topicModeling/OnlineHDPExample.scala

Lines changed: 0 additions & 1 deletion
@@ -163,7 +163,6 @@ object OnlineHDPExample {

     println(s"Finished training ${getClass.getSimpleName}")
     println(s"Results\n${results.mkString(",")}")
-    sc.stop()
     results
   }

