jdbc - clojure sqlkorma library: out of memory error -


i'm doing thought straightforward task: run sql query (over 65k rows of data) using sqlkorma library (http://sqlkorma.com), , each row transforming in way, , writing csv file. don't think 65k rows large given have 8gb laptop, assumed sql result set lazily fetched , whole thing never held in memory @ same time. really surprised when ended stack trace:

exception in thread "main" java.lang.outofmemoryerror: java heap space @ clojure.lang.persistenthashmap$bitmapindexednode.assoc(persistenthashmap.java:777) @ clojure.lang.persistenthashmap.createnode(persistenthashmap.java:1101) @ clojure.lang.persistenthashmap.access$600(persistenthashmap.java:28) @ clojure.lang.persistenthashmap$bitmapindexednode.assoc(persistenthashmap.java:749) @ clojure.lang.persistenthashmap$transienthashmap.doassoc(persistenthashmap.java:269) @ clojure.lang.atransientmap.assoc(atransientmap.java:64) @ clojure.lang.persistenthashmap.create(persistenthashmap.java:56) @ clojure.lang.persistenthashmap.create(persistenthashmap.java:100) @ clojure.lang.persistentarraymap.createht(persistentarraymap.java:61) @ clojure.lang.persistentarraymap.assoc(persistentarraymap.java:201) @ clojure.lang.persistentarraymap.assoc(persistentarraymap.java:29) @ clojure.lang.rt.assoc(rt.java:702) @ clojure.core$assoc.invoke(core.clj:187) @ clojure.core$zipmap.invoke(core.clj:2715) @ clojure.java.jdbc$resultset_seq$thisfn__204.invoke(jdbc.clj:243) @ clojure.java.jdbc$resultset_seq$thisfn__204$fn__205.invoke(jdbc.clj:243) @ clojure.lang.lazyseq.sval(lazyseq.java:42) @ clojure.lang.lazyseq.seq(lazyseq.java:60) @ clojure.lang.cons.next(cons.java:39) @ clojure.lang.persistentvector.create(persistentvector.java:51) @ clojure.lang.lazilypersistentvector.create(lazilypersistentvector.java:31) @ clojure.core$vec.invoke(core.clj:354) @ korma.db$exec_sql$fn__343.invoke(db.clj:203) @ clojure.java.jdbc$with_query_results_star_.invoke(jdbc.clj:669) @ korma.db$exec_sql.invoke(db.clj:202) @ korma.db$do_query$fn__351.invoke(db.clj:225) @ clojure.java.jdbc$with_connection_star_.invoke(jdbc.clj:309) @ korma.db$do_query.invoke(db.clj:224) @ korma.core$exec.invoke(core.clj:474) @ db$query_db.invoke(db.clj:23) @ main$_main.doinvoke(main.clj:32) @ clojure.lang.restfn.applyto(restfn.java:137) 

as far can tell stack, has not made outside query code (meaning hasn't reached transformation/write csv code @ all). if matters, sql straightforward, select * my_table some_id not null , rownum < 65000 order some_id asc. oracle (to explain rownum above), don' think matters.

edit:

code sample:

(defmacro query-and-print [q] `(do (dry-run ~q) ~q)) (defn query-db []       (query-and-print          (select my_table              (where (and (not= :my_id "bad data")                         (not= :my_id nil)                         (raw (str "rownum < " rows))))             (order :my_id :asc))))  ; args contains rows 65000, , configure-app sets jdbc ; connection string, , sets var rows value (defn -main [& args]     (when (configure-app args)          (let [results (query-db)               dedup (dedup-with-merge results)]             (println "result size: " (count results))             (println "dedup size: " (count dedup))             (to-csv "target/out.csv" (transform-data dedup))))) 

clojure.java.sql creates lazy sequences:

(defn resultset-seq "creates , returns lazy sequence of maps corresponding  rows in java.sql.resultset rs. based on clojure.core/resultset-seq  respects current naming strategy. duplicate column names  made unique appending _n before applying naming strategy (where  n unique integer)." [^resultset rs] (let [rsmeta (.getmetadata rs)       idxs (range 1 (inc (.getcolumncount rsmeta)))       keys (->> idxs              (map (fn [^integer i] (.getcolumnlabel rsmeta i)))              make-cols-unique              (map (comp keyword *as-key*)))       row-values (fn [] (map (fn [^integer i] (.getobject rs i)) idxs))       rows (fn thisfn []              (when (.next rs)                (cons (zipmap keys (row-values)) (lazy-seq (thisfn)))))]   (rows))) 

korma realizes sequence dropping each row vector:

(defn- exec-sql [{:keys [results sql-str params]}] (try (case results   :results (jdbc/with-query-results rs (apply vector sql-str params)              (vec rs))   :keys (jdbc/do-prepared-return-keys sql-str params)   (jdbc/do-prepared sql-str params)) (catch exception e   (handle-exception e sql-str params)))) 

Comments

Popular posts from this blog

linux - Does gcc have any options to add version info in ELF binary file? -

android - send complex objects as post php java -

charts - What graph/dashboard product is facebook using in Dashboard: PUE & WUE -