--Boundary-00 GG8AgQIywgLsRv
Content-Type: Multipart/Mixed;
  boundary="Boundary-00 GG8AgQIywgLsRv"
--Boundary-00 GG8AgQIywgLsRv
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Hello,
Just to share something that happened to me today: I wrote a
script that ate all the memory of my computer, and I had no idea why...
Then, after many hours, I had a flash: maybe SyncEnumerator was guilty, since
it seems to use continuations (does it?)...
Well, in my code I could replace the SyncEnumerator with an
Array#zip. The result: the calculation used almost no memory (instead of
several hundred megabytes) and finished almost instantly (as it always should
have)!
So, maybe just a warning: SyncEnumerator is not as cheap as it seems. Prefer
Array#zip; at the moment I can't think of a case where SyncEnumerator
brings anything over Array#zip, since Array#zip is so much faster...
I attached the script. It doesn't actually run to the end (it
works on data that I can't send here), but you can see the difference
between running with SyncEnumerator and with Array#zip. Amazing!
At line 12 in the script:
    # s = SyncEnumerator.new(persons, values)
    s = persons.zip(values)
you can switch the comment between the two lines to see the change.
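For illustration, a minimal sketch of what the Array#zip variant does. The
sample data below is made up (hypothetical names and values, not taken from
the attached file); it only shows how zip pairs up the two arrays:

```ruby
# Hypothetical sample data standing in for the script's persons and values.
persons = %w[p198 p199 p200]
values  = %w[M M F]

# Array#zip builds all the pairs eagerly, as a plain array of 2-element arrays.
pairs = persons.zip(values)

pairs.each do |person, value|
  puts "#{person}: #{value}"
end

# If the second array is shorter, zip pads the missing entries with nil.
short = persons.zip(%w[M])
```

The pairs come back as an ordinary array, so the `each { |person, value| ... }`
block in the script works on them unchanged.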
My conclusion for now: maybe SyncEnumerator will one day be more readable than
Array#zip, but in current implementations of Ruby it's way too slow.
emmanuel
--Boundary-00 GG8AgQIywgLsRv
Content-Type: text/plain;
  charset="us-ascii";
  name="raz.cvs"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="raz.cvs"
198 199 200 201 202 203 204 205 206 207
col1 M M M M M M M M M M
col2 25 24 25 23 25 22 20 18 25 28
col3 aaaaaaaaaaa aaaaaaa aaaaaa aašaaa šaaaaaa aaaaaaaaaaa šaaaaaa aaaaaaaaaaa a.a. aaaaaaa aaaaa
col4 b b b b b b b b.o. a b
col5 4 4 3 4 3 5 3 3 3 2
M N S N N N M M N M
aaaaa - - aaaaa aaaaša - - - - - aaaaaaa, aaaaaaa aaaaaaaaaa aaaaa -
198 199 200 201 202 203 204 205 206 207
208 209 210 211 212 213 214 215 216 217
col1 M M M M M M Ž Ž M Ž
col2 14 25 24 25 22 17 25 23 25 20
col3 aaaaaaašaaaa aaaaaaaaa aaaaaa aaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaa aaaaa aaaaaaa šaaaaaaaa šaaaaaa šaaaaaaaa
col4 b a b b a a b b b b
col5 4 4 2 4 5 3 2 5 3 3
N N S N N N N N N N
208 209 210 211 212 213 214 215 216 217
208 209 210 211 212 213 214 215 216 217
218 219 220 221 222 223 224 225 226 227
col1 izloen, opisan primer 38 letnega Ž M M Ž Ž M M M Ž
col2 21 25 25 25 19 20 20 18 24
col3 šaaaaaaaa šaaaaaa aaaaaaaa aaaaaa aaaaaaa aaaaaaaaa aaaaaa aaaaaaaa aaaaaaa šaaa aaaaaa aaaaaaa šaaa aaaaaaa aaaaaaaaaa šaaaaaaaa
col4 b b a b b b a a b
col5 4 4 4 3 4 2 3 2 5
218 219 220 221 222 223 224 225 226 227
M N S M M M M
218 219 220 221 222 223 224 225 226 227
228 229 230 231 232 233 234 235 236 237
col1 M M M M M M M M Ž M
col2 25 22 25 23 23 20 25 27 22 24
col3 aaaaaaaa aaaaaaa-aaaaaaaa aaaaaaaaaaa aaaaaaa aaaaaaaa aaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaa aaaaa šaaaaaaaa aaaaa aaaaaa
col4 b b.o. b a b b b b a b
col5 2 3 3 4 3 4 3 3 5 3
M M M N E M N N N S
228 229 230 231 232 233 234 235 236 237
228 229 230 231 232 233 234 235 236 237
238 239 240 241 242 243 244 245 246 247
col1 a Ž a a aaaaaa, aaaaaa aaaaaa aa aaaaaaa aaaaaa, aaaaaaaaaaaa aaaaa a a a a
col2 25 24 20 19 25 21 21 18
col3 žaaaaaaaa a.a. a.a. aaaaa šaaaaaa aaaaaaa šaaa aaaaa aaaaa
col4 a b b b b b a b
col5 3 do4 2 3 2 4 3 3 do 4 4
238 239 240 241 242 243 244 245 246 247
E M N E S N E M
238 239 240 241 242 243 244 245 246 247
248 249 250 251 252 253 254 255 256 257
col1 Ž M M M Ž M M M M M
col2 21 24 22 25 25 21 24 22 19 24
col3 aaaaaaaaa aaaaaa aaaa aaaaaaa šaaaaaa a.a. šaaaaaaaa šaaaaaa aaaaaaaaaa šaaaaaa aaaaa aaaaaaaa aaaaaa
col4 b.o. brez poklica b a b b b.o. b a b
col5 4 2 4 3 5 4 do 5 3 4 4 2
M M M M M N S E N M
248 249 250 251 252 253 254 255 256 257
258 259 260 261 262 263 264 265 266 267
col1 M M M M M M M M M M
col2 17 21-22 23 25 23 29 28 20 26 21
col3 aaaaaaašaaaa aaaaaaa aaaaaaaaaaa aaaaaaa, aaaaaaaa aa aaaa aaaaaaaaa šaaaaaa aaaaaaaaa aaaaa aaaaaaa
col4 b b b b.o. b a b b b a
col5 1-6razred 5,7-8 pa2 3 3 3 5 3 3 3 3 2
M M N S E S N S M S
258 259 260 261 262 263 264 265 266 267
258 259 260 261 262 263 264 265 266 267
268 269 270 271 272 273 274 275 276 277
col1 M M M M M M M Ž M M
col2 22 20 21 22 22 24 21 17 25 15
col3 aaaaaaaašaa aaaaaa aaaaaaa aaaaa aa šaaa aaaaa aa šaaa aaaaa aaaaaaa aaaaaa aaaaaaaa aaaaaa aa aaaaa šaaa aaaaaa a aa aaaaa aaa
col4 b b b b b b b b b b
col5 b.o. 3 5 3 2 3 3 3 2 3
268 269 270 271 272 273 274 275 276 277
M M N N M N M N S E
268 269 270 271 272 273 274 275 276 277
278 279 280 281 282 283 284 285 286 287
col1 M M M M M M M
col2 21 22 18 26 23 21 24 23 25 24
col3 aa aaaaa aaa aa aaaaa aaa aa aaaaa aaa aa aaaaa aaa aaaaaaaa aaaaaa aaaaaaa aa aaaaa aaa aaaaaaaaaaa aaaaaa a.a. aaaa a aaaaaaaa a aaaaaaaa, aaaaaa aaaaaa
col4 b b b b b b b b b b
col5 3 5 b.o. 5 3 ali4 3 4 5 2 4
278 279 280 281 282 283 284 285 286 287
S N M E N M E N N M
278 279 280 281 282 283 284 285 286 287
288 289 290 291 292 293 294 295 296 297
col1 M M M M M M M M M M
col2 18 20 23 25 21 25 21 23 20 27
col3 aaaaa aa aaa a.a. aaaaaaaaaa aaaaaa aa aaaaa aaa aaaaaa aaaaa aaaaa a aaaaa aa aaa aaaaaa
col4 b b b b.o. a b b a b b
col5 3 3 4 b.o. 3 b.o. 3 2 4 3
288 289 290 291 292 293 294 295 296 297
M M N M E M N M N N
288 289 290 291 292 293 294 295 296 297
298 299 300 301 302 303 304 305 306 307
col1 M M M M M M M M M M
col2 25 25 18 20 21 28 22 25 22 24
col3 aaaaaaaaa aaaaaa aaaaaaa aaaaaaaaaaa aaaaa aa aaaaa aaa aaaaaa a.a. aaaa aaaaaaa a.a. aaaaaaa aaaaaa aaaaaaaaa aaaaaa
col4 b b b b b b a a b b
col5 3 2 zadosten 5 3 3 2 2 2 4
N S N N E M M E S N
298 299 300 301 302 303 304 305 306 307
308 309 310 311 312 313 314 315 316 317
col1 M M M M M M M ni ankete
col2 25 24 18 23 25 24 23 22 21
col3 aaaaaaaaaa a.a. aa aaaaa aaa aaaaa aaaaaa: aaaaaaa aaaaaaa aaaaaaaa aa aaaaa aaa aa aaa
col4 b b b b b b b b b
col5 3 2 5 3 3 4 4 5 2
N S M N N M M M E
308 309 310 311 312 313 314 315 316 317
318 319 320 321 322 323 324 325 326 327
col1 M M M M M M M M ni ankete
col2 19 21 26 20 20 23 23 24 23
col3 aaaaaaaaaaaaaaa aaaaaaa aaaaa aaaaaaaaa aa aaaaa aaa aaaaa aaaaaaa a aaaaaaa aaaaaa a.a. aaaaa
col4 b a b b b b b b b
col5 4 2 3 5 3 2 3 2 b.o.
M N N M M N E E M
318 319 320 321 322 323 324 325 326 327
328 329 330 331 332 333 334 335 336 337
col1 M M M M M M
col2 24 23 24 17 22 22 21 24 19 24
col3 aa aaaaa aaa aa aaaaa aaa a.a. aa aaaaa aaa aaaaaaaaaaaaaaa a.a. aa aaaaa aaa aaaaa-aaaaaa aaaaaa
col4 b b b.o. b.o. b b b b a b
col5 4 5 4 4 3 b.o. 4 4 3 4
N N N M N M N N N E
328 329 330 331 332 333 334 335 336 337
338 339 340 341 342 343 344 345 346 347
col1 M M M M M
col2 24 17 20 24 18 25 21 15 16 20
col3 aa aaaaa aaa aa aaaaa aaa aa aaaaa aaa aaaaaaaaaaa aa aaaaa aaa aaaaaaaa aa aaaaa aaa aaaaa aa aaa aaaaa aa aaa aaaaa aa aaa
col4 b b b b b b b b b b
col5 4 3 4 3 4 3 5 3 4 5
M N M M M E S M N M
338 339 340 341 342 343 344 345 346 347
348 349 350 351 352 353 354 355 356 357
col1 M M M M M M M M
col2 20 22 25 24 20 24 23 20 20 23
col3 aa aaaaa aaa aaaaa aa aaa aaaaaaaaa aaaaaa aaaaaaa a.a. aaaa aaaaaaaaaa aaaa aaaaaaaaaaaa a aaaaaaa aa aaaaa aaa aaaaaa aaa
col4 b b b b b b b b b a
col5 5 5 3 3 4,5 3 3 4 2 3
E N N S N N N S N E
348 349 350 351 352 353 354 355 356 357
348 349 350 351 352 353 354 355 356 357
358 359 360 361 362 363 364 365 366 367
col1 M M M ni ankete M M M M
col2 22 20 18 17 20 23 22 20 25
col3 aaaaa a.a. aa aaaaa aaa aa aaaaa aaa aaaaaaaa aaaaa aaaaaaaaa aaaaaa a.a. aaaaaaa aaaaaaa
col4 a b b b b a b b a
col5 b.o. 3 5 3 3 3 3 2 2do 3
M E N M M N M M E
358 359 360 361 362 363 364 365 366 367
368 369 370 371 372 373 374 375 376 377
col1 M M M M M M
col2 18 20 23 18 25 21 22 15 23 24
col3 aa aaaaa aaa aaaaa aa aaa aa aaaaa aaa aa aaaaa aaa aaaa.aaaaaaaaa aa aaaaa aaa aa aaaaa aaa aa aaaaa aaa aa aaaaa aaa aaaaaaaaaaa aaaaaa
col4 b b b a b b b a b a
col5 4 5 3 2 5 3 4 4 3 4
N N D M M M N E N E
368 369 370 371 372 373 374 375 376 377
378 379 380 381 382 383 384 385 386 387
col1 M M M M M M M M M
col2 25 20 20 23 23 22 25 19 25 24
col3 aaaaaaa aaaaaa aa aaaaa aaa aa aaaaa aaa aa aaaaa aaa aaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaa aaaaa aa aaa aaaaaaaaaaa aaaaa
col4 b b b b b b a b a b
col5 4 4 4 3 do 4 2 2 3 3 3 4
M M M N N S N N M M
378 379 380 381 382 383 384 385 386 387
388 389 390 391 392 393 394 395 396 397
col1 ni ankete M M M M izloen, anketiran 40 letni moki M napaka
col2 22 20 24 21 21 18 21
col3 aa aaaaa aaa aa aaaaaaa aaa aaaaaaaaaaaaa aaaaa aaaa aaa aaaaaaa aa aaaaa aaa aaaaaaa
col4 b b a b b b b
col5 4 4 5 3 5 3 3
b.o. M N M N N M
388 389 390 391 392 393 394 395 396 397
388 389 390 391 392 393 394 395 396 397
398 399 400 401 402 403 404 405 406 407
col1 a a aaaaaa a aaaaaa aaaaaa, aaaa aaaaaaaaa aaaaa aaaaaa-aaaaa aaaa aaaaaa-aaaaa aaaa aaaaaa-aaaaa aaaa a
col2 23 25 24 21
col3 aaaaaaaa aaaaaaaa aaaaaa aaaaaaa aaaaaa-aaaaa aa aaa
col4 a b b b
col5 3 3 3 5
M M N N
398 399 400 401 402 403 404 405 406 407
408 409 410 411 412 413 414 415 416 417
col1 M M M M M M M M
col2 24 25 20 22 25 23 21 21 23 25
col3 aaaaaaaaa aaaaaa aaaaaaa aaaa aaaaaaa aaaaa aa aaa-aaaaaa aaaaaaaa aaaaaa aaaaaaa aaaaaa a.a. aaaaa aa aaa-aaaaaa aaaaaaaaa aaaaaa aaaaaa
col4 b b brez poklica b b b b b b b
col5 5 3 2 5 3 4 3do4 b.o. 4 2
N M M S N E M S E S
408 409 410 411 412 413 414 415 416 417
418 419 420 421 422 423 424 425 426 427
col1 M M M M M M
col2 16 21 25 20 25 24 21 20 24 21
col3 aaaaa aa aaa-aaaaaaaaa aaaaa aa aaa aaaaa aa aaa aaaaa aa aaa aaaaaa aaaaaaa aaaaaa aaaaaaaaaa a aaaaaaaaaaa aaaaaaa aaaaaa
col4 b b b b a b b b b.o. b
col5 3 4 4 4 4 4 4 3 2 2
N M N M N N M S S N
418 419 420 421 422 423 424 425 426 427
428 429 430 431 432 433 434 435 436 437
col1 M M M izloen-ni anketnega lista M izloen, ni anketnega lista M M
col2 20 23 17 28 24 23 23 22
col3 aaaaa aa aaa aaaaaa aaaaa aaaaaaa-aaaaaaaaaaaaaaa a aaaaaaa-aaaaaaaaa aaaaa aa aaa aaaaaaaaaaa aaaaaaaaa aaaaaaa aaaaaa-aaaaaaaaa aa aaaaaaaaa aaaaaa aaaaa aaaaa-aaaaa aa aaa
col4 a b b b b a a a
col5 3 3 5 5 3 3 2 4
E N S b.o. M S E E
428 429 430 431 432 433 434 435 436 437
438 439 440 441 442 443 444 445 446 447
col1 M M M M M M
col2 24 25 26 24 22 16 25 23 24 24
col3 aaaaaaaaa aaaaaa aaaaaaaa aaaaaa aaaaaaaaaaaa aaaaaa-aaaaa aa aaa aaaaaa-aaaaa aa aaa aaaaaaaaa-aaaaa aa aaa aaaaaaaaaa,aaaaaaa aaaaaaa aaaaaaaaaa aaaaaaaaaa
col4 b b b b b b b a b a
col5 4 4 3 3 4 5 3 3 4 3
M N N N M N N N M M
438 439 440 441 442 443 444 445 446 447
438 439 440 441 442 443 444 445 446 447
448 449 450 451 452 453 454 455 456 457
col1 M M M M M M M
col2 23 23 24 25 23 22 23 24 16 24
col3 aaaaaa aaaaaa aaaa aaaaaaa aaaaaa aaaaaaaaaaaa aaaa aaaaaaa a.a. aaaaaaaa aaaaaaaaa aaaaaaa
col4 a b brez poklica b b brez poklica b.o. b iz b v a b
col5 3 4 4 b.o. 3 2 4 5 4 4
448 449 450 451 452 453 454 455 456 457
M M M N S S E N N N
448 449 450 451 452 453 454 455 456 457
458 459 460 461 462 463 464 465 466
col1 M M M M M M M M
col2 25 24 20 25 25 20 20 24 21
col3 aaaaaa aaaaaaaaa aaaa aaaaaaa aaaaaaaa aaaaaaa aaaaaaa-aaaaaaa aaaaaaaaa aaaaaaaaaa aaaaa
col4 a a a b.o. a b b a b
col5 4,5 3 3 4 do 5 3 3 4 4 3
N M M M M E M E M
458 459 460 461 462 463 464 465 466
458 459 460 461 462 463 464 465 466
--Boundary-00 GG8AgQIywgLsRv
Content-Type: text/plain;
  charset="us-ascii";
  name="parser2.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="parser2.rb"
require 'generator'

def usage(msg); puts msg; exit 1; end

# Skips ahead to the line starting with col_name, then assigns each
# tab-separated value to the matching person through the member setter.
def parse_line(persons, lines, member, col_name)
  while (line = lines.next.chomp) !~ /^#{Regexp.escape(col_name)}/
    #puts "#{line} doesn't match skipping it"
  end
  values = line.split(/\t/)
  values.shift # remove the column name
  # s = SyncEnumerator.new(persons, values)
  s = persons.zip(values)
  s.each { |person, value|
    if person.send(member)
      # not overwriting if value is not nil
      next
    end
    _value = if block_given?; yield value; else value; end
    person.send(member.to_s + '=', _value)
  }
end

class Person
  attr_accessor :person_id, :is_male, :age, :job, :state,
                :education

  def initialize(person_id)
    @person_id = person_id
  end
end

if $0 == __FILE__
  fName = ARGV.shift || usage("missing filename of statistics file")
  f = File.open(fName)
  lines = Generator.new(f)
  all_persons = []
  while lines.next?
    # skip ahead to the next line made only of tab-separated numbers
    line = lines.next.chomp while (lines.next? && line !~ /^(\t\d+)+\s*$/)
    break if line !~ /^(\t\d+)+\s*$/
    # found a series of data. first line = series
    # of person numbers
    person_ids = line.scan(/\d+/).map {|i| i.to_i}
    line = nil # clear for next iteration
    persons = person_ids.map { |p_id| Person.new(p_id) }
    # now the series of genders
    parse_line(persons, lines, :is_male, 'col1') { |gender_s| (gender_s == 'M') }
    # now the age
    parse_line(persons, lines, :age, 'col2')
    # now the job
    parse_line(persons, lines, :job, 'col3')
    # now the state
    parse_line(persons, lines, :state, 'col4')
    # now the education
    parse_line(persons, lines, :education, 'col5')
    all_persons.concat(persons)
    puts "parsed #{all_persons.size} people"
    File.open('/proc/meminfo') {|memi| puts memi.grep(/MemFree/)} if RUBY_PLATFORM =~ /linux/
  end
  f.close
  File.open('data.dta', 'w') do |file|
    Marshal.dump(all_persons, file)
  end
end
--Boundary-00 GG8AgQIywgLsRv--