On Wed, Feb 12, 2014 at 9:24 AM, Himel Sarkar <hsarkar.007 / gmail.com> wrote:
> I'm trying to write a script to download my edX course videos for me. This
> is what I've written but it doesn't seem to be working. can someone help me
> out please?
>
> require 'open-uri'
> require 'rubygems'
>
> File.open("video.mp4", "w+") do |file|
>   open("https://courses.edx.org/courses/HarvardX/AI12.2x/2013_SOND/courseware/fe939c73594e454da10d734884e54db2/d8c67b383a114ccfbd4e088ab77f4db0/") do |read_file|
>     file.write(read_file.read)
>   end
> end

To access that video, you first have to log in. If you check the file
you generated, you'll see it's an HTML file containing a login form,
not a video.
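
A quick way to confirm this from Ruby is to look at the first bytes of
the saved file. This is only a minimal sketch: looks_like_html? is a
hypothetical helper name (not part of any library), and it assumes the
file is called video.mp4 as in your script.

```ruby
# Hypothetical helper: guess whether a saved file is really an HTML page
# (for example a login form) instead of the binary data we expected.
def looks_like_html?(path)
  # Read only the first 512 bytes, in binary mode, so a large file stays cheap.
  head = File.open(path, "rb") { |f| f.read(512) }.to_s
  # HTML pages normally start with a doctype or an <html> tag.
  !!(head =~ /\A\s*(<!DOCTYPE\s+html|<html)/i)
end

if File.exist?("video.mp4") && looks_like_html?("video.mp4")
  puts "video.mp4 is an HTML page, probably the login form"
end
```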

For the kind of web scraping you need, I recommend the mechanize gem,
which lets you direct the navigation, fill out forms, submit them,
follow links, etc. programmatically. It works fine as long as the site
doesn't require JavaScript. I use it a lot to automate tasks on
websites. Here is an example that posts to an internal wiki we use at
work:

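        require 'mechanize'

        # URL, configuration, year, month and content are set elsewhere
        # in the script
        agent = Mechanize.new
        agent.user_agent_alias = 'Linux Mozilla'
        # get login page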
        page = agent.get(URL)

        # do login
        form = page.form("login_form")
        form.username = configuration["WikiUser"]
        form.password = configuration["WikiPassword"]
        page = agent.submit(form, form.buttons.first)

        # go to B!Wiki
        page = page.link_with(:text => "B!Wiki").click
        # submit a form to log in to the wiki
        page = page.form("userlogin").submit
        page = page.link_with(:text => "Welcome").click

        # find the page and edit it
        calendar_page = configuration["WikiPage"] % year
        page = agent.get(calendar_page)
        page = page.link_with(:href => /action=edit&section=#{month+1}/).click

        # insert the new value for the table
        form = page.form("editform")
        form.wpTextbox1 = content
        # submit
        agent.submit(form, form.buttons.first)
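
For a download behind a login, the same pattern might look roughly like
this. It is a sketch under assumptions, not something I have tested
against edX: the method name download_behind_login, the field names
"email" and "password", the choice of the first form on the page, and
the SITE_USER / SITE_PASSWORD environment variables are all
hypothetical, so check them against the real login page.

```ruby
# Sketch only: the field names and the "first form on the page" choice
# are assumptions, not taken from any real site.
def download_behind_login(agent, login_url, file_url, target, email, password)
  page = agent.get(login_url)

  # Assumption: the login form is the first form on the page.
  form = page.forms.first
  form["email"] = email        # hypothetical field name
  form["password"] = password  # hypothetical field name
  form.submit

  # The agent keeps the session cookies, so this request goes out
  # as the logged-in user. save writes the response body to disk.
  agent.get(file_url).save(target)
end

# Runs only when called as: ruby script.rb LOGIN_URL FILE_URL
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  require 'mechanize' # gem install mechanize
  agent = Mechanize.new
  agent.user_agent_alias = 'Linux Mozilla'
  download_behind_login(agent, ARGV[0], ARGV[1], "video.mp4",
                        ENV["SITE_USER"], ENV["SITE_PASSWORD"])
end
```

The agent is passed in as a parameter so that the login and download
steps can be tried with whatever agent settings the site needs.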

Hope this helps,

Jesus.