Quotations
July 2000

 


These quotations come from postings made on Usenet...

Name - Date in sci.astro.seti (s.a.s.)/alt.sci.seti (a.s.s.)/other, topic: quoted text

This month's most interesting messages:

All are from SETI@home team members at Berkeley. These are unofficial statements, but interesting nonetheless.

S@H Beta Account - 2000.07.05 in s.a.s./a.s.s., an announcement with subject 'SETI@home v3.0 Beta Testing' is made:

Hello everyone,

        Public beta testing for the SETI@home client v3.0 is beginning now. We
are accepting a limited number of applications to beta test the newest
SETI@home client.

        To sign up, please send an email to beta@setiathome.berkeley.edu with
the subject line "Beta Test". You will be contacted shortly thereafter
with instructions regarding testing. Thank you!

Eric J. Korpela - 2000.07.07 in s.a.s., thresholds for pulses and triplets:

For pulses (bp_score) 1.0: Every step of 0.04 above that means the pulse is twice as unlikely to arise from random noise. A score of 1.16 or better should only show up in one of 13 work units due to random noise. A score of two should only happen (due to random noise) in one out of 28 million work units. Of course RFI could easily cause a score of two because it's not random.

For triplets (bt_score) 7.75: This represents the minimum power that three evenly spaced spikes need to have in order to be reported. The reported value is the power of the highest of the three spikes. The probability of this happening in a given time chunk is a bit more difficult to express because it depends upon the length of the array. This threshold represents a bit less than one triplet per work unit.
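For the pulse threshold above, the doubling rule amounts to an exponential relation between bp_score and the expected false-alarm rate. A small illustrative Python sketch (not SETI@home code; the baseline of roughly 1.23 chance pulses per work unit at a score of 1.0 is inferred here from the two figures Eric quotes):

    def chance_pulses_per_wu(score, baseline=1.23):
        # each 0.04 of score above 1.0 halves the odds that random
        # noise produced the pulse
        return baseline * 2 ** (-(score - 1.0) / 0.04)

    print(1 / chance_pulses_per_wu(1.16))  # ~13 work units per false alarm
    print(1 / chance_pulses_per_wu(2.00))  # ~27 million work units per false alarm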

Eric J. Korpela - 2000.07.08 in a.s.s., beta test version 2.66:

[ >8 ] there appears to be a contention problem between the threads. There's a recent change that could have caused this, and we're testing some machines with that portion altered. If that fixes it, we'll have a new beta version next week. If anyone's seen this problem on anything besides Windows, let me know. The anticipated runtime difference between this and the old version should be around 25% [ >8 ] more than 2.04.

Eric J. Korpela - 2000.07.08 in a.s.s., talking about pulses and triplets:

The [ >8 ] Triplet is displayed a single time with vertical lines to mark the three peaks.

A triplet is any three evenly spaced peaks above a certain threshold. A pulse can have any number of repeated peaks (between 3 and 6826), and is displayed as the pulse profile averaged over all the peaks present in the data. The triplet finder is more sensitive to a few bright peaks. The pulse finder is more sensitive to faint peaks that have been repeated a large number of times.
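A minimal sketch of the triplet test as described, assuming a plain array of spike powers and a fixed threshold (hypothetical code, not the actual client):

    def find_triplets(power, threshold=7.75):
        # a triplet is any three evenly spaced bins, all above the threshold
        above = [i for i, p in enumerate(power) if p > threshold]
        above_set = set(above)
        triplets = []
        for a in above:
            for c in above:
                mid, rem = divmod(a + c, 2)
                if c > a and rem == 0 and mid in above_set:
                    # the reported value is the power of the highest spike
                    triplets.append((a, mid, c, max(power[a], power[mid], power[c])))
        return triplets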

Eric J. Korpela - 2000.07.11 in a.s.s., pulses in outfile.sah:

[ >8 ] the prof field only contains a value if len_prof is less than 256.

Eric J. Korpela - 2000.07.11 in a.s.s., pulses in outfile.sah and state.sah:

In outfile.sah, the value of 'power' is 1.0000 greater than the 'bp_power' value in the state.sah file. [ >8 ] The internal value of the pulse power is the peak power measured from the zero point. The reported value is the peak power measured from the mean power. Since they are both normalized to the mean power, the reported power should always be 1.0000 [ >8 ] above [ >8 ] the value in the state file.

Eric J. Korpela - 2000.07.11 in a.s.s., answering why the second half of the pulse graphic is an exact duplicate of the first half:

This is an old Jedi mind trick that astronomers have always used when representing periodic data. If you plot two full periods, it's more obvious that the data you are plotting is periodic. Of course, it's something that you need to be careful of, because all data looks periodic when you plot it twice. :-)

Eric J. Korpela - 2000.07.11 in s.a.s., best result graphics:

The client [ >8 ] should alternate between all three (best gaussian, best pulse, best triplet) at 30 seconds per.

Eric J. Korpela - 2000.07.11 in a.s.s., about the pulse graphic:

We are plotting the actual data folded on itself over the pulse period. Here's an example of some (one bit) data folded on a period of 5.
Here's the original data:
10010010101001010101000110101000010001001010101010
When you divide this data into groups of 5 and add the groups to each other you get:
43373
When displaying a pulse the client subtracts the minimum value and displays it twice, so what would be displayed is:
1004010040 or

      _         _
     | |       | |
     | |       | |
_    | |  _    | |
 |___| |_| |___| |_

Eric
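The quoted example is easy to reproduce; here is a small illustrative Python version of the same fold:

    data = "10010010101001010101000110101000010001001010101010"
    bits = [int(c) for c in data]
    period = 5

    # divide the data into groups of 5 and add the groups to each other
    folded = [sum(bits[i::period]) for i in range(period)]
    print(folded)                       # [4, 3, 3, 7, 3]

    # subtract the minimum and display the result twice
    profile = [v - min(folded) for v in folded]
    print(profile * 2)                  # [1, 0, 0, 4, 0, 1, 0, 0, 4, 0]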

Eric J. Korpela - 2000.07.11 in a.s.s., pulses in outfile.sah:

There is a 256 byte limit for the prof field. [ >8 ] It's more for the ability to display the pulse profiles on the web site than anything else. The reason we limited it was to conserve database space, and to ease programming. Informix has a type called varchar that can hold up to 256 bytes. The longest possible pulse profile is 13654 bytes. To hold something that long we'd need to use a less space efficient variable type.

Eric J. Korpela - 2000.07.14 in a.s.s., with more about the pulse graphic and giving a beta update:

The output is [ >8 ] only a single pulse waveform, (i.e. the waveform averaged over all of the pulses present). The only reason it is displayed twice is to make it look periodic. We considered displaying it as many times as it occurs in the data, but that can be as high a number as 20480. (In fact the pulses that repeat the most times are the easiest to detect). The reason that we can't really display the original data (like my string of ones and zeros in the previous post) is that in general the pulse won't be visible to the naked eye in the raw data.

About the beta clients. [ >8 ] The 2.04 Mac client was very slow for some reason. There is a bug on the Mac, though, that causes invalid pulse information to be displayed, and occasional crashes when this happens. If you see pulses displayed with scores that are zero, or slightly negative, that's the bug rearing its ugly head. We're hoping to track it down soon.

We're putting the new Windows version (2.70) into limited release tonight. The bug was that the worker thread was waiting for graphics to be drawn before continuing. That was a bad idea. I can't promise a general release tomorrow, but I'm hoping that all works well tonight.

Eric J. Korpela - 2000.07.14 in a.s.s., when asked if the results from the beta testing are kept separate:

The spikes and gaussians go into the same database as normal results, because we're pretty sure that the beta does those properly. The pulses are being kept separate for now.

Right now there are no plans to [ >8 ] re-process some of the older work units that had interesting results.

Eric J. Korpela - 2000.07.14 in a.s.s., in an answer to someone trawling for the link to the download site of the beta:

The link is so secret, even I don't know it. :) I'll forward your post to Matt.

Eric

Eric J. Korpela - 2000.07.14 in a.s.s., explaining the difference between 'pulse' and 'triplet' (there's a doughnut to be won too):

The pulse finding algorithm works like this (in general, not quite in detail):
The algorithm is called a folding algorithm. Suppose we have a data stream of 66 points that looks like this:

001100010010010011010001000100001111010110000110011101010011111100
The folding algorithm looks first for things with a period of N/3 or 22 samples by adding up the points in groups of 22
0011000100100100110100
0100010000111101011000
0110011101010011111100
----------------------
0221021201221212232200 
Now we look in this folded stream for an event above a threshold. If there was a strong signal with a period of 22 samples, the peaks from those signals would line up and we would see a peak in the folded array. Now we take the folded array and fold it in half again to get a period of 11 samples:
02210212012
21212232200 
----------- 
23422444212
And we look for peaks above a threshold. And again to get an (average) period of 5.5 samples (it gets tricky with non-integer periods, and I'm really not attempting to do a good explanation of this part)
234224
444212
------
678436

And again to get a period of 2.75
 6  8  7
 4  6  3
--------
10 14 10 
Then we go back to the original data and search on a slightly smaller period, in this case 21+2/3 ~= 21.6667. We do this by shifting our end point by one sample.
Here's the original data again:
001100010010010011010001000100001111010110000110011101010011111100

Here are the samples we add together; note that the last row was shifted by one.
0011000100100100110100
0100010000111101011000
0011001110101001111110
----------------------
0122011210312202232210
Now we fold that in half to search on a period of (21+2/3)/2 or about 10.83333
01220112103
12202232210
-----------
13422344313

And so on. And so on.
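A minimal Python sketch of the integer-period part of this folding search (the shifted-endpoint, non-integer folds are glossed over here, just as in the explanation above; illustrative code, not the client's):

    def fold_in_half(arr):
        # add the two halves element-wise; odd lengths would need the
        # fractional-period bookkeeping described above, so we truncate
        half = len(arr) // 2
        return [x + y for x, y in zip(arr[:half], arr[half:2 * half])]

    data = "001100010010010011010001000100001111010110000110011101010011111100"
    bits = [int(c) for c in data]

    period = len(bits) // 3             # start with a period of N/3 = 22
    folded = [sum(bits[i::period]) for i in range(period)]

    while len(folded) >= 6:
        print(folded)                   # ...test each value against a threshold...
        folded = fold_in_half(folded)   # halve the period and repeat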

For a given array of length N, we search periods of

N/(3*2^n) to N/(4*2^n) in period steps of 1/(3*2^n) with n=0 to log_2(N/3)-1
N/(4*2^n) to N/(5*2^n) in period steps of 1/(4*2^n) with n=0 to log_2(N/4)-1
N/(5*2^n) to N/(6*2^n) in period steps of 1/(5*2^n) with n=0 to log_2(N/5)-1
In principle, you could go on from there to N/(6*2^n) and onward, but you've reached a point of diminishing returns: most of the periods you would search would have already been covered. You only gain in sensitivity to pulse durations much smaller than the sampling rate, and since SETI@home is designed to be insensitive to things with large bandwidth, we probably wouldn't detect signals of that short a duration anyway.
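Those search ranges can be enumerated directly; a hedged sketch of the coverage for an array of length N:

    import math

    def searched_period_ranges(N):
        # yields (longest period, shortest period, step) for each range above
        for k in (3, 4, 5):
            for n in range(int(math.log2(N / k))):
                scale = k * 2 ** n
                yield N / scale, N / ((k + 1) * 2 ** n), 1 / scale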

The more difficult thing is to design a good threshold for these signals. I'll buy a doughnut for anyone who can give me an analytical expression for the probability that a pulse will be detected in initially exponentially distributed random data at any given initial data length and number of folds done.

Eric

Matt Lebofsky - 2000.07.17 in s.a.s., beta test version 2.70:

As stated in the letter I sent to all the beta-testers, as well as on the download page, the Windows speed problem has been found and fixed. Preliminary tests on our own home computers (and a small subset of beta testers) have shown the Windows client returning to normal CPU times per work unit.

Eric J. Korpela - 2000.07.18 in a.s.s., doing the FFT routine part of seti:

The FFT routine [ >8 ] is not the same data being computed over and over again, but the data chirped (Doppler shifted at a certain rate). We're looking into ways of avoiding some of the FFTs (perhaps some way of applying the chirp to the data post transform), but we haven't yet found a way that preserves sensitivity. We're working on things like this for S@H 2 (let's keep our fingers crossed).

We do use a [ >8 ] different method [ >8 ] in certain areas, though. The threshold function of the pulse finder, for example, involves calculating the inverse of the complementary incomplete gamma function. This is an expensive calculation, so we store values indexed by the parameters the function was given. That cuts about an hour off the run time of the beta client.
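The caching trick Eric describes is plain memoization. A sketch using SciPy's inverse of the regularized upper incomplete gamma function (the exact function and its keying inside the client are assumptions here):

    from functools import lru_cache
    from scipy.special import gammainccinv

    @lru_cache(maxsize=None)
    def pulse_threshold(a, q):
        # the inverse complementary incomplete gamma function is expensive,
        # so cache results keyed by the parameters it was given
        return float(gammainccinv(a, q))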

Hiram Clawson - 2000.07.20 in a.s.s., Carolyn's mass-old-timer-delurking:

I'm probably not a fair entry in this old timer contest since I am related to the team, but I was first testing the code in November of 1998 before I was related to the team. (The code was available at that time simply by looking for it.)

Currently my user_info states:
        register_time= 2451239.00000 (Mon Mar  1 12:00:00 1999)
        nresults=10972
        total_cpu=780978452.540498
which evidently is an artificial date, no doubt due to early testing of the ports I was responsible for at that time. On the day when the project went live (17 May 99) and the first official stats began to accumulate, I was the first user to return results and was number 1 on the top 100 list for almost 30 hours. I am currently at 689th place, and 99.968%, processing anywhere from 20 to 35 WUs per day on about 20 CPUs ranging from a 486/50MHz to a Pentium III/500MHz.

--Hiram - WB6RSS
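The register_time values above and below are Julian dates (a JD ending in .0 falls at 12:00 noon); a standard conversion reproduces both quoted dates (illustrative, not project code):

    def jd_to_gregorian(jd):
        # integer Fliegel-Van Flandern style conversion, proleptic Gregorian
        jdn = int(jd + 0.5)
        a = jdn + 32044
        b = (4 * a + 3) // 146097
        c = a - 146097 * b // 4
        d = (4 * c + 3) // 1461
        e = c - 1461 * d // 4
        m = (5 * e + 2) // 153
        day = e - (153 * m + 2) // 5 + 1
        month = m + 3 - 12 * (m // 10)
        year = 100 * b + d - 4800 + m // 10
        return year, month, day

    print(jd_to_gregorian(2451239.0))   # (1999, 3, 1): Mon Mar 1 1999, noon
    print(jd_to_gregorian(0.0))         # (-4713, 11, 24): Nov 24, 4714 B.C.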

Eric J. Korpela - 2000.07.21 in a.s.s., Carolyn's mass-old-timer-delurking:

Name (and URL) Eric Korpela 
Results Received 1251
Total CPU Time 4.28 years
Average CPU Time per work unit 29 hr 57 min 36.9 sec
Last result returned: Thu Jul 20 12:01:12 2000 UTC
I honestly don't know when I registered. My user_info says 'register_time=0.0' which converts to 12 noon on Nov 24, 4714 B.C. Gregorian calendar. (In other words, we hadn't implemented that feature yet.)

Of course, it's not fair for me to compare as I got to register long before the client was released. (I was the 2nd person to register. David was first.)

Eric

Eric J. Korpela - 2000.07.21 in a.s.s., nwus and nresults in user_info.sah:

We used to count both the number of work units we had sent you and how many results we had received. Due to database constraints we have stopped counting how many we've sent you. So nwus shouldn't be changing.

Eric J. Korpela - 2000.07.21 in a.s.s., many beta testers report that some workunits take quite a bit longer to process than the rest - bimodal work unit times with the 2.70 beta:

Let me guess, angle_range is a small number? 0.1 or less?

Under the old client these would have been 'fast' work units, because gaussian finding isn't done on them. It's a different situation on the new client, and we haven't yet decided what to do about it.

Let me explain the situation. After the FFTs are done, the data is broken up into chunks the width of a beam for pulse finding. If a chunk is longer than 15 points and shorter than 40961 points, pulse finding is done on it. The problem is that the length of those chunks is inversely proportional to the slew rate. Slow slew = longer chunks.

At sidereal slew rate, there are fewer than 16 points in a chunk for long FFTs (longer than 8k, give or take). At zero slew rate, you get more than 15 points at a 64K transform. So for zero slew rate, we do a lot more pulse searches. We also do longer ones. The longest pulse array we search in a normal slew rate work unit is about 25000 points. The longest we do in a zero slew work unit is 32768 points. Longer arrays take longer to search. Both of these add up to a longer run time.

Of course, the longer you watch the same point, the easier it is to detect a pulse, so the pulse search in these zero slew work units is more than twice as sensitive as the same search in a normal work unit. So the question is: is the additional time worth it?
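A toy illustration of the gate Eric describes; the inverse slew-rate scaling is stated as a comment rather than implemented, since the client's actual formula isn't given here:

    def pulse_search_done(chunk_len):
        # pulse finding runs only on chunks longer than 15 points
        # and shorter than 40961 points
        return 15 < chunk_len < 40961

    # chunk length is inversely proportional to slew rate (slow slew = longer
    # chunks), so zero-slew work units trigger more, and longer, pulse searches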

Hiram Clawson - 2000.07.23 in s.a.s., UNIX client caching scripts:

Good Afternoon SETI Fans:

Today I updated the old scripts FetchCache and RunCache mentioned in the Berkeley FAQ at:

        http://setiathome.ssl.berkeley.edu/faq.html#q1.16
They are a bit safer and slightly more useful than before.
They also utilize a secondary utility:
        http://jday.sourceforge.net/
which I find useful for graphing the WU processing times after the run logs are collected. (With additional awk/sed/perl/etc. scripting on the run logs, these are definitely do-it-yourself utilities.)

Eric Heien - 2000.07.28 in s.a.s., widely varying times for the 2.70 beta - a (bug)fix report:

We're pretty sure this is due to the pulse finding code running an excessive amount for workunits recorded while the telescope was not moving much. The code was just fixed, and in the next beta as well as the final 3.0 release, all workunits should take roughly the same amount of time on a given computer.

Also, the bug involving the client halting about 3% into the WU and restarting has been fixed. It was the result of a very tiny rounding error in some floating point units. The next beta will probably be coming out sometime next week, barring any significant problems.

Legend: additions are shown in green; snipped text is marked [ >8 ]