
Thursday, January 17, 2013

USB cable and strange behavior with disk in enclosure...

I think one of the disks in a USB enclosure I had just got broken because of a faulty USB cable, or something like that. Now, I don't know how that is possible, nor what exactly happened, but I have a strong feeling that I'm right. Namely, what happened is that when I plugged the cable into the enclosure I heard strange sounds, as if the heads were trying to move but were being retracted back to the initial position; a series of clicks, about a second apart. That happened almost every time I used that cable. At first, I thought the problem was that the USB ports are USB 3.0 while the enclosure is USB 2.0 and something was wrong with the currents, or who knows what. But googling didn't turn up anything about that. Then I tried another cable and the disk worked normally. WTF?!

Well, I found out that when the power source isn't strong enough, the symptom is exactly this clicking heard from the disk. In that case you should unplug the disk as soon as possible. Also, you probably received an additional cable with the caddy that lets you solve this problem (usually by drawing power from a second USB port). What probably happened in my case is that the cable was somehow faulty and reduced the current, so the disk didn't get enough power.

Thursday, October 25, 2012

Blogger: A Nightmare...

Well, I have to say that Blogger is becoming a nightmare for me. I already wrote about the missing versioning feature, but now I have four more complaints.

First, try to open some existing draft post and press ^Z repeatedly. What will happen is that at some point your complete post will disappear!? Now, your reflex will be to close the edit tab without saving changes. But that is a big mistake! The autosave feature has already kicked in and, no matter that you said it's OK to lose the changes, the post is gone for good. In other words, the autosave feature should save an independent copy of the post, not overwrite the existing one until I click the Save button!

And while I'm at it, the second thing that annoys me is that I sometimes click on a draft just to see what I wrote in it, and the autosave feature saves the unchanged version but also updates the timestamp, so the post jumps to the top. I don't want it on top if I didn't change anything in it!

Third, if I change something and later I'm not satisfied, there is no way to revert the changes. There is no way to make snapshots of a post.

Finally, the editor itself is catastrophic! It is simple, that's true, but that is the only positive thing about it. Sometimes it will not allow you to add a space, and frequently what you see in edit mode differs from what you see in preview mode, not to mention the different font sizes! And one thing it does is especially annoying: if you click on one paragraph and make formatting changes (e.g. bold, font size) and then go to a later part of the text (e.g. two paragraphs lower), it carries that formatting over even though the text there is formatted completely differently!

Google! Can we get a better blogging tool?! Otherwise, I'll seriously start to consider switching to WordPress...

Disclaimer: I have to say that I'm using Firefox 16 with various add-ons, among others NoScript and RequestPolicy, which might influence Blogger's behavior. But I, as an ordinary user, don't have time to investigate this, and I think NoScript is an important component of my protection.

Sunday, October 7, 2012

Word as a typewriter...

So, one of the things I have to do is a risk assessment of using a certain location as a backup data center. Maybe I'll write about that in another post. What I want to write about here is the lack of Word skills in the City of Zagreb administration. Namely, one of the important documents I use in the risk assessment is the "Procjena ugroženosti stanovništva, materijalnih i kulturnih dobara i okoliša od katastrofa i velikih nesreća za područje Grada Zagreba" (the assessment of the vulnerability of the population, material and cultural goods and the environment to catastrophes and major accidents for the area of the City of Zagreb). All cities are legally obliged to adopt such assessments (honestly, I don't feel like reading the law, so it's possible that villages and who knows who else have the same obligation). With a simple Google search you'll quickly find such assessments for other cities as well. I have to say these are interesting, not to mention important, documents.

However, the reason I decided to write this post is the specific document for the City of Zagreb, which was written in Word. Going through it a bit, I realized it was written with Word used as a typewriter; in other words, someone painstakingly formatted every paragraph by hand. OK, maybe they got by with some copying and the like, but there is no trace of styles being used anywhere. Chapter numbering was done manually (and there is an error, since 1.1.2 jumps straight to 1.1.3.1!), lists were also numbered manually, and the figures are formatted catastrophically. I'm used to students using Word this way, but that professionals do the same is incomprehensible to me. All the more so because Word is designed so that, used correctly, it saves significant amounts of time in text processing.

What specifically threw me off was the need for a table of contents. Namely, I wanted to look at the table of contents to get an overview and an impression of the document as a whole. But that was simply impossible with the existing version. For that reason I decided to reformat the document a bit, which I eventually did, and at some point I will also put it on the Web so others can get it.

My message to the city administration: send your people to a Word course, because these are obviously self-taught users who probably spent a good part of their lives banging on typewriters and were then handed a computer as a typewriter replacement!

Tuesday, September 18, 2012

Fedora 17 and VMWare Workstation 9...

I just upgraded VMWare Workstation to version 9. Everything went fine except that I couldn't start virtual machines. :) Each start crashed the machine with a nasty kernel error. :) A quick googling revealed this link. Basically, you have to download the patch, unpack it, and run the installation script from within.

The only problem I had was that the script, somehow, thought that it had already been applied:
# ./patch-modules_3.5.0.sh
/usr/lib/vmware/modules/source/.patched found. You have already patched your sources. Exiting
What actually happened is that previous patches I applied left this file behind, and because of some error (I didn't investigate the details) the script got confused. So, I simply removed the offending file (/usr/lib/vmware/modules/source/.patched) and restarted the script.
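In short, the fix boils down to this (assuming the same patch script as shown above):
# rm /usr/lib/vmware/modules/source/.patched
# ./patch-modules_3.5.0.sh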

Wednesday, August 22, 2012

F17 and Python setuptools mixup...

For some time now I had been having problems with Python's setuptools package. When I downloaded some package and tried to install it with the usual:
python setup.py install
I would receive the following error:
Traceback (most recent call last):
  File "setup.py", line 1, in <module>
    from setuptools import setup, find_packages
  File "/usr/lib/python2.7/site-packages/setuptools/__init__.py", line 2, in <module>
    from setuptools.extension import Extension, Library
  File "/usr/lib/python2.7/site-packages/setuptools/extension.py", line 5, in <module>
    from setuptools.dist import _get_unpatched
  File "/usr/lib/python2.7/site-packages/setuptools/dist.py", line 6, in <module>
    from setuptools.command.install import install
  File "/usr/lib/python2.7/site-packages/setuptools/command/__init__.py", line 8, in <module>
    from setuptools.command import install_scripts
  File "/usr/lib/python2.7/site-packages/setuptools/command/install_scripts.py", line 3, in <module>
    from pkg_resources import Distribution, PathMetadata, ensure_directory
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2755, in <module>
    add_activation_listener(lambda dist: dist.activate())
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 704, in subscribe
    callback(dist)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2755, in <lambda>
    add_activation_listener(lambda dist: dist.activate())
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2258, in activate
    map(declare_namespace, self._get_metadata('namespace_packages.txt'))
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1847, in declare_namespace
    _handle_ns(packageName, path_item)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1817, in _handle_ns
    loader.load_module(packageName); module.__path__ = path
  File "/usr/lib64/python2.7/pkgutil.py", line 246, in load_module
    mod = imp.load_module(fullname, self.file, self.filename, self.etc)
  File "/usr/lib/python2.7/site-packages/chameleon/__init__.py", line 1, in <module>
    from .zpt.template import PageTemplate
  File "/usr/lib/python2.7/site-packages/chameleon/zpt/template.py", line 9, in <module>
    from ..i18n import fast_translate
    from ..i18n import fast_translate
ImportError: cannot import name fast_translate
I had already managed to solve this problem some time ago, but it reappeared and it was driving me nuts! And I had forgotten what I did! Googling didn't tell me a lot. So, I had to look into the code.

I had to search for modules on disk, use print statements to see which one is actually loaded, and rpm/yum to query packages, but to cut the story short, in the end I found out that I had some dangling files and directories, probably because I did an upgrade instead of a clean install. The following were the offending ones:
/usr/lib/python2.7/site-packages/chameleon/ast
/usr/lib/python2.7/site-packages/chameleon/core
/usr/lib/python2.7/site-packages/chameleon/genshi
/usr/lib/python2.7/site-packages/chameleon/i18n
This is easy to verify: running 'rpm -qf <dir_or_file>' shows for all of them (including the files in those directories) that they are not owned by any package. So, I simply removed them and suddenly everything worked!
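For the record, the check and the cleanup amount to something like this (the directory list is exactly the one above):
# for d in /usr/lib/python2.7/site-packages/chameleon/{ast,core,genshi,i18n}; do rpm -qf $d; done
# rm -rf /usr/lib/python2.7/site-packages/chameleon/{ast,core,genshi,i18n}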

Wednesday, August 1, 2012

Google Chrome yum repository...

Yesterday I noticed that the yum repository on Google's servers with packages for Chrome doesn't work. I see this in the yum output when I run the update process:
google-chrome                                            |  951 B     00:00 !!!
Note the three exclamation marks at the end! Also, I once received another error message, but I didn't write it down, so I don't remember exactly what it was. Anyway, manually checking the URL of the repository shows that it is not available:
# lftp http://dl.google.com/linux/chrome/rpm/stable/x86_64
cd: Access failed: 404 Not Found (/linux/chrome/rpm/stable/x86_64)
But it turns out that there is no linux directory at all:
# lftp http://dl.google.com/linux/
cd: Access failed: 404 Not Found (/linux)
I did some googling, and what puzzles me is that there is nothing about this change. I found several older posts saying the repository wasn't available, but those were over a year ago, and in the meantime the repository was restored. There are also some more recent posts, but again, those cases were solved!

I checked my update logs and found that the last update was on July 27th, so is it possible that the URL will be restored as it was in the past?
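In the meantime, a possible workaround is to simply skip that repository when updating (the repository id google-chrome is taken from the yum output above; alternatively, set enabled=0 in the corresponding file under /etc/yum.repos.d):
# yum --disablerepo=google-chrome update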

Friday, June 29, 2012

BIND and network unreachable messages...

Sometimes you'll see messages like the following ones in your log file (the messages are slightly obfuscated to protect the innocent :)):
Jun 29 14:32:11 someserver named[1459]: error (network unreachable) resolving 'www.eolprocess.com/A/IN': 2001:503:a83e::2:30#53
Jun 29 14:32:11 someserver named[1459]: error (network unreachable) resolving 'www.eolprocess.com/A/IN': 2001:503:231d::2:30#53
What these messages say is that the network containing the address 2001:503:231d::2:30 is unreachable. So, what's happening?

The problem is that all modern operating systems support IPv6 out of the box. The same goes for a growing number of software packages, BIND among them. So, the operating system configures an IPv6 address on the interface, and the application concludes that IPv6 works and starts using it. But IPv6 doesn't work outside of the local network (there is no IPv6 capable router), so IPv6 addresses, except those on the local network, are unreachable.

So, you might now ask: but everything else works, why is this case special? Well, the problem is that some DNS servers, anywhere in the hierarchy, support IPv6, but not all of them. And when our resolver gets an IPv6 address in a response, it prefers it and ignores IPv4. It obviously can not reach it, so it logs a message and then tries IPv4. Once again, note that this IPv6 address can pop up anywhere in the hierarchy; it doesn't have to be on the last DNS server. In this concrete case the name server for eolprocess.com doesn't support IPv6, but some name servers for the top level com domain do!
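By the way, it is easy to check which servers in the chain advertise IPv6 addresses; for example, the first address from the log above belongs to a.gtld-servers.net, one of the com TLD servers, which you can verify with dig:
# dig +short a.gtld-servers.net AAAA
2001:503:a83e::2:30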

To prevent those messages from appearing, add the -4 option to BIND at startup. On CentOS (Fedora/RHEL), add or modify the OPTIONS line in /etc/sysconfig/named so that it includes the -4 option, i.e.
OPTIONS="-4"
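After changing it, restart named so the option takes effect; depending on whether the system uses systemd or SysV init, that is one of:
# systemctl restart named.service
# service named restart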

Wednesday, June 6, 2012

Fedora 17, XeLaTeX and Beamer...

I decided to write this post about XeLaTeX and my attempts to turn one presentation written using the Beamer class into a PDF file. The reason is that I certainly lost a few hours trying to resolve a series of errors, and maybe someone will find this information/experience helpful.

The first problem I had after starting xelatex was the following error:
! Package pgfsys Error: Driver file ``pgfsys-xetex.def'' not found..

See the pgfsys package documentation for explanation.
Type  H   for immediate help.
 ...                                             
                                                 
l.847 ...ver file ``\pgfsysdriver'' not found.}{}}
                                                 
? X
No pages of output.
So, I tried to find out which package provides this file using yum. But, no luck there. Then I continued with google-fu. I found many links, but nothing that applied to my case, until I stumbled on this solution, which basically tells you to copy one of the existing pgfsys driver files to the missing file name, without any changes. What that post doesn't tell you explicitly is that you have to run texhash afterwards so that TeX's file database is rebuilt.
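Just to illustrate what that amounts to, the commands look roughly like this (the directory and the choice of pgfsys-dvipdfm.def as the file to copy are assumptions on my part; check the output of the find command and the linked solution for the exact names on your system):
# find /usr/share/texmf* -name 'pgfsys-*.def'
# cd /usr/share/texmf/tex/generic/pgf/systemlayer
# cp pgfsys-dvipdfm.def pgfsys-xetex.def
# texhash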

When I started xelatex again, this time it seemed as if it would work! It was processing files and steadily producing output. But no! At one point it stopped and I got the following error:
(/usr/share/texmf/tex/latex/beamer/themes/font/beamerfontthemeserif.sty)
! Font \zf@basefont="TeX Gyre Cursor" at 10.0pt not loadable: Metric (TFM) file
 or installed font not found.
\zf@fontspec ...ntname \zf@suffix " at \f@size pt
                                                  \unless \ifzf@icu \zf@set@...
l.80 ...d,NoCommon,NoContextual}]{TeX Gyre Cursor}
                                                 
?
Back to google-fu! Still, for this one it was easier to identify the problem: the font is missing (Metric (TFM) file or installed font not found). Again using yum, I found out that this font isn't packaged with Fedora 17, but it can be obtained from here. I downloaded the zip archive, unpacked it, and moved all the font files into the /usr/share/fonts/otf directory (before that I created the otf subdirectory, as it didn't exist). Afterwards, everything worked and the presentation was successfully created. Note that I initially tried to place those fonts into the /usr/share/texmf/fonts/opentype/public/tex-gyre/ directory, but that didn't work, for unknown reasons!
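In terms of commands, it boils down to roughly this (the name of the unpacked directory is a placeholder; fc-cache isn't strictly part of what I described above, but refreshing the font cache after adding fonts is a sensible extra step):
# mkdir /usr/share/fonts/otf
# cp tex-gyre-cursor/*.otf /usr/share/fonts/otf/
# fc-cache -f /usr/share/fonts/otf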

And guess what?! Now the source TeX file was successfully transformed into a PDF file.

While googling for solutions to the problems I had, I realized that the TeXLive distribution in Fedora 17 is from 2007, while the newest version is 2011, with 2012 in development. Furthermore, I found out that there is a plan for Fedora 18 to update TeXLive to at least version 2012. Since beta (or alpha) packages are readily available for F17, and I felt bold enough, I decided to give it a try. So I installed the repository per the instructions on the given page and ran 'yum update', and after a long wait I tried again to run xelatex on the Beamer presentation. This time I was completely out of luck:
kpathsea: Running mktexfmt xelatex.fmt
I can't find the format file `xelatex.fmt'!
For some reason, there was no xelatex.fmt file. As usual, the first thing was to run 'yum whatprovides' to find out which package provides that file. But none does! It seems to be a 'bug' in the packages, so I had to revert the TeXLive distribution. This was also a kind of trial and error process, but to cut the story short, what I did was: disable the TeXLive repository with the experimental packages, use rpm to force removal of the existing texlive packages, and then reinstall the old ones using yum (note that yum downgrade didn't work!). Finally, I used 'yum check/install' to fix any unresolved dependencies.

Saturday, January 21, 2012

Cisco's bug in ARP behavior - A story of a firewall configuration odyssey...

Well, a very interesting situation happened to me. I was replacing an old firewall with a new one, and after the switch I added a secondary IP address to the firewall's public interface, with the intention that all traffic coming to that secondary IP address be redirected to one internal server. All nice, except that it didn't work! I had created such configurations numerous times and it simply had to work, but it didn't! The same functionality for the primary IP address worked as expected, but for the secondary it didn't! This was driving me nuts, until I finally figured out what was happening. This is the story of how I finally resolved the problem.

In cases when something isn't working as expected, I use Wireshark, or better yet tcpdump, and start watching what's happening with packets on the interfaces. I use tcpdump as it is more ubiquitous than Wireshark, i.e. available in the default install of many operating systems, Unix-like ones I mean. Now, using this tool has one drawback, at least on Linux. The point where it captures packets is before the PREROUTING chain, and there is no way (at least I don't know how) to see packet flow between different chains. This is actually a restriction of the Linux kernel, so any other tool (Wireshark included) will behave the same.

An additional complication in this particular case was that to debug the firewall I had to run it in production. This makes things quite complicated, because when you run something in production that doesn't (fully) work as expected there will be many unhappy people, and in the end you don't have much time to experiment; you have to revert to the old firewall so that people can continue to work. This translates into a longer debugging period, as you get relatively short time windows in which you can debug. Oh yeah, and it didn't help that this was done at 8pm, outside of usual working hours, for various reasons I won't go into now.

So, using tcpdump I saw that packets for the secondary address were reaching the firewall interface and then mysteriously disappearing within the box! Naturally, based on that I concluded that something strange was happening within Linux itself.

I have to admit that this would usually be a far-fetched hypothesis, as it would mean there is a bug in a relatively simple NAT configuration, a bug which would certainly already be known. Quick googling revealed nothing at all, which further suggested the hypothesis wasn't good. But what kept me going in that direction was that I had decided to use Firewall Builder as the tool to manage the firewall and its rules. This was my first use of that tool ever (a very good tool, by the way!). The reason I selected it was that this firewall is used by a customer whom I intended to allow to change rules himself (so that I have less work to do :)). I wasn't particularly happy with the way that tool generates rules, and so I suspected that maybe it messed something up, or that I didn't define the rules the way it expects. To see if that was true, I flushed all the rules on the firewall and quickly created a test case by manually entering iptables rules. Well, it turned out that didn't work either.

The next hypothesis was that FWBuilder had somehow messed something up within the /proc filesystem. For example, it could be that I told it to be overly restrictive. But trying out different combinations and poking through /proc/sys/net didn't help either; things simply were not working and that was it!

Finally, in a moment of despair I started tcpdump again, but this time I told it to show me MAC addresses too. And then I noticed that the destination MAC address doesn't belong to the firewall! I rarely use this mode of tcpdump operation, as L2 simply works, but now I realized what the problem was. The core of the problem was that the router (some Cisco) didn't change the MAC address associated with the secondary IP address when I switched firewalls. What I would usually do in such a situation is restart the Cisco. But since this router was in a cabinet I didn't have access to, and I also didn't have access to its power supply, that was not an option. Yet, it turns out that it is easy to "persuade" a device to change a MAC address: just send it a gratuitous ARP response:
arping -s myaddr -I eth0 ciscoaddr
Both addresses are IP addresses, with myaddr being the address whose MAC mapping I want to change, and ciscoaddr the device on which I want this mapping changed. And that was it! The Cisco now had the correct MAC address and the thing worked as expected. The primary address worked correctly because the firewall itself had sent a packet to the Cisco and in that way updated the MAC address belonging to the primary IP address.
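For reference, the tcpdump invocation that shows link-level (MAC) headers looks something like this (the interface name and address are placeholders):
# tcpdump -e -n -i eth0 host myaddr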

To conclude, this is a short version of everything that happened, as I also used iptables' logging functionality (which obviously didn't help, as there was nothing to log in the first place :)). Finally, there's only one question left to answer: how did I see packets with the "wrong" MAC address in the tcpdump output? First, the switch shouldn't have sent them to me, and second, the interface should have dropped them before the OS (and by extension tcpdump) saw them. Well, the switch was sending those frames to me because its entry for the MAC address had expired and it didn't know where the old MAC address was, so it sent every frame out all ports (apart from the receiving one, of course). The reason the input interface didn't drop the packets is that a sniffing tool places the interface into promiscuous mode, so it sees every frame that reaches it. Easy, and interesting how things combine to create a problem and false clues. :)

Tuesday, December 20, 2011

Problem with inactive agent in OSSEC Web Interface

I was just debugging the OSSEC Web interface. Namely, it incorrectly showed that one host was not responding, even though there were log entries showing otherwise. The problem was that this particular host had been moved to another network, and thus its address had changed.

I figured out that the list of available agents in the Web interface is generated from the files found in the /var/ossec/queue/agent-info directory. There, you'll find one file per agent. The file name itself consists of the agent name and IP address separated by a single dash. To decide whether an agent is connected or not, the PHP code of the Web interface (which itself lives in the /usr/share/ossec-wui directory) takes the timestamp of the file belonging to a particular agent; if this timestamp is younger than 20 minutes, it proclaims the agent OK, otherwise it shows it as inaccessible.
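In other words, a quick way to see which agents the interface will mark as inactive is to list the agent-info files that haven't been touched in the last 20 minutes:
# find /var/ossec/queue/agent-info -type f -mmin +20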

In this case it turned out that the old agent hadn't been removed using the manage_agents tool (selecting option R, for remove). So the old information remained, it was never updated, and thus the Web interface reported an inactive agent.

Saturday, November 19, 2011

NVidia Linux drivers...

Well, after several hard lockups during the past several months and a few other bugs over an even longer period, next time I buy a computer I'll try very hard to buy one with an ATI video card. I can not describe how much I hate the binary driver from NVidia, and by extension, NVidia itself! Here is why I hate it:
  1. I have to install it manually. This means that every time either the kernel or the X driver is updated, I have to go to single user mode or runlevel 3 to compile the NVidia driver. Yeah, I know, there are rpm packages in rpmforge or a similar repository, but for some reason I wasn't satisfied with them and haven't used them for a long time now. Nevertheless, even if I were to use them, it wouldn't help with the next two problems!
  2. It locks up, and the lockups are hard, i.e. nothing but the power button helps. This happens regularly, without any warning; suddenly the system is frozen and nothing works! It's not even possible to log in over the network to restart the computer!
  3. Some programs, most frequently LibreOffice, have problems redrawing the screen. At first I thought it was a bug in those programs, but now I'm convinced that the problem is in the video driver.
And not to forget, when I had an ATI card (on a Lenovo W500), dual monitors and rotation worked fantastically! And all of it could be controlled from a small applet in the tray. With NVidia, nothing works so flawlessly.

I tried to download a newer driver from the NVidia ftp site. That was 290.06 at the time I was looking. But it locked up the machine even more frequently. Then again, it is marked as beta, so that is, in some way, expected. So I went to the NVidia site to see which version is considered stable, and that was 285.05.09, the one I had problems with in the first place and the one I was trying to replace!

The reason I went with the NVidia binary driver was gnome-shell. Namely, when Fedora switched to gnome-shell it required 3D support, and nouveau didn't support the 3D capabilities of my graphics card (Quadro FX 880M on a Lenovo W510). That meant using the fall-back GUI, which wasn't usable for me, and besides, I wanted to try gnome-shell.

So, after all this, I decided to try the nouveau driver again. And for that I had to disable the NVidia driver. At first I thought it would be simple, but it turned out not to be!

Disabling the NVidia proprietary driver

First I switched to runlevel 3 in order to turn off the graphics subsystem (i.e. X11):
init 3
Ok, the first thing I did was blacklist the nvidia driver and remove nouveau from the blacklist. This is done in the /etc/modprobe.d/blacklist.conf file. In there, you'll find the following line:
blacklist nouveau
Comment out (or remove) that line, and add the following one:
blacklist nvidia
Since I didn't want to reboot the machine, I also manually removed the nvidia driver module from the kernel using the following command:
rmmod nvidia
To be sure that nvidia will not be loaded during boot, I also recreated the initramfs image using:
dracut --force
Finally, I changed /etc/X11/xorg.conf. There, you'll find the following line:
Driver "nvidia"
That one I changed into
Driver "nouveau"
Ok, now I tried to switch to runlevel 5, i.e. to turn X11 back on, but it didn't work. The problem was that NVidia had messed up the Mesa libraries. So it was clear that it isn't possible to have both the NVidia proprietary driver and nouveau at the same time, at least not so easily. So, I decided to remove the NVidia proprietary driver. First, I switched again to runlevel 3, and then I ran the following command:
nvidia-uninstall
It started, complained about a changed setup, but eventually did what it was supposed to do. I also reinstalled the Mesa libraries:
yum reinstall mesa-libGL\*
And I again tried to switch to runlevel 5. This time I was greeted with the GDM login, but the resolution was probably the lowest possible!!! I thought I could log in and start the display configuration tool in order to change the resolution. But that didn't work! After some twiddling and googling, I first tried to add the following line to xorg.conf:
Modes "1600x900"
Note that 1600x900 is the maximum resolution supported on my laptop. I placed that line in the Screen section, Display subsection. After trying again, it still didn't work! Then I remembered that the new X server is autoconfigured, so I removed xorg.conf altogether, and this time the resolution was correct!
Ok, I tried to log in, but now something had happened to gnome-shell. Nautilus worked (because I saw the desktop), and Getting Things Gnome also started, but in the background I was greeted with a gnome-shell error message saying that something went wrong, and all I could do was click OK (luckily moving windows worked, or otherwise, because Getting Things Gnome was overlapping the OK button, I wouldn't have been able to reach it). When I clicked it I was back at the login screen. So, what's wrong now!?

I connected from another machine to monitor the /var/log/messages file, and during the login procedure I noticed the following error message:
gnome-session[25479]: WARNING: App 'gnome-shell.desktop' respawning too quickly
Ok, so the problem is gnome-shell, but why!? I decided to enter graphics-less mode again (init 3) and start X from the command line, using the startx command. Now I spotted the following error:
X Error of failed request: GLXUnsupportedPrivateRequest
Actually, that is only part of the error (as I didn't save it), but it gave me a hint. A library was issuing some X server request that the X server didn't understand. And that is a private request, probably specific to each driver. So, NVidia didn't properly uninstall something. But what? I again went to check whether all packages were correct. mesa-libGL and mesa-libGLU were good (as reported by the 'rpm -q --verify' command). Then I checked all X libraries and programs:
for i in `rpm -qa xorg\*`; do echo; echo; echo $i; rpm -q --verify $i; done
Only xorg-x11-server-Xorg-1.10.4-1.fc15.x86_64 had some inconsistency, so I reinstalled it and tried again (go to graphics mode, try to log in, go back to text mode). It didn't work.

So, I didn't know where else to look, and decided to reinstall the Mesa libraries once more. Then I checked again with the 'rpm -q --verify' command, and this time I noticed something strange: the links were not right (signified by the letter L in the output of the rpm command). So I went to the /usr/lib64 directory, listed the libraries that start with libGL, and immediately spotted the problem. Namely, libGL.so and libGL.so.1 were symbolic links pointing to NVidia libraries, not to the ones installed with Mesa. It turned out that the installation process didn't recreate the soft links!!! Furthermore, NVidia had left at least five versions of its own libraries (that's how many times I installed a new NVidia binary driver). So, I removed the NVidia libraries, recreated the links to point to the right libraries, and tried to log in again. This time, everything worked as expected!
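For illustration, the cleanup boils down to something like this (libGL.so.1.2 as the name of the real Mesa library is an assumption; check what ls actually shows on your system before removing anything):
# cd /usr/lib64
# ls -l libGL.so*
# rm -f libGL.so libGL.so.1
# ln -s libGL.so.1.2 libGL.so.1
# ln -s libGL.so.1 libGL.so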

So, we'll see now if this is going to be more stable or not.

Edit 1
Ok, at least one of those lockups wasn't related to NVidia. It happened that I upgraded the kernel and the NVidia driver at the same time. From that moment on I saw a new kernel oops, and since I already had problems with NVidia, I wrongly concluded that it was NVidia's fault. But now I suspect the wireless network driver. So, I downgraded the kernel version to see what will happen.

Edit 2
More than a day working without a problem. So, it seems that it was the combination of newer kernel (wireless) and NVidia binary driver!

Edit 3 - 20111123
Several days now and everything is working. Once I had a problem while suspending the laptop that made me reset the machine. Oh, and detection and control of the second monitor work much better than with the NVidia proprietary (i.e. binary) driver.

Thursday, September 15, 2011

CentOS 6.1...

... or not!

Ok, before I continue, first a disclaimer. This is my personal view of the whole situation around CentOS. If you disagree, that's ok. If you wish you can argue via comments, but please, keep to the point and give arguments for your statements. Don't troll. Oh, and if you spot grammatical and similar errors, please email me corrections.

So, this whole situation is frustrating, a lot! First, there was a long, long wait for 6.0 to appear. According to this Wikipedia article, about 240 days! Then, finally, 6.0 appeared and everyone was very happy. But now there are no updates and no 6.1. And it's already about 120 days behind. Not to mention some serious bugs that are present in a fully patched CentOS 6.0 installation. And note well: if you plan to use CentOS and security is important (e.g. an Internet facing Web server), don't use CentOS 6. If you desperately need CentOS, use version 5.

On the other hand, Scientific Linux has managed to track a Prominent North American Distributor quite well. Obviously, no matter what the CentOS developers claim, it is possible to be faster.

Now, whenever someone says on the mailing lists or forums: "Hey, this is a problem", two things happen (actually three, because there are also those that agree). First, there are those that constantly repeat that you get as much as you pay for! Second, there are those that repeat that '... the CentOS developers have to work on CentOS and it's not good for them to waste time arguing about the development process, so don't say those things here!' But there is one big BUT! And that is that at least some of the people that complain offer help at the same time! Also, if something is as important as CentOS is to many people and companies, the developers can not behave as if it doesn't concern them. Well, they can, but then they risk the failure of the project!

And the failure of CentOS is not only a problem for CentOS and its users, but also for RHEL. The reason is simple: the majority that use CentOS won't buy RHEL anyway, so they'll look for an alternative. The first alternative, Scientific Linux, has a main problem in its name; it doesn't sound like something a serious business would run, no matter how stupid that argument actually is. Anyway, what's the next alternative? Oracle's Unbreakable Linux. And this means Oracle's distribution will have more users, and thus will be more commercially successful. But even if CentOS users do not go with Oracle's Linux, the only alternative they have is to go with Ubuntu or Debian (not to mention other Unixes like FreeBSD), and those are completely different types of distributions, which means that the ones that go that route won't ever return to RHEL types.

For completeness, I have to say that some things did improve. For example, based on the critiques they received, the Atrium site was introduced to track development. Well, they improved microscopically, because if you look at that site you'll find that it's not used. The calendar is empty. There are some sporadic comments, many of them old, and that's it. Yes, I know, the developers started to talk a bit more about what's happening through different channels, e.g. Twitter. But that's not enough!

Problem

So, where is the problem? I don't know, to be honest, because the project is anything but transparent. But there could be several problems.

Lack of people

The main problem is probably the lack of people working on CentOS. There are, as far as I understand, core developers and a QA team. On the main CentOS page there is a list of people with their functions (select Information, The CentOS Team, Members). There are 11 people listed, of which 6 don't participate in core development (infrastructure, wiki, Web, QA), four are listed as supporting older versions of CentOS (2, 3, 4), and that leaves one person working on CentOS 5 and 6. The information there is certainly old (maybe the Web guy left?), but nevertheless, for such an important project more people are necessary.

To solve this problem the CentOS team has to bring more people into the project. But how will they do that when all of them (i.e. the CentOS team) are heavily busy trying to catch up with RHEL and don't have enough time to do anything else? The best way, IMHO, is to talk to vendors that use CentOS and ask them to provide paid engineers. And, with those people, try to create a community that will recruit new project members.

Decision/Strategic problem

Under this title I'll put the decision that was made during the RHEL 6 Beta program. The decision was not to do anything and to wait for the RHEL release. The reason was to help Red Hat better test the new version of RHEL. Well, that's certainly a good intent, but development of CentOS 6 should have proceeded immediately, because there was no way the CentOS team could deliver a beta version of CentOS in parallel with RHEL's beta. In the end, they lost a lot of precious time!

Collaboration with Scientific Linux and others

This subject was beaten to death, and nothing happened. I don't know where the problem is. The CentOS developers didn't say anything about it - did they try to approach the SL developers? Were there any discussions? What do they think? Nothing! What rings in my head is the perceived problem of compatibility, and here we are at the next problem.

Strict compatibility

Actually, this is not a problem per se. CentOS has a mission to be as close to upstream as possible, and this is actually an advantage. But this advantage is turning into a great disadvantage, since CentOS is late because of it. The one reason cited why CentOS couldn't work with SL is that SL doesn't care much about strict compatibility. And this, I believe, is a fallacy. Both projects have to remove RH's trademarks and such from packages, and at least in that area there is a possibility to cooperate.

Next, because of strict compatibility, no package updates can be provided ahead of upstream. The reason is that the package name (and version) would have to be changed in that case, and this could bring two problems. The first is that, when upstream releases an update, there could be divergence and collisions. And second, package names/versions would differ, which might confuse some third party software made explicitly for RHEL.

As for this, I don't understand why yum can not be patched (or a plugin provided) so that packages can have the same name but the release date is also taken into account. Also, why couldn't there be pre-releases of CentOS, named for example 6.1.test0, 6.1.test1? With all the packages the same, but with different release dates that yum would take care of?

Finally, people that use CentOS don't need support because they know how to do things themselves. And if they don't, who cares?

Conclusion

In conclusion, I'll say that CentOS isn't going in the right direction. The CentOS team has to do something, and do it as quickly as possible. Maybe the most important thing is to get a capable project manager who will change all this and open up the project more.

Saturday, January 27, 2007

VIP UMTS/EDGE/whatever...

Ok, I prepared this post while I was trying to connect to the Internet and I was very angry! For the problems I have I blame VIP, and this post summarizes my experiences with them. I doubt that the other providers are any different. And yes, for those that don't know, VIP is an Internet provider in Croatia.

Buying the card and the subscription was the least of the problems, and it was quick. Although, I heard that they now require a two-year contract instead of a one-year one. Probably because they were giving devices away for 1 kuna and it turned out that it doesn't pay off. Namely, you could get a PCMCIA card that costs about 1800 kuna (cca. 7.5 kuna is 1 EUR) for the already mentioned 1 kuna. The lowest subscription is 50 kn per month, so over a one-year contract you could end up with the device for only 600 kn (12 × 50 kn)! Clearly, the math was not on their side in that one.

But before I took the subscription, the first problem was finding out whether the devices they offer work on Linux. And finding that information was impossible, even though I contacted technical support through regular channels and via some friends. So, I took the device not knowing if it would work, hoping for the best. It turned out that with a bit of luck and some hacking it worked! The device is a Nozomi, and it can be recognized by the letters NZ in the serial number. More on that can be found on my homepage.

The second problem was with connecting to VIP. Namely, it first turned out that I have to use PAP, and the second problem was that PAP always returns a success code, no matter whether authentication succeeded or not!?

After finally overcoming that obstacle too, the next one was random disconnections. Not only that, but I also had problems trying to connect or reconnect to VIP. And now the story leads us to the help desk service. When I called them I never expected them to help me resolve the problem. How could they, when they probably never saw Linux!? I just wanted to find out whether they knew of any current problems in the network, so that I would know whether the problem was on my side or theirs. Well, I never found out whether they had a problem. Also, sometimes they blamed CARNet. And the conversation usually starts with something like: "What you see in the application about signal strength..." and after telling them that I don't have that application, all further conversation stops. So much for the help desk. Well, to be honest, those interruptions are now rare, but still, they can become very frustrating, and actually they are the reason I'm writing this.

And finally, something about the speed. It's not even close to the promised UMTS speed! It seems to be good in some larger towns, but the moment you are in the suburbs it drops sharply! It never goes above 50 kbps (that's kilobits), and usually it's around 20 kbps!

All in all, I started to think about using DSL. But it's another story....
